create options to **not** require FASTQ input

Addressed in version 3.14.0 as follows:

Re-running the pipeline without the FASTQ files

The pipeline can perform the entire analysis, starting from the FASTQ files holding the results of the PacBio sequencing (used to build the barcode-variant lookup table) and the Illumina sequencing (used to count the barcodes) and ending with the final processed results. However, these FASTQ files are typically very large, so they are not tracked in the GitHub repo and must be stored elsewhere, such as on a computing cluster. That may be where they were originally generated or, for secondary use, where they were downloaded to from the NCBI Sequence Read Archive (SRA). The locations of those files are specified in the pacbio_runs and barcode_runs CSV files indicated in config.yaml, so they will be specific to the computing cluster on which the pipeline is being run.
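As a rough way to tell whether the FASTQ files referenced by a configuration are actually present on the current machine, the following Python sketch checks each path listed in a runs CSV. The helper names `missing_paths` and `missing_fastqs`, and the idea of passing the FASTQ column name as `fastq_column`, are hypothetical; the real pacbio_runs and barcode_runs CSV files may use different column names or list several files per run.

```python
import csv
import os


def missing_paths(rows, fastq_column):
    """Return FASTQ paths from `rows` (dicts, e.g. from csv.DictReader) that are not files here.

    `fastq_column` is a hypothetical column name; check the actual runs CSV.
    """
    return [
        row[fastq_column].strip()
        for row in rows
        if not os.path.isfile(row[fastq_column].strip())
    ]


def missing_fastqs(runs_csv, fastq_column):
    """Read a runs CSV and return the FASTQ paths it lists that are missing locally."""
    with open(runs_csv, newline="") as f:
        return missing_paths(csv.DictReader(f), fastq_column)
```

A non-empty result from `missing_fastqs` on a copy of the repo suggests the FASTQ files are not available on this machine, which is the situation where starting from precomputed results is useful.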

However, for many re-use purposes, secondary users do not need to re-process the FASTQ files, as the barcode counting and the construction of the barcode-variant lookup table are fairly simple, and secondary users may be happy to use the lookup table and counts generated from prior processing of the FASTQ files. If you are using a repo where these counts and the barcode-variant lookup table are already computed and stored, you can start from those and avoid handling the FASTQ files at all. To do that, set the following options in the configuration YAML (config.yaml):

```yaml
prebuilt_variants: results/variants/codon_variants.csv  # use codon-variant table already in repo
prebuilt_geneseq: results/gene_sequence/codon.fasta  # use gene sequence already in repo
...
use_precomputed_barcode_counts: true  # use barcode counts already in repo
```

Running the pipeline will then no longer require any FASTQ files; instead it will use the precomputed variants and counts that were generated from those files and stored in the repo.
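How these options combine can be summarized with a small Python sketch. It encodes an assumed reading of the text above (the prebuilt codon-variant table and gene sequence stand in for the PacBio processing, and precomputed barcode counts stand in for the Illumina processing); it is not the pipeline's own logic, and `fastq_inputs_needed` is a hypothetical helper. The `config` argument is a plain dict, such as one loaded from config.yaml with PyYAML's `yaml.safe_load`.

```python
def fastq_inputs_needed(config):
    """Report which FASTQ sets a config would still need, under the assumed rules.

    Hypothetical helper: mirrors the description of the options, not the
    pipeline's actual implementation.
    """
    # Assumption: PacBio FASTQs are only used to build the codon-variant table
    # and gene sequence, so supplying both prebuilt files makes them unnecessary.
    needs_pacbio = not (config.get("prebuilt_variants") and config.get("prebuilt_geneseq"))
    # Assumption: Illumina FASTQs are only used to count barcodes.
    needs_illumina = not config.get("use_precomputed_barcode_counts", False)
    return {"pacbio": needs_pacbio, "illumina": needs_illumina}
```

With the three settings shown in the config snippet, both entries come out `False`, meaning neither set of FASTQ files is needed; leaving `use_precomputed_barcode_counts` as `false` would still require the Illumina FASTQs.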

dms-vep / dms-vep-pipeline-3