As is, the workflow takes fastq files and checks for contamination, adapters, etc. This workflow is expected to work with both short and long reads. However, while the direct deliverable from sequencing cores for Illumina short-read data is (almost always) the raw fastqs, this sometimes isn't the case with long-read data:
For PacBio, the immediate deliverable sometimes isn't fastq; I've seen BAM in the past, and sometimes the FASTA of the consensus sequence, depending on what the sequencing core does.
For Nanopore, the last time I ran metagenomes the basecaller produced hundreds of tiny fastq/fast5 files; only after removing adapters with a tool like porechop does it become a single fastq (a rough preprocessing sketch for both cases follows below).
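To make the gap concrete, here's a rough Python sketch of the kind of pre-QC normalization that's currently left to the user: converting a PacBio BAM deliverable back to fastq with `samtools fastq`, and concatenating Nanopore basecaller output into a single fastq. The paths and function names are placeholders for illustration, not anything that exists in seqqc; adapter trimming (e.g. with porechop) would still happen separately.

```python
# Sketch of pre-QC normalization for long-read deliverables.
# All file names/paths below are placeholders, not part of seqqc.
import gzip
import shutil
import subprocess
from pathlib import Path


def pacbio_bam_to_fastq(bam: Path, out_fastq: Path) -> None:
    """Convert a PacBio BAM deliverable to fastq using `samtools fastq`.

    Assumes samtools is on PATH; for an unaligned PacBio BAM this just
    dumps the reads back out as fastq records.
    """
    with open(out_fastq, "w") as handle:
        subprocess.run(["samtools", "fastq", str(bam)], stdout=handle, check=True)


def concat_nanopore_fastqs(fastq_dir: Path, out_fastq_gz: Path) -> None:
    """Concatenate the many per-batch basecaller fastqs into one gzipped fastq.

    fastq is plain line-oriented text, so simple concatenation is valid;
    handles both .fastq and .fastq.gz inputs.
    """
    with gzip.open(out_fastq_gz, "wt") as out_handle:
        for path in sorted(fastq_dir.glob("*.fastq*")):
            opener = gzip.open if path.suffix == ".gz" else open
            with opener(path, "rt") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    pacbio_bam_to_fastq(Path("pacbio/movie.subreads.bam"), Path("pacbio/reads.fastq"))
    concat_nanopore_fastqs(Path("nanopore/fastq_pass"), Path("nanopore/reads.fastq.gz"))
```

Concatenation works because fastq is line-oriented; the consensus-FASTA case is messier, since there are no per-base quality scores to carry through the downstream QC checks.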
For this workflow, the questions are:
If the immediate deliverable for a long-read project isn't fastq, are we expecting users to do something outside of the workflow before doing QC?
If not, what upstream processes and checks do we add so long reads can go through basically the same checks as short reads?
First brought up in this PR https://github.com/Arcadia-Science/seqqc/pull/8#discussion_r1013540987