Arcadia-Science / seqqc

A Nextflow pipeline to identify quality control issues with new sequencing data.
MIT License

Setting the pipeline up to work with long read data #13

Closed taylorreiter closed 2 years ago

taylorreiter commented 2 years ago

Over on #8, @elizabethmcd said:

Some questions for long-read implementations:

  1. From what I remember, the immediate deliverable for PacBio sometimes isn't fastq. Are we adding upstream steps to convert between file formats and get to fastq, or depending on somebody else to do that before the QC workflow?
  2. The last time I did Nanopore sequencing, basecalling produced a bunch of small fastq/fast5 files, and adapter removal with a tool like porechop then produced a single fastq. Are we assuming that those steps are done before Nanopore reads go through this workflow?

I have been envisioning that the input to this pipeline would be fastq files, but we'll need to discuss this more. The pipeline can also always evolve based on Arcadia's needs and the ecosystem of workflows we end up with.
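
For illustration only, here is a rough sketch of the kind of upstream step the Nanopore question describes: combining many small fastq chunks into a single fastq before QC. It is not part of seqqc, and the function names (`merge_fastq_chunks`, `count_fastq_records`) are hypothetical. It assumes each chunk is a plain or gzipped fastq with four lines per record and a trailing newline, and it does not handle fast5 input, PacBio formats, or adapter removal.

```python
import gzip
import shutil
from pathlib import Path


def open_maybe_gzip(path, mode):
    """Open a file, using gzip when the name ends in .gz."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def merge_fastq_chunks(chunk_paths, merged_path):
    """Concatenate fastq chunks, in sorted path order, into one fastq file.

    Assumption: every chunk ends with a newline, so records do not run
    together at chunk boundaries.
    """
    with open_maybe_gzip(merged_path, "wb") as out:
        for chunk in sorted(Path(p) for p in chunk_paths):
            with open_maybe_gzip(chunk, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return Path(merged_path)


def count_fastq_records(path):
    """Count records, assuming the common four-lines-per-record layout."""
    with open_maybe_gzip(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

A call such as `merge_fastq_chunks(Path("basecalled").glob("*.fastq.gz"), "sample.fastq.gz")` (directory and file names are placeholders) would then give a single file that could be passed to a fastq-only workflow. Whether a step like this belongs inside the pipeline or upstream of it is exactly the open question in this thread.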

taylorreiter commented 2 years ago

Whoops, I didn't see there was already an issue for this 🤦‍♀️. Closing this now! See #11.