BD2KGenomics / toil-rnaseq

UC Santa Cruz Computational Genomics Lab's Toil-based RNA-seq pipeline
Apache License 2.0

Add ability to handle mismatched fastq files #151

Closed by jvivian 5 years ago

jvivian commented 6 years ago

The largest source of failure in the workflow is CutAdapt's strict requirement that paired fastq files contain matching reads, record for record, with no mismatched read names.

Example Error

cutadapt: error: Reads are improperly paired. Read name 'NS500257:75:HLN25BGX5:1:11101:10293:1043 1:N:0:2' in file 1 does not match 'NS500257:76:HTH2NBGX5:2:11101:24171:1047 2:N:0:1' in file 2.
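For illustration, a check like the one behind this error can be sketched in a few lines of Python. This is not CutAdapt's code and not part of the workflow; the names `read_id` and `first_mismatch` are hypothetical, and the sketch assumes plain 4-line fastq records whose read name is the first whitespace-delimited token of the header, optionally followed by a `/1` or `/2` suffix.

```python
import itertools


def read_id(header):
    """Return the read name from a fastq header line (hypothetical helper).

    Drops the leading '@', the comment after the first whitespace
    (e.g. '1:N:0:2'), and a trailing '/1' or '/2' mate suffix.
    """
    fields = header.split()
    if not fields:
        return ""
    name = fields[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def first_mismatch(lines1, lines2):
    """Return (record_index, id1, id2) for the first unpaired record, or None.

    lines1 and lines2 are iterables of lines (open file handles or lists).
    Only every fourth line, the header of each record, is compared.
    A missing mate (files of different length) is reported with id None.
    """
    headers1 = itertools.islice(lines1, 0, None, 4)
    headers2 = itertools.islice(lines2, 0, None, 4)
    pairs = itertools.zip_longest(headers1, headers2)
    for index, (h1, h2) in enumerate(pairs):
        id1 = read_id(h1) if h1 is not None else None
        id2 = read_id(h2) if h2 is not None else None
        if id1 != id2:
            return index, id1, id2
    return None
```

Running something like this on both input files before launching the workflow would surface the problem early, at the cost of one streaming pass over each file.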

It would be nice to give the user an option that fixes this within the workflow, so the user isn't responsible for coming up with their own preprocessing fix before running it.

One possible solution, although I'll need to make an appropriate Docker container for it.

jvivian commented 5 years ago

Fastq pairing is both resource- and time-intensive, so we'll keep the onus of pairing on the person who prepares the inputs.
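To give a sense of where the cost comes from, here is a minimal sketch of one naive way to re-pair two fastq files. It is an assumption-laden illustration, not the solution referenced earlier in this thread and not anything the workflow ships: `read_id`, `parse_fastq`, and `repair_pairs` are hypothetical names, records are assumed to be exactly 4 lines, and the whole second file is indexed in memory, which is exactly what becomes expensive on real sequencing data.

```python
import itertools


def read_id(header):
    """Read name without '@', header comment, or '/1' '/2' suffix (hypothetical helper)."""
    fields = header.split()
    if not fields:
        return ""
    name = fields[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def parse_fastq(lines):
    """Yield (read_id, record_lines) for each complete 4-line record."""
    iterator = iter(lines)
    while True:
        record = list(itertools.islice(iterator, 4))
        if len(record) < 4:
            return  # end of input (a truncated trailing record is dropped)
        yield read_id(record[0]), record


def repair_pairs(lines1, lines2):
    """Return (kept1, kept2): lines of records present in both inputs.

    Output follows the order of the first file. Reads without a mate
    are discarded. Memory use grows with the size of the second file.
    """
    # Assumption: the second file fits in memory as a dict keyed by read name.
    index2 = {rid: record for rid, record in parse_fastq(lines2)}
    kept1, kept2 = [], []
    for rid, record in parse_fastq(lines1):
        mate = index2.get(rid)
        if mate is not None:
            kept1.extend(record)
            kept2.extend(mate)
    return kept1, kept2
```

Usage would look like `kept1, kept2 = repair_pairs(open(path1), open(path2))` followed by writing each list out, where `path1` and `path2` are placeholders for the two input files. Approaches that scale better typically avoid the in-memory index, for example by sorting both files by read name first, which trades memory for time and disk.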