COMBINE-lab / pufferfish

An efficient index for the colored, compacted, de Bruijn graph
GNU General Public License v3.0

Support interleaved paired end FASTQs #19

Open nh13 opened 4 years ago

nh13 commented 4 years ago

In many cases, I get a single FASTQ with paired-end reads "interleaved": the reads of a pair are consecutive and have the same name except for a trailing /1 or /2 that identifies the end of the pair. I see --mate1 and --mate2 for paired-end reads, and --read for single-end reads.

fataltes commented 4 years ago

Dear Nils (@nh13),

Thank you for your quick test of PuffAligner.

We appreciate all your useful feedback about making PuffAligner easier to use.

Support for interleaved FASTQ files seems like it would be a useful option. One question (for us as developers) is whether it is better to special-case the parser for this format or to handle it externally, in how the program is invoked. For example, a script like this one would make it possible to provide interleaved FASTQ files to PuffAligner and have it treat them as separate paired-end files by splitting the interleaved FASTQ file into two input FIFOs. The downside of this approach is that access to the executable is mediated by one more level of indirection. On the plus side, something like this doesn't complicate the code and would work almost immediately. Perhaps we can provide a script like this for PuffAligner, and you can let us know if it is adequate for your purposes.
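
To make the splitting step concrete, here is a minimal sketch in Python. It is not the script referred to above and not part of PuffAligner; the function name `deinterleave` is hypothetical, and it assumes four-line FASTQ records in which the two mates of a pair strictly alternate.

```python
import io
from itertools import islice


def deinterleave(interleaved, mate1_out, mate2_out):
    """Split an interleaved FASTQ stream into two mate streams.

    Assumption: every record is exactly four lines and mates strictly
    alternate (first mate, then second mate). Read names are not checked.
    Returns the number of pairs written.
    """
    pairs = 0
    while True:
        # Read the four lines of the first mate; an empty list means EOF.
        record1 = list(islice(interleaved, 4))
        if not record1:
            break
        # Read the four lines of the second mate.
        record2 = list(islice(interleaved, 4))
        if len(record1) != 4 or len(record2) != 4:
            raise ValueError("truncated record or unpaired read at end of input")
        mate1_out.writelines(record1)
        mate2_out.writelines(record2)
        pairs += 1
    return pairs


if __name__ == "__main__":
    # Tiny in-memory demonstration with one read pair.
    demo = io.StringIO(
        "@r1/1\nACGT\n+\nIIII\n"
        "@r1/2\nTGCA\n+\nIIII\n"
    )
    m1, m2 = io.StringIO(), io.StringIO()
    print(deinterleave(demo, m1, m2))
```

To mirror the FIFO approach, the two output handles could presumably be named pipes (for example, created with `os.mkfifo`) whose paths are then given as --mate1 and --mate2. That wiring is an assumption and is not shown or tested here.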

Thanks!