bio-raum / FooDMe2

A Nextflow pipeline for the identification of species from mixed samples based on mitochondrial amplicons
https://bio-raum.github.io/FooDMe2/
GNU General Public License v3.0

💡 [REQUEST] - Non-overlapping reads support #77

Open · gregdenay opened this issue 4 days ago

gregdenay commented 4 days ago

Discussed in https://github.com/bio-raum/FooDMe2/discussions/76

Originally posted by **gregdenay** November 22, 2024

Hi @marchoeppner, some colleagues who are actively working on method development have barcodes that are a bit too long, which means read merging is not possible. This is particularly true for the published fish barcodes and some of the plant barcodes, even with the 2x250 bp chemistry. Their current strategy is to run the analysis independently for R1 and R2 and then merge the end results by hand.

I would assume that one could simply join (concatenate) R1 and R2 after denoising or prior to identity clustering, possibly with a padding sequence in the middle. Since BLAST is a local aligner, the gap between the two reads should not be a problem, and the subsequent steps should work fine, maybe with some adapted parameters for the BLAST search. What do you think?

Implementation-wise, it should be straightforward to add this as an experimental feature. VSEARCH supports it with the `--fastq_join` command (instead of `--fastq_mergepairs`), and DADA2's `mergePairs` has a `justConcatenate=TRUE` argument. Both insert a padding sequence of `N`s between the reads (8 for VSEARCH and 10 for DADA2).
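To make the "join instead of merge" idea concrete, here is a minimal Python sketch of what concatenating a read pair amounts to: R1 is kept as-is, R2 is reverse-complemented, and a run of `N`s is placed between them. It is an illustration only, not FooDMe2 code. The helper names `reverse_complement` and `join_pair` are hypothetical. The default padding of eight `N`s with quality `I` is assumed to mirror VSEARCH's documented `--join_padgap`/`--join_padgapq` defaults; pass `pad_len=10` to mimic DADA2's `justConcatenate=TRUE` spacer.

```python
# Complement table for IUPAC nucleotide codes (N maps to itself).
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBDHVNacgtrykmbdhvn",
    "TGCAYRMKVHDBNtgcayrmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(fwd_seq: str, fwd_qual: str, rev_seq: str, rev_qual: str,
              pad_len: int = 8, pad_qual: str = "I") -> tuple[str, str]:
    """Concatenate R1 with the reverse-complemented R2, separated by Ns.

    Hypothetical helper: no overlap is required or checked, the pair is
    simply joined end to end with a fixed-length padding in the middle.
    """
    if len(fwd_seq) != len(fwd_qual) or len(rev_seq) != len(rev_qual):
        raise ValueError("sequence and quality strings must have equal length")
    seq = fwd_seq + "N" * pad_len + reverse_complement(rev_seq)
    # Qualities of R2 are reversed so they stay aligned with its bases.
    qual = fwd_qual + pad_qual * pad_len + rev_qual[::-1]
    return seq, qual


if __name__ == "__main__":
    print(join_pair("ACGT", "IIII", "AACC", "FFFF"))
```

Running it prints `('ACGTNNNNNNNNGGTT', 'IIIIIIIIIIIIFFFF')`. Note that the padding has a fixed length regardless of how many bases actually lie unsequenced between the two reads, which is why the downstream search has to tolerate a gap of unknown size.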
gregdenay commented 2 days ago

Implemented the `--non_overlapping` parameter and the logic to switch between merging and concatenating reads in the `VSEARCH:FASTQMERGE` and `DADA:DENOISING` modules. It is largely untested beyond checks for obvious errors; I am waiting for appropriate data to run proper tests.
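The switch described above can be pictured as choosing between two VSEARCH commands for the same read pair. The Python sketch below is hypothetical and is not the pipeline's actual Nextflow module: `vsearch_pair_args` and its `non_overlapping` argument are illustrative names, and only the VSEARCH options (`--fastq_mergepairs`, `--fastq_join`, `--reverse`, `--fastqout`) are real.

```python
import shlex


def vsearch_pair_args(r1: str, r2: str, out_fastq: str,
                      non_overlapping: bool = False) -> list[str]:
    """Build a VSEARCH command that either merges or joins a read pair.

    Hypothetical helper: --fastq_mergepairs requires the reads to overlap,
    while --fastq_join concatenates R1 and the reverse-complemented R2
    with an N padding in between.
    """
    mode = "--fastq_join" if non_overlapping else "--fastq_mergepairs"
    return ["vsearch", mode, r1, "--reverse", r2, "--fastqout", out_fastq]


if __name__ == "__main__":
    # Print both variants as shell command lines for comparison.
    for flag in (False, True):
        print(shlex.join(vsearch_pair_args("R1.fq", "R2.fq", "out.fq", flag)))
```

Keeping the input and output arguments identical in both branches means the rest of the workflow can stay unchanged; only the merge step's behavior depends on the flag.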

59a710b