aitgon / vtam

sortreads: improve demultiplexing by searching for all tag combinations in parallel #16

Closed meglecz closed 2 years ago

meglecz commented 3 years ago

Use true demultiplexing, as shown below, instead of scanning the same file separately for each tag combination. In cutadapt v3, this can run on multiple threads.

Run one cutadapt command per input FASTA file:

cutadapt --cores=0 -e 0 --no-indels --trimmed-only -g file:barcodes.fasta -o "tagtrimmed.{name}.fasta.gz" merged_file.fasta.gz
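
A minimal sketch of how one such command per input file could be assembled from Python. The helper name `build_cutadapt_command` and its `out_dir` argument are hypothetical and not part of VTAM; the flags are exactly those of the command above.

```python
from pathlib import Path
import shlex


def build_cutadapt_command(merged_fasta, barcodes_fasta, out_dir="."):
    """Return the argv list for one demultiplexing run (hypothetical helper)."""
    # {name} stays literal here: cutadapt replaces it with each barcode record name.
    out_pattern = str(Path(out_dir) / "tagtrimmed.{name}.fasta.gz")
    return [
        "cutadapt",
        "--cores=0",        # let cutadapt detect the number of available cores
        "-e", "0",          # maximum error rate 0: tags must match exactly
        "--no-indels",
        "--trimmed-only",   # drop reads in which no tag combination was found
        "-g", f"file:{barcodes_fasta}",
        "-o", out_pattern,
        str(merged_fasta),
    ]


if __name__ == "__main__":
    # One command per merged input file; the file list is a placeholder.
    for merged in ["merged_file.fasta.gz"]:
        print(shlex.join(build_cutadapt_command(merged, "barcodes.fasta")))
```

Returning the argument list rather than a shell string means it can be passed straight to `subprocess.run` without quoting the `{name}` placeholder.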

By default, the barcodes.fasta file has the following format (anchored search):

>marker-run-sample-replicate
^tcgatcacgatgt...gctgtagatcgaca$
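
A sketch of how such records could be generated from a list of tag combinations. The function names and the row layout are hypothetical; it also assumes that the second tag is supplied as it appears at the end of the merged read, since cutadapt matches adapter sequences as written.

```python
def barcode_record(marker, run, sample, replicate, fwd_tag, rev_tag):
    """Format one linked, anchored adapter entry for barcodes.fasta."""
    name = "-".join([marker, run, sample, replicate])
    # ^ anchors the first tag at the read start, $ anchors the second at the read end,
    # and ... links the two into a single adapter.
    return f">{name}\n^{fwd_tag}...{rev_tag}$\n"


def write_barcodes(rows, path="barcodes.fasta"):
    """Write one record per (marker, run, sample, replicate, fwd_tag, rev_tag) row."""
    with open(path, "w") as handle:
        for row in rows:
            handle.write(barcode_record(*row))


if __name__ == "__main__":
    # Placeholder names; the tag sequences are the ones from the example above.
    print(barcode_record("marker", "run", "sample", "replicate",
                         "tcgatcacgatgt", "gctgtagatcgaca"), end="")
```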

Add pigz to the conda environment and to the Singularity recipe file so that the output files can be compressed on multiple threads. Otherwise, compressing the output files becomes a bottleneck.