DavidsonGroup / flexiplex

The Flexible Demultiplexer
https://davidsongroup.github.io/flexiplex/
MIT License

Multi-thread option #1

Closed mcortes-lopez closed 10 months ago

mcortes-lopez commented 1 year ago

Hi, I am testing demultiplexing based on a list of barcodes in ONT data, but it has been running for more than 4 days (using 5 cores per task and 50 GB of memory) and does not appear to have written out even 10% of the estimated barcodes. Is there a multi-thread option, or are there recommended memory specifications for running it on SLURM systems?

nadiadavidson commented 1 year ago

Hello,

Thank you for trying out our tool and posting about your issue. We don't currently have an option for multiple threads, but we are still actively developing flexiplex, so we appreciate the feedback and will look into speeding it up. To help us think about the best way to do this, would you mind letting us know how many reads you are processing, how long your known barcode list is (if you use one), which flexiplex command you're running, and any other details that might be relevant?

Many thanks, Nadia.

mcortes-lopez commented 1 year ago

Hi,

Thanks for your reply! I have a file with more than 90 million reads. It contains 3 samples, each with around 10,000 barcodes (from the same library sequenced with short reads, Illumina 10x). For each barcode list I run flexiplex as:

flexiplex -k ${BARCODE_LIST} -r false -n ${SAMPLE}_demux_bc ${main_fastq} > ${OUTDIR}/${SAMPLE}_demux_bc.fastq

After 4 days, only 15 million reads have been processed. Would you suggest splitting either the fastq or the barcode list to reduce the run time?

Best, Mariela

nadiadavidson commented 1 year ago

Hi Mariela,

Sorry for the slow reply. I've just updated the code to run faster. For older, noisier reads we get about a 30% improvement, but for newer reads this should be even better (up to 90%). Unfortunately I haven't had a chance to make the code multi-threaded, but please leave this issue open as I would like to implement that at some stage. In the meantime, splitting the fastq files as you suggested is the best idea. I've added this and a couple of other suggestions to the end of the documentation, https://davidsongroup.github.io/flexiplex/. You are welcome to keep us updated about what did/didn't work.

Cheers, Nadia.
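
For anyone trying the splitting workaround suggested above, here is a minimal sketch of one way to do it. It is not taken from the flexiplex documentation. The helper name split_fastq, the chunk size, and the chunk_ and demux_ file names are all hypothetical. It assumes an uncompressed FASTQ with plain 4-line records (no wrapped sequence lines), so that cutting on a multiple of four lines never breaks a read in half.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Split a FASTQ file into chunks holding a fixed number of reads each.
# Assumption: uncompressed input with exactly 4 lines per record.
# Usage: split_fastq INPUT.fastq READS_PER_CHUNK OUTPUT_PREFIX
split_fastq() {
  local input="$1" reads_per_chunk="$2" prefix="$3"
  # Four lines per read, so every chunk ends on a record boundary.
  # split names the pieces PREFIXaa, PREFIXab, PREFIXac, ...
  split -l "$((reads_per_chunk * 4))" "$input" "$prefix"
}

# Hypothetical usage, reusing the flags from the command earlier in this
# thread (variable names as in that command; chunk size is arbitrary):
#
#   split_fastq "${main_fastq}" 5000000 "${OUTDIR}/chunk_"
#   for chunk in "${OUTDIR}"/chunk_*; do
#     name="$(basename "${chunk}")"
#     flexiplex -k "${BARCODE_LIST}" -r false -n "${SAMPLE}_${name}" "${chunk}" \
#       > "${OUTDIR}/demux_${SAMPLE}_${name}.fastq" &
#   done
#   wait
#   cat "${OUTDIR}"/demux_"${SAMPLE}"_chunk_* > "${OUTDIR}/${SAMPLE}_demux_bc.fastq"
```

The commented loop gives each chunk its own -n prefix so that concurrent runs do not write to the same prefixed output files. On a SLURM cluster it would probably make more sense to submit each chunk as its own job or array task than to background everything on one node.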

nadiadavidson commented 10 months ago

The code is now multi-threaded, so closing this issue.