bluenote-1577 / flopp

flopp is a software package for single individual haplotype phasing of polyploid organisms from long read sequencing.

Problem with multithreading #16

Closed by ZoeVance 10 months ago

ZoeVance commented 10 months ago

Hello!

I am having some issues with running flopp on multiple threads. When running on the default 10 threads, 10 processes do seem to be spawned but only one appears to be running (from inspection with htop). This seems to be confirmed by the fact that running on a single thread for the same data does not show any appreciable difference in runtime (25m46.324s for default, 27m18.906s for -t 1).

I'd really like to resolve this, as flopp is otherwise performing amazingly well on my test data, but I don't think running single-threaded on whole datasets will be workable. Unfortunately I have no experience with Rust, so I'm not really able to do much troubleshooting myself.

Running the latest release on Ubuntu 20.04.6 with rustc v 1.66.1 if that's of any help.

Thanks for your time! Zoe

bluenote-1577 commented 10 months ago

Hi @ZoeVance,

Thanks for the report.

Do you have a log available? That would help me see what the issue is. You can capture one by running flopp .... > log.txt.

flopp only multithreads the phasing stage. Often the main bottleneck is reading the inputs, which flopp does on a single thread. Also, if your genome is very small, phasing itself will run on a single thread: flopp parallelizes by phasing different parts of the genome concurrently, but only when the region being phased is large enough.
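To illustrate the behaviour described above, here is a minimal Rust sketch of "split the region into blocks and phase them concurrently, but only if the region is large enough". This is not flopp's actual code: the threshold `MIN_VARIANTS_FOR_PARALLEL`, the function names, and the placeholder `phase_block` are all hypothetical, and it only uses the standard library's scoped threads.

```rust
use std::thread;

/// Hypothetical cutoff (an assumption, not flopp's real value): regions with
/// fewer variants than this are handled as a single block.
const MIN_VARIANTS_FOR_PARALLEL: usize = 1_000;

/// Split `n_variants` positions into contiguous half-open ranges `(start, end)`.
/// Small regions yield one block; larger ones yield up to `n_threads` blocks.
fn split_into_blocks(n_variants: usize, n_threads: usize) -> Vec<(usize, usize)> {
    if n_variants == 0 {
        return Vec::new();
    }
    let n_blocks = if n_variants < MIN_VARIANTS_FOR_PARALLEL {
        1
    } else {
        n_threads.max(1)
    };
    // Ceiling division so every variant lands in some block.
    let block_len = (n_variants + n_blocks - 1) / n_blocks;
    (0..n_variants)
        .step_by(block_len)
        .map(|start| (start, (start + block_len).min(n_variants)))
        .collect()
}

/// Placeholder for the real per-block phasing work; here it just reports
/// how many variants the block covers.
fn phase_block(block: (usize, usize)) -> usize {
    block.1 - block.0
}

/// Run one worker thread per block and collect the results in block order.
fn phase_region(n_variants: usize, n_threads: usize) -> Vec<usize> {
    let blocks = split_into_blocks(n_variants, n_threads);
    thread::scope(|scope| {
        let handles: Vec<_> = blocks
            .iter()
            .map(|&block| scope.spawn(move || phase_block(block)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Under these assumptions, calling `phase_region(500, 10)` produces a single block, so only one worker does any work no matter how many threads were requested, which is consistent with a small region showing no speedup from extra threads.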

If you could send me your data sets that would be helpful too.

Thanks,

Jim

ZoeVance commented 10 months ago

Hi Jim,

Ah, I don't think there's any need for more investigation on your end. I'm running on a single, fairly small region, so naturally it's running on a single thread. Any long runtimes are probably down to high sequencing depth; I'll try some testing with downsampling.

Thanks for the help, Zoe