DecodeGenetics / graphtyper

Population-scale genotyping using pangenome graphs
http://dx.doi.org/10.1038/ng.3964
MIT License

Binary of graphtyper runs extremely slow #88

Closed: shangshanzhizhe closed this issue 3 years ago

shangshanzhizhe commented 3 years ago

Hi,

Thank you for the useful tool. I could not compile graphtyper locally, so I downloaded the prebuilt binary. However, when I tried to genotype SVs across about 50 samples with multiple threads, it did not seem to use the CPUs and memory as expected, and the process was very slow. Do you have any suggestions?

All the best, Shangzhe

hannespetur commented 3 years ago

Hello, have you seen this section in our user guide: https://github.com/DecodeGenetics/graphtyper/wiki/User-guide#subsampling-reads-in-abnormally-high-sequence-depth

It may be that graphtyper is very slow in some regions where the sequence depth is extremely high (10,000x+). We provide a way to subsample the reads in such regions, which makes graphtyper run much faster.

Best, Hannes

shangshanzhizhe commented 3 years ago

Thanks for your reply. I'm afraid I didn't describe the problem clearly. Overall graphtyper works well, but it doesn't use the threads as expected. For example, my command was:

graphtyper genotype_sv --threads=32 /data/01/user186/01.fenshu.sv.illumina/00.reference/Fsr_decontaminated_genome.fasta Fenshu.merge.all.vcf.gz --sams=/data/01/user186/01.fenshu.sv.illumina/all.bam.files --region=Contig1:82200001-82300000 --avg_cov_by_readlen=/data/01/user186/01.fenshu.sv.illumina/All.bam.avg_cov_by_readlen --output=05.genotyper/

Here is the output of top while it is running: [image]

Occasionally (fairly rarely) it uses all 32 threads, but most of the time it runs quite slowly, as shown above. Do you have any suggestions?

hannespetur commented 3 years ago

Hey.

Alright, so you have 4 processes asking for 32 threads each, which is 128 threads in total. Do you have that many threads available and allocated to you on the machine where you are running graphtyper?

If so, one thing that can partly explain the poor thread utilization is that graphtyper assigns each input file to a single thread, so in your case (50 input files, 32 threads) 18 threads get 2 files each and 14 threads get 1 file. In the best case, where all files take the same amount of time, 14 threads sit idle for 50% of the runtime. If one file takes much longer than the others, it can get much worse, possibly leaving 31 threads idle for some time while a single thread works on the high-coverage file. There is no concrete solution other than asking for fewer threads, which gives better utilization at the cost of wall-clock time.
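To make the arithmetic above concrete, here is a minimal sketch in Python. It only models the "one file per thread, spread as evenly as possible" allocation described in this comment, with every file assumed to take the same time; it is not graphtyper's actual scheduler, and the function names are made up for illustration.

```python
def static_allocation(n_files, n_threads):
    """Return the number of files each thread gets when files are spread evenly.

    Hypothetical model of the allocation described above, not graphtyper code.
    """
    base, extra = divmod(n_files, n_threads)
    # 'extra' threads receive one file more than the rest.
    return [base + 1] * extra + [base] * (n_threads - extra)


def utilization(n_files, n_threads, time_per_file=1.0):
    """Fraction of thread-time spent working, assuming equal time per file."""
    loads = static_allocation(n_files, n_threads)
    wall_time = max(loads) * time_per_file   # the busiest thread sets the runtime
    if wall_time == 0:
        return 0.0
    busy_time = sum(loads) * time_per_file   # total work actually done
    return busy_time / (wall_time * n_threads)


loads = static_allocation(50, 32)
print(loads.count(2), "threads with 2 files,", loads.count(1), "threads with 1 file")
print("utilization with 32 threads:", utilization(50, 32))
print("utilization with 25 threads:", utilization(50, 25))
```

Under this equal-time assumption, 32 threads are busy for 50/64 of the thread-time, while 25 threads (2 files each) are fully busy and finish in the same two "rounds". Once file times differ, the longest file dominates and the picture changes.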

Another possibility is that the program is limited by network I/O, if you are reading the data over a network. In my experience, when graphtyper spends a lot of time in the "D" (uninterruptible sleep) state, I am trying to read BAMs/CRAMs much faster than the network can handle. The only solution is to reduce the number of network-I/O-intensive processes.
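One rough way to check for this on Linux, besides watching the state column in top, is to sample the process state from /proc. The sketch below is an assumption-laden illustration and not part of graphtyper: the helper names are hypothetical, and it relies only on the standard Linux /proc/<pid>/stat layout, where the state letter follows the parenthesised command name.

```python
import time


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is in parentheses and may itself contain spaces or ')',
    # so take whatever follows the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def fraction_in_d_state(pid, samples=50, interval=0.1):
    """Sample a Linux process and return the share of samples in state 'D'.

    Hypothetical helper: reads /proc/<pid>/stat 'samples' times,
    sleeping 'interval' seconds between reads.
    """
    hits = 0
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as handle:
            if parse_state(handle.read()) == "D":
                hits += 1
        time.sleep(interval)
    return hits / samples
```

Note that /proc/<pid>/stat reports the state of the main thread only; for a multithreaded run, the per-thread files under /proc/<pid>/task/<tid>/stat have the same layout and can be parsed with the same function. A high fraction suggests the process is mostly waiting on I/O rather than computing.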

Best, Hannes

shangshanzhizhe commented 3 years ago

Hi, thanks for the pointer. My disk was mounted on the nodes over the network and the I/O was congested. I moved the data files to a locally attached disk and graphtyper worked great. Thanks again for the awesome tool!

Shangzhe