HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Very slow processing of low-quality chrX variants with full-alignment model #332

Closed · esraaelmligy closed this 2 months ago

esraaelmligy commented 3 months ago

Greetings, I have been using the Clair3 Docker image for months and never ran into the issue of the models getting stuck or running too slowly while processing a chunk. This week, however, while re-running a dataset, the full-alignment model became very slow once it reached the low-quality chromosomes: it ran for four days with no sign of finishing, and each chunk took more than 90 minutes to complete.

I terminated the job because something seemed abnormal, and when I re-ran it, the slow processing started only at the chromosome X chunks.

I do not really know what is causing this strange performance. It is worth noting that I purged all of my unused Docker containers at the beginning of the week because they were taking up about half of my disk space; could that be the issue? Or is it something to do with the model or the sample itself?

Command:

```shell
docker run -v $(pwd):$(pwd) -w $(pwd) hkubal/clair3:latest \
  /opt/bin/run_clair3.sh \
  --bam_fn mapped.bam \
  --ref_fn GRCh38_full_analysis_set_plus_decoy_hla.fa \
  --threads 16 \
  --model_path /opt/models/ont \
  --platform ont \
  --output Clair3/
```
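Since the slowdown is isolated to chromosome X, one quick way to reproduce it without a multi-day run is to restrict calling to that contig. A minimal sketch, assuming the `--ctg_name` option from the Clair3 usage help (the output directory name here is illustrative):

```shell
# Restrict calling to chrX so a slow-chunk run can be reproduced
# in hours rather than days; all other options match the command above.
docker run -v $(pwd):$(pwd) -w $(pwd) hkubal/clair3:latest \
  /opt/bin/run_clair3.sh \
  --bam_fn mapped.bam \
  --ref_fn GRCh38_full_analysis_set_plus_decoy_hla.fa \
  --threads 16 \
  --model_path /opt/models/ont \
  --platform ont \
  --ctg_name chrX \
  --output Clair3_chrX/
```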

zhengzhenxian commented 3 months ago

Hi @esraaelmligy,

We might need more details on the runtime issue. Could you share the logs ${OUTPUT_DIR}/run_clair3.log and ${OUTPUT_DIR}/log, or send them to my email zxzheng@cs.hku.hk? Thanks.
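A minimal sketch of bundling the requested files for sharing, assuming OUTPUT_DIR points at the Clair3/ output directory from the command above:

```shell
# Archive the main run log and the per-step log directory together.
OUTPUT_DIR=Clair3
tar czf clair3_logs.tar.gz "${OUTPUT_DIR}/run_clair3.log" "${OUTPUT_DIR}/log"
```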

esraaelmligy commented 3 months ago

Yes, sure. Here is the link to all of the log files: Drive Link

zhengzhenxian commented 3 months ago

@esraaelmligy

Thanks for the logs.

It seems a stack overflow issue is blocking the process from completing. We have fixes for this in our v1.0.6 and v1.0.7 releases. Could you delete your local image and re-pull our latest Docker image (v1.0.10)? Please let us know if the issue persists, thanks.
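A minimal sketch of the suggested re-pull, assuming the image is tagged hkubal/clair3:latest as in the original command and that release tags follow the vX.Y.Z pattern:

```shell
# Remove the stale local image, then pull the current release.
docker rmi hkubal/clair3:latest
docker pull hkubal/clair3:latest
# Or pin the version explicitly to avoid a cached "latest" tag:
docker pull hkubal/clair3:v1.0.10
```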