HKU-BAL / ClairS

ClairS - a deep-learning method for long-read somatic small variant calling
BSD 3-Clause "New" or "Revised" License

Haplotype filtering step keeps getting stuck #8

Closed: tahashmi closed this issue 1 year ago

tahashmi commented 1 year ago

Hi, ClairS keeps getting stuck at this command for hours on ONT data after phasing completes. Nothing is written to the 4_HAP_FILTER.log log file either. This is my second run, and I am seeing the same issue.

[INFO] STEP 4: Haplotype filtering
[INFO] RUN THE FOLLOWING COMMAND:
( pypy3 /opt/bin/clairs.py haplotype_filtering --tumor_bam_fn /data/tahmad/devel/ClairS_HCC1437/tmp/clair3_output/phased_output/tumor_ --ref_fn /data/tahmad/alignment/grch38_chr.fasta --germline_vcf_fn /data/tahmad/devel/ClairS_HCC1437/tmp/clair3_output/clair3_tumor_output/merge_output.vcf.gz --pileup_vcf_fn /data/tahmad/devel/ClairS_HCC1437/tmp/vcf_output/pileup.vcf --full_alignment_vcf_fn /data/tahmad/devel/ClairS_HCC1437/tmp/vcf_output/full_alignment.vcf --output_dir /data/tahmad/devel/ClairS_HCC1437/tmp/vcf_output --samtools samtools --threads 56 ) 2>&1 | tee /data/tahmad/devel/ClairS_HCC1437/logs/4_HAP_FILTER.log

My ClairS run command:

singularity run -B /data/tahmad /data/tahmad/images/clairs_latest.sif /opt/bin/run_clairs --threads 56 --phase_tumor True --longphase True --tumor_bam_fn /data/tahmad/devel/bams_haplotagged/HCC1437_ONT.bam --normal_bam_fn /data/tahmad/devel/bams_haplotagged/HCC1437BL_ONT.bam --ref /data/tahmad/alignment/grch38_chr.fasta --output_dir /data/tahmad/devel/ClairS_HCC1437 --platform ont_r10

I see the following BAMs and phased VCFs:

ls -lh /data/tahmad/devel/ClairS_HCC1437/tmp/clair3_output/phased_output/tumor_
tumor_chr10.bam                tumor_chr17.bam.bai            tumor_chr3.bam                 tumor_phased_chr10.vcf.gz.tbi  tumor_phased_chr18.vcf.gz      tumor_phased_chr3.vcf.gz.tbi
tumor_chr10.bam.bai            tumor_chr18.bam                tumor_chr3.bam.bai             tumor_phased_chr11.vcf.gz      tumor_phased_chr18.vcf.gz.tbi  tumor_phased_chr4.vcf.gz
tumor_chr11.bam                tumor_chr18.bam.bai            tumor_chr4.bam                 tumor_phased_chr11.vcf.gz.tbi  tumor_phased_chr19.vcf.gz      tumor_phased_chr4.vcf.gz.tbi
tumor_chr11.bam.bai            tumor_chr19.bam                tumor_chr4.bam.bai             tumor_phased_chr12.vcf.gz      tumor_phased_chr19.vcf.gz.tbi  tumor_phased_chr5.vcf.gz
tumor_chr12.bam                tumor_chr19.bam.bai            tumor_chr5.bam                 tumor_phased_chr12.vcf.gz.tbi  tumor_phased_chr1.vcf.gz       tumor_phased_chr5.vcf.gz.tbi
tumor_chr12.bam.bai            tumor_chr1.bam                 tumor_chr5.bam.bai             tumor_phased_chr13.vcf.gz      tumor_phased_chr1.vcf.gz.tbi   tumor_phased_chr6.vcf.gz
tumor_chr13.bam                tumor_chr1.bam.bai             tumor_chr6.bam                 tumor_phased_chr13.vcf.gz.tbi  tumor_phased_chr20.vcf.gz      tumor_phased_chr6.vcf.gz.tbi
tumor_chr13.bam.bai            tumor_chr20.bam                tumor_chr6.bam.bai             tumor_phased_chr14.vcf.gz      tumor_phased_chr20.vcf.gz.tbi  tumor_phased_chr7.vcf.gz
tumor_chr14.bam                tumor_chr20.bam.bai            tumor_chr7.bam                 tumor_phased_chr14.vcf.gz.tbi  tumor_phased_chr21.vcf.gz      tumor_phased_chr7.vcf.gz.tbi
tumor_chr14.bam.bai            tumor_chr21.bam                tumor_chr7.bam.bai             tumor_phased_chr15.vcf.gz      tumor_phased_chr21.vcf.gz.tbi  tumor_phased_chr8.vcf.gz
tumor_chr15.bam                tumor_chr21.bam.bai            tumor_chr8.bam                 tumor_phased_chr15.vcf.gz.tbi  tumor_phased_chr22.vcf.gz      tumor_phased_chr8.vcf.gz.tbi
tumor_chr15.bam.bai            tumor_chr22.bam                tumor_chr8.bam.bai             tumor_phased_chr16.vcf.gz      tumor_phased_chr22.vcf.gz.tbi  tumor_phased_chr9.vcf.gz
tumor_chr16.bam                tumor_chr22.bam.bai            tumor_chr9.bam                 tumor_phased_chr16.vcf.gz.tbi  tumor_phased_chr2.vcf.gz       tumor_phased_chr9.vcf.gz.tbi
tumor_chr16.bam.bai            tumor_chr2.bam                 tumor_chr9.bam.bai             tumor_phased_chr17.vcf.gz      tumor_phased_chr2.vcf.gz.tbi   
tumor_chr17.bam                tumor_chr2.bam.bai             tumor_phased_chr10.vcf.gz      tumor_phased_chr17.vcf.gz.tbi  tumor_phased_chr3.vcf.gz  
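A listing like the one above can also be checked programmatically, to rule out a missing per-chromosome file before looking at the filtering step itself. The following is a minimal sketch, not part of ClairS: the function name `missing_phased_files` is hypothetical, the file name patterns are inferred only from the listing above, and the list of contigs is an assumption that should match the reference in use.

```python
import os


def missing_phased_files(directory, contigs, prefix="tumor_"):
    """Return the expected per-contig file names that are absent from directory.

    The four name patterns below are inferred from the listing in this thread;
    they are an assumption, not a documented ClairS contract.
    """
    present = set(os.listdir(directory))
    missing = []
    for contig in contigs:
        expected = [
            f"{prefix}{contig}.bam",                # per-contig BAM
            f"{prefix}{contig}.bam.bai",            # BAM index
            f"{prefix}phased_{contig}.vcf.gz",      # phased VCF
            f"{prefix}phased_{contig}.vcf.gz.tbi",  # VCF index
        ]
        missing.extend(name for name in expected if name not in present)
    return missing


# Example contig list (an assumption): autosomes chr1..chr22.
AUTOSOMES = [f"chr{i}" for i in range(1, 23)]
```

Calling `missing_phased_files(phased_dir, AUTOSOMES)` with the phased output directory returns an empty list when every BAM, index, and phased VCF is present.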
zhengzhenxian commented 1 year ago

Hi,

Based on the information provided, the issue is possibly due to a zombie process caused by multi-processing in the haplotype filtering submodule.

We have rewritten the function and tested it locally. Please try re-pulling the Docker image (you might need to remove your local image first using docker rmi hkubal/clairs:latest). If the issue persists, please let us know. Thanks!
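For anyone who hits a similar silent hang, one generic way to make it visible is to run the step under a wall-clock timeout, so a stuck command ends with an error instead of blocking for hours. This is a minimal sketch and not ClairS code: `run_step` is a hypothetical helper, and the timeout value is an arbitrary choice for illustration.

```python
import subprocess
import sys


def run_step(cmd, timeout_s):
    """Run cmd (a list of arguments) and return (status, output).

    status is "ok", "failed" (non-zero exit code), or "timeout".
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child process before raising TimeoutExpired.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return "timeout", partial
    return ("ok" if proc.returncode == 0 else "failed"), proc.stdout


if __name__ == "__main__":
    # Demo with a deliberately slow child process instead of a real pipeline step.
    status, _ = run_step([sys.executable, "-c", "import time; time.sleep(2)"], 0.2)
    print(status)
```

Only the direct child is killed on timeout; worker processes started by that child may need separate cleanup.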

Zhenxian

tahashmi commented 1 year ago

Thanks! I will try and let you know.

tahashmi commented 1 year ago

The fix works, thanks. With --phase_tumor True, in which directory should I expect a single (combined) phased tumor VCF and a single haplotagged BAM?

zhengzhenxian commented 1 year ago

Hi,

The phasable somatic variants are already tagged with H in the INFO column. You can find the haplotagged BAMs in the ${OUTPUT_DIR}/tmp/clair3_output/phased_output/ directory, but please note that the BAMs were haplotagged using the germline variants called in the tumor and normal samples.
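To pull out the variants carrying that tag, the INFO column can be scanned directly. The sketch below assumes that H appears as a key in the semicolon-separated INFO field (column 8 of a standard VCF); the helper names `open_vcf` and `iter_h_tagged` are hypothetical, and the VCF path is whatever output file the run produced.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF for text reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def iter_h_tagged(lines):
    """Yield (chrom, pos, ref, alt) for records whose INFO field contains the key H.

    Assumption: the tag is stored as an INFO key named exactly "H".
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        if "H" in info_keys:
            yield fields[0], int(fields[1]), fields[3], fields[4]
```

Usage would look like `with open_vcf(vcf_path) as handle: tagged = list(iter_h_tagged(handle))`, where `vcf_path` points at the somatic VCF written by the run.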