HKU-BAL / Clair3

Clair3 - Symphonizing pileup and full-alignment for high-performance long-read variant calling

Large difference in results between running in conda environment and running in Singularity/Docker container. #236

Closed Samvkes closed 10 months ago

Samvkes commented 10 months ago

I'm using Clair3 (I've tried 1.0.0 and 1.0.4) to do some structural variant calling. Currently I'm just testing on the GRCh37 version of this GIAB genome: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/UCSC_Ultralong_OxfordNanopore_Promethion/

I'm noticing that the results are much better when I run Clair3 inside my conda environment than when I run it in a Docker or Singularity container. I'm using the containers as part of my WDL pipeline, but I've also tested the Singularity container on its own and get similarly bad results. The divergence already appears in the first step (the pileup phase): far fewer variants are found. These are my parameters:

    run_clair3.sh \
        --bam_fn=~{bamIn} \
        --ref_fn=~{refFastaIn} \
        --platform='ont' \
        --enable_long_indel \
        --fast_mode \
        --model_path="/opt/models/ont" \
        --threads=~{threads} \
        --output=output/ 

    mv output/merge_output.vcf.gz ./clair3_out.vcf.gz

I've tried the Docker image you provide. The Singularity container uses an extended base image, but this did not improve the results. It can be downloaded from here: https://github.com/bioconda/bioconda-recipes/pull/43786#issuecomment-1772725050

I've attached two log files, one from the run inside the conda environment and one from the run inside the container: container-run_clair3.log conda-run_clair3.log (The warning about the index being older can be ignored; I used the index GIAB provides. The warning doesn't show up in the conda run because I ran a touch command on the index between the runs.)
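The index warning mentioned above is about file modification times. As a minimal sketch, assuming the index sits next to the BAM as `<bam>.bai` (an assumption; the actual index name may differ), a check like the following could tell whether the index is older than the BAM before a run. The function name `index_is_stale` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the BAM index looks stale.
# Assumption: the index is stored as <bam>.bai next to the BAM file.
index_is_stale() {
    local bam="$1"
    local bai="${bam}.bai"
    # A missing index is treated as stale so that callers rebuild it.
    [ -f "$bai" ] || return 0
    # -ot is true when the left file has an older modification time.
    [ "$bai" -ot "$bam" ]
}

# Usage (the path is a placeholder):
#   if index_is_stale sample.bam; then samtools index sample.bam; fi
```

Running `touch` on the index, as described above, only updates the timestamp and silences the warning; it does not change the index contents.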

zhengzhenxian commented 10 months ago

Hi @Samvkes,

Thank you for providing the logs. Clair3 ran successfully in the conda environment, but it encountered issues when running in the singularity container. It appears that most contigs in the container produced empty pileup results, as indicated below. It is uncertain whether this issue is related to the outdated index file. Could you please try re-indexing the BAM file and using a new output directory to see if this resolves the problem?

[INFO] 7/7 Merge pileup VCF and full-alignment VCF
[INFO] Pileup variants processed in 11: 0
[INFO] Full-alignment variants processed in 11: 0
[INFO] Pileup variants processed in 12: 190
[INFO] Full-alignment variants processed in 12: 85
[INFO] Pileup variants processed in 17: 0
[INFO] Full-alignment variants processed in 17: 0
[INFO] Pileup variants processed in 13: 455
[INFO] Full-alignment variants processed in 13: 199
[INFO] Pileup variants processed in 20: 0
[INFO] Full-alignment variants processed in 20: 0
[INFO] Pileup variants processed in 4: 0
[INFO] Full-alignment variants processed in 4: 0
[INFO] Pileup variants processed in 7: 0
[INFO] Full-alignment variants processed in 7: 0
[INFO] Pileup variants processed in 14: 894
[INFO] Full-alignment variants processed in 14: 787
[INFO] Pileup variants processed in 8: 355
[INFO] Full-alignment variants processed in 8: 600
[INFO] Pileup variants processed in 3: 474
[INFO] Full-alignment variants processed in 3: 592
[INFO] Pileup variants processed in 1: 2461
[INFO] Full-alignment variants processed in 1: 2272
[INFO] Pileup variants processed in 10: 0
[INFO] Full-alignment variants processed in 10: 0
[INFO] Pileup variants processed in 18: 1356
[INFO] Full-alignment variants processed in 18: 1910
[INFO] Pileup variants processed in 15: 0
[INFO] Full-alignment variants processed in 15: 0
[INFO] Pileup variants processed in 19: 0
[INFO] Full-alignment variants processed in 19: 0
[INFO] Pileup variants processed in 6: 999
[INFO] Full-alignment variants processed in 6: 1209
[INFO] Pileup variants processed in 5: 0
[INFO] Full-alignment variants processed in 5: 0
[INFO] Pileup variants processed in 2: 0
[INFO] Full-alignment variants processed in 2: 0
[INFO] Pileup variants processed in 16: 0
[INFO] Full-alignment variants processed in 16: 0
[INFO] Pileup variants processed in 9: 0
[INFO] Full-alignment variants processed in 9: 0
[INFO] Pileup variants processed in 22: 0
[INFO] Full-alignment variants processed in 22: 0
[INFO] Pileup variants processed in X: 1193
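For a quick overview of which contigs came back empty, the merge-step lines can be filtered. This is a minimal sketch, assuming the log lines have exactly the `[INFO] Pileup variants processed in <contig>: <count>` shape shown above; the function name `zero_pileup_contigs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every contig whose pileup count is 0.
# Assumption: lines look like
#   [INFO] Pileup variants processed in <contig>: <count>
zero_pileup_contigs() {
    awk '/Pileup variants processed in/ {
        count = $NF            # last field is the variant count
        contig = $(NF - 1)     # second-to-last field is "<contig>:"
        sub(/:$/, "", contig)  # drop the trailing colon
        if (count == 0) print contig
    }' "$1"
}

# Usage (log file name taken from the attachments in this thread):
#   zero_pileup_contigs container-run_clair3.log
```

Running the same filter on the conda log would show whether those contigs are empty only in the container run.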

Samvkes commented 10 months ago

I will try, but just to be clear: the same index file was definitely used in both runs.
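The retry suggested above (re-index the BAM, write to a new output directory) could be prepared roughly as follows. This is only a sketch: it assumes `samtools` is on the PATH inside the container, and the function name `prepare_retry` and the `clair3_retry` directory prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the retry: rebuild the BAM index and create a
# brand-new output directory so nothing from an earlier run is reused.
# Assumption: samtools is available on PATH.
prepare_retry() {
    local bam="$1" threads="${2:-1}"
    # Rebuild the index; -@ sets the number of additional threads.
    samtools index -@ "$threads" "$bam" || return 1
    # mktemp -d creates a directory that did not exist before and
    # prints its path, which the caller can pass to --output.
    mktemp -d "${PWD}/clair3_retry.XXXXXX"
}

# Usage (paths and thread count are placeholders):
#   outdir=$(prepare_retry sample.bam 8)
#   run_clair3.sh --bam_fn=sample.bam --ref_fn=ref.fa --platform='ont' \
#       --model_path="/opt/models/ont" --threads=8 --output="$outdir"
```

Using the same helper in both the conda and the container run would rule out the index and leftover output files as the source of the difference.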