HKU-BAL / ClairS-TO

ClairS-TO - a deep-learning method for tumor-only somatic variant calling
BSD 3-Clause "New" or "Revised" License

DP seems downsampling in ClairS-TO #16

Closed: ErminZ closed this issue 3 months ago

ErminZ commented 3 months ago

Hello,

Thank you for developing these great long-read small-variant callers! We have some Cas9-edited, PCR-based Nanopore samples (Dorado duplex basecalling with dna_r10.4.1_e8.2_5khz_stereo@v1.2 and dna_r10.4.1_e8.2_400bps_sup@v4.3.0) and want to quantify the editing percentage.

The sequencing depth (DP) is about 372k reads, and the mutation should be T->TA at chr4:123 (based on IGV visualization of the BAM file and on previously edited samples). I have tried the following three approaches:

  1. Run Clair3 with the old ont_guppy5 model. The reported mutation type is a T deletion, which is inconsistent with the T->TA insertion seen in IGV.

chr4 123 . T . 10.36 RefCall P GT:GQ:DP:AD:AF 0/0:10:372419:186689:0.5013

hkubal/clair3:v1.0.8 /opt/bin/run_clair3.sh \
--bam_fn=/input.bam \
--ref_fn=/hg38.fna --threads=8 \
--platform="ont" --model_path="/opt/models/ont_guppy5" --output=/input/clair3/ \
--var_pct_full=1 --var_pct_phasing=1 --ref_pct_full=1
  2. Run Clair3 with the newest model "r1041_e82_400bps_sup_v430". No INDEL was detected at chr4:123, so I assumed the variant is not germline and tried approach 3.

  3. Run ClairS-TO. The reported DP is 7071, far below the roughly 372k reads at this site, so it is hard to quantify the editing percentage.

chr4 123 . T TA 15.2734 PASS FAU=21;FCU=0;FGU=13;FTU=3436;RAU=1;RCU=0;RGU=0;RTU=113;SB=0.08341 GT:GQ:DP:AF:AD:AU:CU:GU:TU 0/1:15:7071:0.3793:3549,2682:22:0:13:3549

  hkubal/clairs-to:v0.2.0 /opt/bin/run_clairs_to \
  --tumor_bam_fn /input.bam \
  --ref_fn /hg38.fna \
  --threads 8 \
  --platform ont_r10_dorado_sup_4khz \
  --output_dir /input/clairs_to/
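
For reference, the AF in the ClairS-TO record above is consistent with the alternate-allele depth divided by DP (2682 / 7071). The following is a minimal Python sketch of that arithmetic, not part of ClairS-TO itself: the helper name `parse_sample_fields` is hypothetical, and it assumes AD is ordered ref,alt and that the record has a single sample column.

```python
def parse_sample_fields(vcf_line):
    """Map FORMAT keys to the sample values of a single-sample VCF record.

    Hypothetical helper for illustration. Splits on any whitespace so it
    works on both tab-separated VCF lines and the space-separated copy above.
    """
    cols = vcf_line.split()
    keys = cols[8].split(":")      # FORMAT column, e.g. GT:GQ:DP:AF:AD:...
    values = cols[9].split(":")    # sample column
    return dict(zip(keys, values))


# The ClairS-TO record quoted in this issue.
record = (
    "chr4 123 . T TA 15.2734 PASS "
    "FAU=21;FCU=0;FGU=13;FTU=3436;RAU=1;RCU=0;RGU=0;RTU=113;SB=0.08341 "
    "GT:GQ:DP:AF:AD:AU:CU:GU:TU 0/1:15:7071:0.3793:3549,2682:22:0:13:3549"
)

fields = parse_sample_fields(record)
dp = int(fields["DP"])
# Assumption: AD lists the reference depth first, then the alternate depth.
ref_depth, alt_depth = (int(x) for x in fields["AD"].split(","))

alt_fraction = alt_depth / dp
print(f"DP={dp} ref={ref_depth} alt={alt_depth} alt/DP={alt_fraction:.4f}")
```

Because the fraction is computed from only the reads that survived the pileup, it describes the 7071 counted reads rather than the full ~372k.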

Please let me know which caller and parameters are best suited to PCR-based editing samples for evaluating editing performance. Thank you!

Best, Ermin

JasonCLEI commented 3 months ago

Hi, @ErminZ

Thanks a lot for your interest. The apparent downsampling of DP in ClairS-TO is most likely because the default maximum depth of samtools mpileup is 8000, and the depth of your data far exceeds this threshold. We have added an option, --bam_mplp_set_maxcnt, to set this threshold manually and have released a new Docker image. Please rerun your PCR-based editing sample with the latest ClairS-TO image, adding --bam_mplp_set_maxcnt 1000000, and check whether DP is as expected.
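
As a rough sanity check before and after the rerun, the reported DP can be compared with the expected read count and the pileup cap. This is only a heuristic sketch in Python: the function name `depth_looks_capped` is hypothetical, the 8000 default is the samtools mpileup limit mentioned above, and the expected depth is taken from the Clair3 record in the original post.

```python
def depth_looks_capped(reported_dp, expected_depth, max_depth=8000):
    """Heuristic: True when the site is deeper than the pileup cap and the
    reported DP did not exceed that cap.

    Hypothetical helper for illustration; max_depth should match whatever
    cap was in effect for the run being checked.
    """
    return expected_depth > max_depth and reported_dp <= max_depth


# Values from this issue: ClairS-TO reported DP=7071, Clair3 reported DP=372419.
capped = depth_looks_capped(reported_dp=7071, expected_depth=372419)
print("DP looks capped by the pileup limit:", capped)
```

After rerunning with --bam_mplp_set_maxcnt 1000000, the same check with max_depth=1000000 should no longer flag the site if DP is close to the true depth.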

Lei