KolmogorovLab / Severus

A tool for somatic structural variant calling using long reads

How to choose phased.vcf ? #31

Open DayTimeMouse opened 2 weeks ago

DayTimeMouse commented 2 weeks ago

Hi,

I am confused about --phasing-vcf phased.vcf: I don't know which phased.vcf I should choose.

severus --target-bam phased_tumor.bam --control-bam phased_normal.bam --out-dir severus_out \
    -t 16 --phasing-vcf phased.vcf --vntr-bed ./vntrs/human_GRCh38_no_alt_analysis_set.trf.bed

I used Clair3 to get phased_merge_output.vcf.gz for normal and tumor separately, but Severus only needs one phased.vcf file.

Clair3 for normal: the output is phased_merge_output.vcf.gz

docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:latest \
  /opt/bin/run_clair3.sh \
  --bam_fn=${INPUT_DIR}/normal.bam \
  --ref_fn=${INPUT_DIR}/ref.fa \
  --threads=${THREADS} \
  --platform="hifi" \
  --model_path="/opt/models/${MODEL_NAME}" \
  --output=${OUTPUT_DIR} \
  --enable_phasing \
  --longphase_for_phasing

Clair3 for tumor: the output is phased_merge_output.vcf.gz

docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:latest \
  /opt/bin/run_clair3.sh \
  --bam_fn=${INPUT_DIR}/tumor.bam \
  --ref_fn=${INPUT_DIR}/ref.fa \
  --threads=${THREADS} \
  --platform="hifi" \
  --model_path="/opt/models/${MODEL_NAME}" \
  --output=${OUTPUT_DIR} \
  --enable_phasing \
  --longphase_for_phasing

How do I choose or generate the file for --phasing-vcf phased.vcf? The README has this section:

# To run HiPhase with DV or Clair3 output

    hiphase --bam tumor.bam\
        --vcf phased_merge_output.vcf.gz\
        --output-vcf tumor_hifi_hiphase.vcf.gz\
        --reference ref.fa --threads 16 --ignore-read-groups

Should I run HiPhase with the Clair3 output (phased_merge_output.vcf.gz) to get tumor_hifi_hiphase.vcf.gz and pass that as --phasing-vcf phased.vcf?

Best regards.

aysegokce commented 2 weeks ago

Hello @DayTimeMouse, Severus expects both BAM files to be haplotagged with the same phased VCF file. We suggest using the phased VCF from the normal sample to haplotag both the normal and the tumor. In our experience, this gives more stable and accurate phasing, since phasing tools are not designed to handle copy number alterations (CNAs) in the tumor. Unfortunately, HiPhase cannot do haplotagging on its own. So we run HiPhase on the normal BAM (without tagging), then run WhatsHap to haplotag both normal and tumor with the resulting phased VCF, and then run Severus with the normal phased VCF.

Best,
Ayse
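
The workflow described in the reply can be sketched as the shell script below. It is a minimal sketch, not the maintainers' exact pipeline: it assumes the normal sample has already been phased (for example the Clair3 phased_merge_output.vcf.gz from the normal run, or a HiPhase output VCF), that this VCF is bgzip-compressed and tabix-indexed, and that whatshap, samtools and severus are on the PATH. The file names, the output directory normal_out, the thread counts, and the `run`/`haplotag` helper functions are hypothetical. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env sh
set -eu

# Hypothetical paths; adjust to your data.
REF=${REF:-ref.fa}
# Phased VCF from the NORMAL sample only (assumed bgzipped + tabix-indexed).
NORMAL_VCF=${NORMAL_VCF:-normal_out/phased_merge_output.vcf.gz}
VNTR_BED=${VNTR_BED:-./vntrs/human_GRCh38_no_alt_analysis_set.trf.bed}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# haplotag <input.bam> <output.bam>
# Tags reads with HP using the normal phased VCF, then indexes the result.
haplotag() {
  run whatshap haplotag --reference "$REF" "$NORMAL_VCF" "$1" \
    -o "$2" --ignore-read-groups --tag-supplementary \
    --skip-missing-contigs --output-threads=4
  run samtools index "$2"
}

# Haplotag BOTH samples with the same (normal) phased VCF.
haplotag normal.bam phased_normal.bam
haplotag tumor.bam phased_tumor.bam

# Call somatic SVs, passing the same normal phased VCF to Severus.
run severus --target-bam phased_tumor.bam --control-bam phased_normal.bam \
  --out-dir severus_out -t 16 --phasing-vcf "$NORMAL_VCF" \
  --vntr-bed "$VNTR_BED"
```

The point of the design is that a single VCF, derived from the normal sample, drives both haplotagging steps and the --phasing-vcf argument, so the haplotype labels in the two BAM files and in Severus agree. The phased VCF from the tumor run is not used.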