PacificBiosciences / HiFiCNV

Copy number variant caller and depth visualization utility for PacBio HiFi reads

Variation Type Results #10

Closed ghost closed 1 year ago

ghost commented 1 year ago

(1) HiFiCNV detected significantly fewer DEL/DUP variant sites than sniffles.

image

(2) When matching the variant sites shared between sniffles and HiFiCNV, I found that multiple sniffles sites correspond to the same HiFiCNV site. Is it reasonable to use 0.9 as the overlap threshold?

image

(3) With an overlap threshold of 0.9 (chosen based on experience), HiFiCNV has 16 common variation points (out of 613 total) and Sniffles2 has 36 common variation points (out of 10519 total). Are these results reasonable?

                if int(hvistart) < int(svistart) and int(hviend)<int(sviend) and int(hviend)>int(svistart):
                    if round(float(int(hviend)-int(svistart))/float(int(sviend)-int(hvistart)),2) >0.90:
                        ncom+=1
                elif int(svistart)<int(hvistart) and int(sviend)<int(hviend) and int(hvistart)<int(sviend):
                    if round(float(int(sviend)-int(hvistart))/float(int(hviend)-int(svistart)),2) >0.90:
                        ncom+=1
                elif int(hvistart)<int(svistart) and int(sviend)<int(hviend):
                    if round(float(int(sviend)-int(svistart))/float(int(hviend)-int(hvistart)),2) >0.90:
                        ncom+=1
                elif int(svistart)<int(hvistart) and int(hviend)<int(sviend):
                    if round(float(int(hviend)-int(hvistart))/float(int(sviend)-int(svistart)),2) >0.90:
                        ncom+=1
                elif int(svistart)==int(hvistart):
                    if int(hviend)==int(sviend):
                        ncom+=1
                    elif int(hviend)>int(sviend):
                        if round(float(int(sviend)-int(hvistart))/float(int(hviend)-int(svistart)),2) >0.90:
                            ncom+=1
                    elif int(hviend)<int(sviend):
                        if round(float(int(hviend)-int(hvistart))/float(int(sviend)-int(svistart)),2) >0.90:
                            ncom+=1
                elif int(sviend)==int(hviend):
                    if int(svistart)>int(hvistart):
                        if round(float(int(sviend)-int(svistart))/float(int(hviend)-int(hvistart)),2) >0.90:
                            ncom+=1
                    if int(svistart)<int(hvistart):
                        if round(float(int(hviend)-int(hvistart))/float(int(sviend)-int(svistart)),2) >0.90:
                            ncom+=1
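
Every branch of the comparison above computes the same quantity: the length of the shared interval divided by the length of the combined span. A minimal sketch of that ratio as a single helper follows, assuming both calls are on the same chromosome and the coordinates are already integers. The name `overlap_ratio` and the example coordinates are hypothetical and are not part of HiFiCNV or sniffles.

```python
def overlap_ratio(h_start, h_end, s_start, s_end):
    """Shared length divided by combined span for two intervals.

    Returns 0.0 when the intervals do not overlap.
    """
    shared = min(h_end, s_end) - max(h_start, s_start)
    if shared <= 0:
        return 0.0
    span = max(h_end, s_end) - min(h_start, s_start)
    return shared / float(span)


# Illustrative coordinates only (not taken from the issue's data).
ratio = overlap_ratio(1000, 2000, 1050, 2000)
is_match = round(ratio, 2) > 0.90  # same rounding and cutoff as the snippet above
```

One difference from the snippet above: two identical zero-length intervals are counted as a match there, whereas this helper returns 0.0 for them.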

The commands used:

./HiFiCNV/hificnv-v0.1.3-x86_64-unknown-linux-gnu/hificnv \
        --bam ./5mc.sort.bam \
        --maf ./5mc.vcf.gz \
        --ref human_g1k_v37_decoy.fasta \
        --exclude cnv_exclusion_regions.hg19.bed.gz \
        --expected-cn female_expected_cn.hg19.bed \
        --threads 5 \
        --output-prefix hificnv_test
./Sniffles/Sniffles-master/bin/sniffles-core-1.0.12/sniffles \
        --min_length 50 --min_support 2 --threads 50 \
        --mapped_reads 5mc.sort.bam \
        --vcf sniffles.vcf

holtjma commented 1 year ago

This is largely what I would expect. The main reason is that sniffles is a structural variant (SV) caller, whereas HiFiCNV is a copy number variant (CNV) caller. While there is some overlap between SVs and CNVs, each kind of tool tends to look for one of two fundamental signatures: break ends for SVs or read depth for CNVs. HiFiCNV was designed to complement SV callers like pbsv or sniffles, not replace them. As for the individual points:

  1. Yes, this should pretty much always be true.
  2. I'm not entirely sure what I'm looking at in the image. I think it is a list of calls from sniffles (left) and some overlapping calls from HiFiCNV (right). Assuming that's true, this would appear to be an area where the break end signature is present but not particularly clean, leading to multiple near-identical calls from sniffles. As for the 0.9 ratio, that may be reasonable. Software like Truvari will default to 0.7, and I've also used 0.5 for loose thresholds. It all depends on your purpose.
  3. I'm not sure what the term "common variation point" refers to. Are these just the number of shared calls? If so, that would sound reasonable based on the information I shared for response (1).
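
To illustrate why the number of shared calls can differ depending on which caller you count from, and how the threshold changes it, here is a minimal sketch. The intervals are invented for illustration, the helper names are hypothetical, and the ratio is the simple shared-length-over-combined-span measure from the question, not Truvari's matching logic.

```python
def overlap_ratio(a, b):
    """Shared length over combined span for (chrom, start, end) tuples."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return shared / float(max(a[2], b[2]) - min(a[1], b[1]))


def count_shared(calls, others, threshold):
    """Number of calls with at least one partner in others above threshold."""
    return sum(
        1 for c in calls
        if any(overlap_ratio(c, o) > threshold for o in others)
    )


# Made-up example: one depth-based call, three similar break-end-based
# calls over the same region, and one unrelated small call.
cnv_calls = [("1", 10000, 20000)]
sv_calls = [
    ("1", 10010, 20000),
    ("1", 10050, 19990),
    ("1", 11500, 20100),
    ("1", 500000, 500300),
]

for threshold in (0.5, 0.7, 0.9):
    n_cnv = count_shared(cnv_calls, sv_calls, threshold)
    n_sv = count_shared(sv_calls, cnv_calls, threshold)
    print(threshold, n_cnv, n_sv)
```

With these made-up intervals, the single depth-based call matches several break-end-based calls, so the count taken from the SV side is larger than the count taken from the CNV side. Loosening the threshold from 0.9 to 0.7 admits the third SV call as well.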

Hope this helps, let me know if you have further questions / clarifications!