epi2me-labs / wf-human-variation

How is SVTYPE calculated in html report? #106

Closed imdanique closed 8 months ago

imdanique commented 8 months ago

Ask away!

The SVTYPE counts in the HTML report do not match the counts I get manually with bcftools, and I'm wondering whether I'm calculating them correctly. For example, in my sample's HTML report, the INS count is 10,562. However, when I calculate it manually using the following command:

bcftools view -i 'SVTYPE="INS"' sample.vcf.gz | grep -vc "#"

The count is 10,744. I've checked the source function that calculates SVTYPE, but couldn't determine why its result differs from my calculation.

SamStudio8 commented 8 months ago

Hi @imdanique! Your command is correct, but the report only summarises variants on chromosomes 1..22, X and Y. The discrepancy you're observing comes from the report excluding counts for the decoy sequences included with the reference. You should hopefully be able to confirm this with something like the following:

bcftools view -i 'SVTYPE="INS"' sample.vcf.gz | grep -c '^chr[0-9XY]*\s'

imdanique commented 8 months ago

@SamStudio8 Thanks for the help! Indeed, I had forgotten to filter out the unplaced contigs. Now the numbers match.
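
For anyone who wants to compare every SVTYPE at once rather than just INS, the filtering discussed in this thread can be sketched as a small shell function. This is only an illustration, not the report's actual code: the function name `count_svtypes` is hypothetical, and the contig pattern (chromosomes 1..22, X and Y, with or without a `chr` prefix) is an assumption based on the description above.

```shell
# Hypothetical helper: count SV records per SVTYPE, keeping only
# chromosomes 1..22, X and Y (assumed to mirror the report's filter).
# Reads tab-separated VCF text on stdin; header lines are skipped.
#
# Usage (assumes bcftools is installed):
#   bcftools view -H sample.vcf.gz | count_svtypes
count_svtypes() {
  awk -F'\t' '
    /^#/ { next }                                               # skip VCF header lines
    $1 !~ /^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$/ { next }          # skip decoys, unplaced contigs, chrM
    {
      n = split($8, info, ";")                                  # INFO is column 8
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^SVTYPE=/) counts[substr(info[i], 8)]++  # text after "SVTYPE="
    }
    END { for (t in counts) print t "\t" counts[t] }
  ' | sort
}
```

Unlike the `grep` pattern above, which accepts any run of digits, X and Y after `chr`, this sketch matches the contig name exactly, so a name such as `chr23` would be excluded. Whether that matters depends on the contigs in your reference.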