ACEnglish / truvari

Structural variant toolkit for VCFs
MIT License

How to handle the "./." genotype in the merged project-level VCF? #223

Closed: jxcao98 closed this issue 3 months ago

jxcao98 commented 3 months ago

Hi,

I am using Manta for structural variant calling, then processing the results with bcftools merge and Truvari collapse to generate a project-level VCF (pVCF). The resulting pVCF contains a substantial number of ./. genotypes, which gives some structural variants a high missing rate.

It is my understanding that the ./. genotypes likely indicate that the samples do not carry the variants. However, I believe it is not accurate to treat all of these genotypes as homozygous reference (0/0), particularly those that are truly missing because of sequencing errors.

I am considering genotyping tools such as GraphTyper or SVTyper to address this issue. However, these tools are computationally expensive, and I noticed they were not mentioned in your recent publication in Genome Biology (https://doi.org/10.1186/s13059-022-02840-6). Could you share how you addressed this issue in your studies? Additionally, the FTP link (ftp://ftp.hgsc.bcm.edu/Software/Truvari/3.1/population_vcfs) provided for downloading the pVCF from your article appears to be inactive. Is there an alternative download option?

Best regards, Jixin

ACEnglish commented 3 months ago

Hello,

As you said, the missing (./.) genotypes could be assumed to be homozygous reference by using bcftools merge -0, but that isn't ideal. Performing a genotyping analysis on the merged/collapsed SVs could help, but it is computationally expensive (as you know).

Unfortunately, for short reads those are your options: high missingness, assumed reference, or expensive analysis. I've been creating another tool that can genotype many SVs more cheaply (kanpig), but it requires long reads as input, so I don't think you'll find it helpful.
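To make the first two options concrete, here is a minimal Python sketch that works on plain VCF record lines. It is an illustration only, not part of Truvari or bcftools. The function names (`genotype_is_missing`, `missing_rate`, `fill_missing_as_ref`) and the example record are hypothetical, and the code assumes diploid genotypes with GT as the first FORMAT subfield. Note that rewriting every ./. as 0/0 after the fact is a cruder approximation than bcftools merge -0, which makes the assumption at merge time.

```python
def genotype_is_missing(sample_field):
    """True when the GT subfield (assumed first in FORMAT) has no called allele."""
    gt = sample_field.split(":", 1)[0]
    alleles = gt.replace("|", "/").split("/")
    return all(a == "." for a in alleles)


def missing_rate(vcf_line):
    """Fraction of samples with a missing genotype in one VCF record line."""
    fields = vcf_line.rstrip("\n").split("\t")
    samples = fields[9:]  # columns 10 and onward hold per-sample data
    if not samples:
        return 0.0
    return sum(genotype_is_missing(s) for s in samples) / len(samples)


def fill_missing_as_ref(vcf_line):
    """Rewrite missing genotypes as 0/0 (the 'assumed reference' option).

    Assumption: every sample is diploid. Other FORMAT values are left untouched.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    for i in range(9, len(fields)):
        if genotype_is_missing(fields[i]):
            _gt, sep, rest = fields[i].partition(":")
            fields[i] = "0/0" + sep + rest
    return "\t".join(fields)


# Hypothetical record with four samples, two of them missing.
record = "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:GQ\t0/1:30\t./.:.\t./.:.\t1/1:40"

print(missing_rate(record))         # per-variant missingness
print(fill_missing_as_ref(record))  # same record with ./. replaced by 0/0
```

A per-variant missing rate like this could be used to filter SVs above a chosen threshold instead of assuming reference for all of them. Which threshold is appropriate depends on the study and is not something the thread specifies.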

I'll follow up with a new link to the VCFs hosted elsewhere. I'm not sure why that ftp link is no longer valid, so I'll just put the data somewhere more reliable.

Have a great day, ~/Adam English

ACEnglish commented 3 months ago

You can find the VCFs here: https://zenodo.org/records/13306816