jsabeeler opened this issue 10 years ago
This is the unfortunate consequence of the lack of a standard in the VCF format for reporting allele depths. The VCF parser that GEMINI uses currently supports ref and alt allele depths reported by GATK and FreeBayes (https://github.com/arq5x/cyvcf/blob/master/cyvcf/parser.pyx#L252-L308). We have not implemented support for VarScan. I am a bit fearful of trying to support the strategies employed by every variant caller (and I have raised this issue with the folks that define the VCF standard). I am about to leave for vacation but can try to tackle this for you when I return in a couple of weeks. Sorry for the trouble - it is a big problem in the VCF standard.
Aaron
Thanks for the explanation Aaron. I am starting to work with VCF files and hadn't realized the VCF format's lack of standardization in this area. I understand your reservations about supporting the VCF output for multiple variant callers. This is not a critical issue for me, but would be a nice feature if it didn't require too much time to add.
Best, Scott
No sweat. If you need it soon, you could update the code I referenced in CyVCF to support VarScan's convention. All you would need to do is mimic the logic already there and then install that version of CyVCF on your system so that GEMINI uses it. I would gladly incorporate your changes. Otherwise, I can try to knock it out when I return.
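For anyone attempting this, here is a rough, standalone sketch of the per-caller dispatch involved. It is not the actual CyVCF code; the function names `parse_sample` and `allele_depths` are hypothetical. It assumes the common conventions: GATK writes `AD` as comma-separated depths with the reference first, FreeBayes writes `RO` (reference observations) and `AO` (alternate observations), and VarScan writes `RD` (reference-supporting depth) and `AD` (variant-supporting depth only). Because VarScan reuses the `AD` tag name with a different meaning, the `RD` check has to come before the GATK branch. Unknown depths are returned as -1, matching the value GEMINI reports for them.

```python
from __future__ import annotations


def parse_sample(format_field: str, sample_field: str) -> dict[str, str]:
    # Pair FORMAT keys (e.g. "GT:DP:RD:AD") with one sample's colon-separated values
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def _to_int(value: str) -> int:
    # VCF uses "." for missing values; map those to -1
    return -1 if value in ("", ".") else int(value)


def allele_depths(sample: dict[str, str]) -> tuple[int, int]:
    """Return (ref_depth, alt_depth) for one sample, or (-1, -1) if unknown."""
    if "RO" in sample and "AO" in sample:
        # FreeBayes convention: RO = ref count, AO = alt count(s)
        ref = _to_int(sample["RO"])
        alts = [_to_int(v) for v in sample["AO"].split(",")]
    elif "RD" in sample and "AD" in sample:
        # VarScan convention (assumed): RD = ref depth, AD = alt depth only
        ref = _to_int(sample["RD"])
        alts = [_to_int(v) for v in sample["AD"].split(",")]
    elif "AD" in sample and "," in sample["AD"]:
        # GATK convention: AD = ref depth followed by one depth per alt allele
        values = [_to_int(v) for v in sample["AD"].split(",")]
        ref, alts = values[0], values[1:]
    else:
        # No recognized depth tags for this sample
        return -1, -1
    if ref < 0 or any(a < 0 for a in alts):
        return -1, -1
    # Sum depths across alt alleles for multi-allelic sites
    return ref, sum(alts)
```

For example, a VarScan-style sample with an illustrative FORMAT of `GT:DP:RD:AD:FREQ` and values `0/1:30:18:12:40%` yields `(18, 12)`, while a GATK-style `GT:AD:DP` sample `0/1:10,5:15` yields `(10, 5)`.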
I'll give it a try. I'm pretty new to Python, but this could be a good learning experience. As a quick workaround, I melted the VCF, extracted the RD and AD genotype tags, and pivoted the resulting long table into a wide one that I could use to annotate the GEMINI output.
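The melt-and-pivot workaround can be sketched with only the Python standard library. This is an illustration under assumptions, not the code actually used: it assumes an uncompressed, tab-delimited VCF, keys each variant on (CHROM, POS, REF, ALT), and names columns `<sample>.RD` / `<sample>.AD` (a hypothetical naming scheme). Samples missing a tag get `"."`, the VCF missing-value marker.

```python
from __future__ import annotations

Variant = tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def wide_depth_table(vcf_lines: list[str]) -> dict[Variant, dict[str, str]]:
    samples: list[str] = []
    long_rows: list[tuple[Variant, str, str, str]] = []

    # Melt: one (variant, sample, tag, value) row per sample per tag
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        keys = fields[8].split(":")
        for sample, sample_field in zip(samples, fields[9:]):
            values = dict(zip(keys, sample_field.split(":")))
            for tag in ("RD", "AD"):
                long_rows.append(((chrom, pos, ref, alt), sample, tag, values.get(tag, ".")))

    # Pivot long -> wide: one row per variant, one column per sample/tag pair
    wide: dict[Variant, dict[str, str]] = {}
    for variant, sample, tag, value in long_rows:
        wide.setdefault(variant, {})[f"{sample}.{tag}"] = value
    return wide
```

When joining this table to GEMINI output, check that the coordinate columns you join on use the same convention as the VCF's 1-based `POS`.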
Hi,
I am having an issue extracting the RD and AD genotype tags from my input VCF file into the 'gt_ref_depths' and 'gt_alt_depths' columns of my GEMINI database. I am using GEMINI (v0.8.0) on Ubuntu Server 12.04. The multisample VCF was generated using VarScan v2.3.7 and annotated with VEP.
Below is an example of the query I am running.
Below is the output for the above query. The 'gts', 'gt_depths', and 'gt_quals' columns have been correctly imported from my input VCF file into the GEMINI database. However, the 'gt_ref_depths' and 'gt_alt_depths' columns contain '-1' instead of the corresponding values from the RD and AD genotype tags in my input VCF file.
Below are the first 30 lines of my input VCF. Values for the RD and AD genotype tags are present.
Any assistance would be appreciated. Thanks.