AndersenLab / VCF-kit

VCF-kit: Assorted utilities for the variant call format
http://www.andersenlab.org
MIT License

Calculate Tajima's D on haploid #28

Open AlexWanghaoming opened 4 years ago

AlexWanghaoming commented 4 years ago

Dear developers, my fungal strains are haploid and I would like to calculate Tajima's D with vk tajima. The documentation says that the VCF file must contain diploid sites, and the result I get is puzzling. I called SNPs with bcftools: bcftools call --ploidy 1. Could you please help me solve this? Thanks, Alex

danielecook commented 4 years ago

Hi @AlexWanghaoming - I would recommend simply replacing the genotype calls in your VCF, changing 0 and 1 to 0/0 and 1/1. Then you should be able to run VCF-kit with a ploidy state of 1. I'm not sure whether this will affect your calculation, but I do not think it will.
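The genotype replacement suggested above could be sketched roughly as follows. This is a minimal illustration, not part of VCF-kit: the helper names `diploidize_gt` and `diploidize_line` are hypothetical, and the sketch assumes an uncompressed, tab-delimited VCF whose FORMAT column includes `GT`. Only the `GT` subfield is rewritten; ploidy-dependent fields such as `PL` are left as they are and may no longer match the new genotypes.

```python
def diploidize_gt(gt: str) -> str:
    """Turn a haploid GT such as '1' into a homozygous diploid GT '1/1'."""
    if "/" in gt or "|" in gt:
        return gt  # already has more than one allele; leave untouched
    return f"{gt}/{gt}"  # a missing call '.' becomes './.'


def diploidize_line(line: str) -> str:
    """Rewrite the GT subfield of every sample column in one VCF record."""
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns to edit
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return line
    gt_idx = fmt.index("GT")
    for i in range(9, len(fields)):
        sub = fields[i].split(":")
        if gt_idx < len(sub):
            sub[gt_idx] = diploidize_gt(sub[gt_idx])
            fields[i] = ":".join(sub)
    return "\t".join(fields) + "\n"
```

One way to use it would be to stream a file through the function, for example `sys.stdout.writelines(diploidize_line(l) for l in sys.stdin)`, and pass the result to vk tajima.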

Rohit-Satyam commented 12 months ago

Hi @danielecook, my VCF file already contains genotype calls in 0/0 format. I got it from MalariaGEN and am trying to calculate Tajima's D (TD) for the two most polymorphic genes in Plasmodium (which is haploid), yet I get negative TD values for them. I am not sure whether the vk tajima subcommand is doing the right thing, because a highly polymorphic gene should have high TD values.

Here are the subsetted VCF files for the same genes: File1 File2

PF3D7_0930300 Pf3D7_09_v3:1201305-1207576
PF3D7_1133400 Pf3D7_11_v3:1292966-1296696
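As an independent cross-check on values like these, Tajima's D can be computed directly from per-site allele counts. The sketch below uses the standard Tajima (1989) estimator and treats each sample as one haploid chromosome; it is not taken from VCF-kit and may differ from what vk tajima does internally (windowing, ploidy handling). The function name `tajimas_d` and its inputs are hypothetical. Note that D compares two diversity estimates (pairwise differences versus the number of segregating sites), so its sign reflects the shape of the allele-frequency spectrum rather than the amount of polymorphism.

```python
from math import sqrt


def tajimas_d(alt_counts, n):
    """Tajima's D for biallelic sites.

    alt_counts: number of samples carrying the ALT allele at each site.
    n: number of haploid samples (chromosomes) with calls at every site.
    Returns None when D is undefined (no segregating sites, or n < 4).
    """
    seg = [k for k in alt_counts if 0 < k < n]  # keep segregating sites only
    S = len(seg)
    if S == 0 or n < 4:
        return None

    # Mean number of pairwise differences, summed over sites.
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in seg) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    var = e1 * S + e2 * S * (S - 1)
    if var <= 0:
        return None
    return (pi - S / a1) / sqrt(var)
```

With 10 samples, five singleton sites (`tajimas_d([1] * 5, 10)`) give a negative D, while five sites at 50% frequency (`tajimas_d([5] * 5, 10)`) give a positive D, even though both inputs have the same number of segregating sites.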