Closed bjarnebartlett closed 9 months ago
Hi Bjarne,
There is no formatting mistake; all your genotypes are empty (if you open the VCF you will only see ./.
where the genotypes should be). This must be a processing mistake during the GATK step. For an example, see this:
https://gatk.broadinstitute.org/hc/en-us/community/posts/360060957571-Empty-vcf-after-GenotypeVCFs-when-combining-already-genotyped-samples
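One quick way to confirm this without scrolling through the file is to count how many genotype calls are missing. The snippet below is only a minimal sketch, not part of vcf2phylip: the function name `count_missing_genotypes` and the example records are made up for illustration, and it assumes a plain-text, tab-separated VCF whose FORMAT column contains a GT field.

```python
def count_missing_genotypes(vcf_lines):
    """Return (missing, total) genotype calls over all data lines.

    Hypothetical helper for illustration; assumes tab-separated VCF text.
    A call counts as missing when every allele in GT is '.', e.g. './.' or '.'.
    """
    missing = 0
    total = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines and the header line
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # no sample columns on this line
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # no genotype field to inspect
        gt_index = format_keys.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            genotype = parts[gt_index] if gt_index < len(parts) else "."
            total += 1
            alleles = genotype.replace("|", "/").split("/")
            if all(allele == "." for allele in alleles):
                missing += 1
    return missing, total


# Made-up records, only to show the call; for a real file you could pass
# an open file handle instead, e.g. open("all.vcf").
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.\tGT:DP\t./.:0\t./.:0",
    "chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0",
]
missing, total = count_missing_genotypes(example)
print(f"{missing} of {total} genotype calls are missing")
```

If the two numbers are equal for the whole file, every genotype is empty and the problem lies upstream of the conversion.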
I hope it helps
Edgardo
Hello,
Thank you very much for your help. I am revisiting this project, and I took your suggestion to look at the VCF files. I have now generated a merged VCF that isn't empty. I generated it using GATK and merged it using BCFtools, and I attached it for you to verify. I am now getting the error below. I read through the repository and couldn't figure out what KeyError: 'K' might be.
Cheers!
Bjarne
`Converting file 'allbcf.vcf':
Number of samples in VCF: 395
Traceback (most recent call last):
File "/mnt/md0/projects/Brettanomyces/Brett_Analysis_All_2023BB/TreeBuild_11_28_23_BB/vcf2phylip.py", line 502, in
Sorry for the delay. I took a look at your file: you have ambiguity codes in your reference (the K means G or T), which is very atypical, but I can modify the code to skip this kind of SNP:
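To illustrate what such ambiguity codes look like in practice, here is a rough sketch of a pre-filter that drops sites whose REF allele contains an IUPAC ambiguity code. This is not the actual change made to vcf2phylip; the names `IUPAC_AMBIGUITY`, `has_ambiguous_ref` and `drop_ambiguous_ref_sites` are hypothetical, the example records are invented, and treating N as ambiguous is an assumption of this sketch.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def has_ambiguous_ref(vcf_line):
    """True if the REF column (4th field) contains an ambiguity code."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref = fields[3].upper()
    return any(base in IUPAC_AMBIGUITY for base in ref)


def drop_ambiguous_ref_sites(vcf_lines):
    """Return (kept_lines, skipped_count); header lines are always kept."""
    kept = []
    skipped = 0
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
        elif has_ambiguous_ref(line):
            skipped += 1  # e.g. REF = K, which means G or T
        else:
            kept.append(line)
    return kept, skipped


# Invented records for illustration only.
records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1",
    "chr1\t10\t.\tK\tA\t50\tPASS\t.\tGT\t0/1",
    "chr1\t20\t.\tC\tT\t50\tPASS\t.\tGT\t1/1",
]
kept, skipped = drop_ambiguous_ref_sites(records)
print(f"kept {len(kept) - 1} site(s), skipped {skipped}")
```

A lookup keyed only on A, C, G and T would fail on a REF of K with a KeyError, which is consistent with the error reported above.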
The bigger problem I see now is that you have
I would recommend removing those
Aloha,
For some reason, all my SNPs were removed after converting my VCF file. I ran the conversion on a macOS system with the command "python vcf2phylip.py -i all.vcf".
The result is:
"Number of samples in VCF: 74
Total of genotypes processed: 11576930
Genotypes excluded because they exceeded the amount of missing data allowed: 11576930
Genotypes that passed missing data filter but were excluded for being MNPs: 0
SNPs that passed the filters: 0"
I used GATK to create this merged VCF file, and I suspect there's a formatting issue. To check this, I have included the first 1000 rows of the merged VCF as an attachment.
How can I fix this? Mahalo in advance for any help you can give.
Attachment: vcf1000.txt
-Bjarne