Closed a2g1n closed 4 years ago
Hello, My guess is that those excluded are deletions. If possible email me a few thousand lines (~10K) of your VCF to be sure or to fix the issue.
Edgardo
Hi Edgardo, thanks for your mail. I had filtered my variants for SNPs only, so I doubt they are deletions. Anyway, I have attached part of the VCF. Thanks! Also, is it possible to get the VCF lines that the script excluded (or included)? It would be helpful for downstream analysis. For example, I would like to tell how many genes are covered in the RAxML phylogenetic tree, but I can’t estimate that without knowing which VCF lines were included.
Regards Abhinay
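One way to see which VCF lines would be kept or dropped, independently of the script, is to apply a missing-data threshold by hand. The sketch below is not vcf2phylip's own code. It assumes that -m means the minimum number of samples with a called genotype at a site, that GT is the first FORMAT field, and that a genotype counts as missing when any of its alleles is ".". The names is_missing and split_by_missing are made up for this illustration.

```python
import gzip


def is_missing(sample_field):
    """Return True if any allele in the sample's GT subfield is '.'."""
    gt = sample_field.split(":")[0]          # assumes GT is the first FORMAT key
    alleles = gt.replace("|", "/").split("/")
    return any(allele == "." for allele in alleles)


def split_by_missing(vcf_path, min_samples):
    """Split VCF data lines into (included, excluded) by called-sample count."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    included, excluded = [], []
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):         # skip meta and header lines
                continue
            fields = line.rstrip("\n").split("\t")
            # Sample columns start at index 9 in a standard VCF.
            present = sum(not is_missing(sample) for sample in fields[9:])
            if present >= min_samples:
                included.append(line)
            else:
                excluded.append(line)
    return included, excluded
```

Calling split_by_missing("RMPs.vcf", 25) would return the two lists of lines, which can then be written to separate files and compared against the counts the script reports. If the script uses a different definition of missing data, the numbers will not match exactly.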
Hi again, I don't see any attachment...
Now? test.vcf.gz
Hi again @edgardomortiz I just tried with a nonsensical value of -m -1000 and it does not filter out any SNPs. So it is definitely dependent on -m I think?
I had this problem too and it was solved by including a nonsensical value of -m.
A side question: is it possible to get it to include MNPs? I'd like to keep these in my fasta.
Hi, I think I fixed the bug, thanks for finding it. Could you re-clone the repository and re-run the script on your files to see if it behaves correctly now?
Edgardo
@ke-crawford the problem with MNPs is that, even though they usually have the same length across samples (when you can assume that they are aligned), there are also cases where they come unaligned or have different lengths (for example, I saw that many times in freebayes output). The solution in this case is to normalize the variant representation with something like vcfallelicprimitives; check here: https://github.com/ekg/freebayes#normalizing-variant-representation. In other words, convert all MNPs to SNPs.
Edgardo
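To illustrate what "converting MNPs to SNPs" means for the simple, equal-length case, here is a minimal sketch. It is not how vcfallelicprimitives is implemented, and it ignores genotypes, multiple ALT alleles, and complex alleles, all of which a real normalizer has to handle. The function name mnp_to_snps is hypothetical.

```python
def mnp_to_snps(pos, ref, alt):
    """Decompose an equal-length MNP into per-base SNPs.

    Returns a list of (position, ref_base, alt_base) tuples, one for each
    base where REF and ALT differ. Positions are 1-based, as in VCF.
    """
    if len(ref) != len(alt):
        raise ValueError("REF and ALT differ in length; this is not a simple MNP")
    return [
        (pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base
    ]


# An MNP at position 100 with REF=ACG and ALT=TCA differs at the first
# and third base, so it decomposes into two SNPs.
print(mnp_to_snps(100, "ACG", "TCA"))
```

Alleles of different lengths are rejected rather than guessed at, which is the unaligned case described above where a proper normalization tool is needed.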
Closing the issue, @ke-crawford feel free to re-open
Hi, I have a multi-sample VCF which I have filtered to retain reference and SNP calls ONLY if at least 25 out of 31 total samples have non-missing data. However, when I convert it to PHYLIP using your script, genotypes are still being excluded when they should not be. Or am I misunderstanding the -m parameter? I also tried -m 0 and am still facing the same problem. Is there any way to see the excluded genotypes to troubleshoot this?
vcf2phylip.py -i RMPs.vcf -m 4
Total of genotypes processed: 6372167
Genotypes excluded because they exceeded the amount of missing data allowed: 875118
Genotypes that passed missing data filter but were excluded for not being SNPs: 0
SNPs that passed the filters: 5497049

vcf2phylip.py -i RMPs.vcf -m 0
Total of genotypes processed: 6372167
Genotypes excluded because they exceeded the amount of missing data allowed: 810339
Genotypes that passed missing data filter but were excluded for not being SNPs: 0
SNPs that passed the filters: 5561828

Thanks.
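A quick arithmetic check on the two summaries may help frame the problem. The sketch below only uses the counts reported in this post. The reading of -m as "minimum number of samples with data at a site" is an assumption. Under that reading, -m 0 should exclude nothing for missing data, so the nonzero count in the second run is what looks wrong.

```python
# Counts copied from the two run summaries in this post.
runs = {
    "-m 4": {"total": 6372167, "missing": 875118, "non_snp": 0, "passed": 5497049},
    "-m 0": {"total": 6372167, "missing": 810339, "non_snp": 0, "passed": 5561828},
}


def summarize(run):
    """Check that the categories add up and compute the excluded share."""
    accounted = run["missing"] + run["non_snp"] + run["passed"]
    return {
        "consistent": accounted == run["total"],
        "excluded_fraction": run["missing"] / run["total"],
    }


for flag, run in runs.items():
    result = summarize(run)
    print(
        f"{flag}: categories sum to total = {result['consistent']}, "
        f"excluded for missing data = {result['excluded_fraction']:.1%}"
    )

# Sites rescued by lowering the threshold from 4 to 0.
print(runs["-m 4"]["missing"] - runs["-m 0"]["missing"])
```

Both summaries are internally consistent, so the discrepancy lies in how the missing-data filter is applied rather than in the bookkeeping.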