edgardomortiz / vcf2phylip

Convert SNPs in VCF format to PHYLIP, NEXUS, binary NEXUS, or FASTA alignments for phylogenetic analysis
GNU General Public License v3.0

alt allele assignment for heterozygous SNP site instead of IUPAC codes #23

Closed by kumarsaurabh20 4 years ago

kumarsaurabh20 commented 4 years ago

Hi,

Is it possible with the current script to assign the alternate allele at all heterozygous sites instead of IUPAC codes? With the default behavior, when I try to translate the converted PHYLIP file, many stop codons appear in the alignment because of the IUPAC codes, and it's a painful process to correct 1-2 million SNPs. Any suggestions? I am sure it won't be difficult to add this functionality as an option.

Many thanks.

Kumar
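
For readers unfamiliar with the issue, here is a minimal sketch of how a heterozygous diploid genotype ends up as an IUPAC ambiguity code in an alignment. The mapping is the standard IUPAC two-base table; the function name `encode_diploid` is hypothetical and is not taken from vcf2phylip.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def encode_diploid(allele1, allele2):
    """Hypothetical helper: return the base itself for a homozygote,
    or the IUPAC ambiguity code for a heterozygote."""
    if allele1 == allele2:
        return allele1
    return IUPAC_TWO_BASE[frozenset((allele1, allele2))]


print(encode_diploid("A", "A"))  # homozygous call stays a plain base
print(encode_diploid("C", "T"))  # heterozygous call becomes an ambiguity code
```

Codons that contain such ambiguity codes cannot be translated unambiguously, which is the problem described above.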

edgardomortiz commented 4 years ago

That sounds feasible; however, I will only be able to work on it in a couple of days. I will keep you updated.

kumarsaurabh20 commented 4 years ago

Many thanks Edgardo.

edgardomortiz commented 4 years ago

Hi, I was working on the code when I found an edge case: with REF=A and ALT=C,T, I find some heterozygous genotypes that are CT, where both alleles are ALT. What behavior would you propose in that case? Would picking one of the ALT alleles at random be acceptable? If so, why not also choose at random between REF and ALT?

edgardomortiz commented 4 years ago

I guess the most logical option is to randomly resolve the genotype whenever it is not homozygous, especially when working with polyploids.

I went ahead and pushed an update with a new option, -r or --resolve-IUPAC, that chooses a nucleotide from a heterozygous genotype at random to avoid IUPAC ambiguities in the output matrices.

Check it out and tell me if it fits your purposes ...
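
The random resolution described in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was pushed to vcf2phylip: the function name `resolve_genotype`, its parameters, and the choice of "N" for missing calls are all hypothetical. It handles the REF=A, ALT=C,T edge case and polyploid genotypes by drawing one allele copy uniformly from the GT field.

```python
import random


def resolve_genotype(gt, ref, alts, rng=random):
    """Hypothetical helper: pick one nucleotide at random from a VCF GT
    string such as '0/1', '1|2', or '0/1/1/2' (polyploid)."""
    alleles = [ref] + list(alts)            # index 0 is REF, 1.. are ALT alleles
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return "N"                          # assumption: missing calls become N
    bases = [alleles[int(i)] for i in indices]
    return rng.choice(bases)                # a homozygote always returns its base


# Edge case from the thread: REF=A, ALT=C,T, genotype 1/2 (both alleles are ALT).
print(resolve_genotype("1/2", "A", ["C", "T"]))
```

Because the draw is made over allele copies, a polyploid genotype such as 0/1/1/1 returns the ALT base more often than the REF base in this sketch. Whether the actual option weights alleles this way is not stated in the thread. Passing a seeded `random.Random` instance as `rng` makes the output reproducible.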