KlausVigo / phangorn

Phylogenetic analysis in R
http://klausvigo.github.io/phangorn/

iupac DNA seq #170

Closed nicolo-tellini closed 3 months ago

nicolo-tellini commented 3 months ago

Hello,

I am running the function dist.hamming across samples in an aligned FASTA file. Because of a high number of heterozygous positions, one sample has DNA bases replaced with IUPAC codes, so if a position is heterozygous for T and G, the position has a K.
Are phangorn's functions able to deal with that?

thanks

KlausVigo commented 3 months ago

Hi @nicolo-tellini

dist.hamming can process IUPAC codes. The question is what you expect the distance between a K and a T to be. Currently, if you compare an ambiguous state K with a T, the distance returned by dist.hamming is zero, because the intersection of the two states is T (as in parsimony). There is also a function dist.p, which handles polymorphisms more specifically. I added this function a long time ago, and Alastair Potts would know more about it.
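
A minimal sketch of the behaviour described above, assuming phangorn is installed. The toy alignment and the sample names s1, s2, s3 are made up for illustration and are not from the original data; the calls used are phyDat, dist.hamming and dist.p with its default polymorphism cost.

```r
library(phangorn)

# Hypothetical toy alignment: rows are samples, columns are aligned sites.
# s2 carries the IUPAC code "k" (g or t) at the last site.
aln <- rbind(
  s1 = c("a", "c", "g", "t"),
  s2 = c("a", "c", "g", "k"),
  s3 = c("a", "c", "g", "a")
)
dna <- phyDat(aln, type = "DNA")

# Proportion of differing sites. s1 vs s2 is 0 because "k" and "t"
# share the state t, so the site is not counted as a difference.
d_hamming <- dist.hamming(dna)
print(d_hamming)

# Polymorphism-aware distance mentioned in the thread, for comparison.
d_poly <- dist.p(dna, cost = "polymorphism")
print(d_poly)
```

Comparing the two matrices on a small example like this is a quick way to decide which treatment of K versus T matches what you expect for your heterozygous sample.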

Also, this package by Simon Joly might do what you want: https://github.com/simjoly/pofadinr. The functions might get incorporated into ape.

Kind regards, Klaus

thanks