emmanuelparadis / pegas

Population and Evolutionary Genetics Analysis System
GNU General Public License v2.0

Missing data Handling #45

Closed jcgrenier closed 4 years ago

jcgrenier commented 4 years ago

Hello @emmanuelparadis ,

Thanks for that tool! It's getting pretty useful for our lab these days! We are dealing with some issues regarding missing data in some of our sequences. How should these be dealt with? Should we put an "N" or another symbol?

Thanks a lot! JC

emmanuelparadis commented 4 years ago

Hi JC,

Thanks for the appreciation! If there are missing data in DNA sequences, they should be represented with the standard IUPAC ambiguity codes (this is what is done by all sequencing/assembling technologies); they will then be treated accordingly by the functions in pegas and ape.

If you need to extract the unique sequences, please pay attention to the options of the function haplotype.DNAbin (see ?haplotype.DNAbin), which help to handle the specific case of leading/trailing alignment gaps (usually due to short sequences aligned with longer ones).

Best, Emmanuel
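To make the advice above concrete, here is a minimal pre-processing sketch in plain Python. It does not use pegas or ape. It recodes non-standard missing-data symbols to the IUPAC code "N", reports any symbol that is neither an IUPAC nucleotide code nor a gap, and shows one way to treat leading/trailing alignment gaps as missing data. The function names and the set of symbols assumed to mean "missing" ("?", "X", "x") are hypothetical choices for illustration, not conventions taken from pegas or ape.

```python
# IUPAC nucleotide codes: the bases, the ambiguity codes, and N ("any base").
IUPAC_CODES = set("ACGTURYSWKMBDHVN")
GAP = "-"


def normalize_missing(seq, missing_symbols="?Xx"):
    """Upper-case seq and replace assumed missing-data symbols with N.

    missing_symbols is an assumption: adjust it to whatever your files use.
    """
    return "".join("N" if ch in missing_symbols else ch.upper() for ch in seq)


def invalid_symbols(seq):
    """Return the sorted symbols in seq that are neither IUPAC codes nor gaps."""
    return sorted(set(seq) - IUPAC_CODES - {GAP})


def mask_end_gaps(seq):
    """Replace leading and trailing gaps with N, leaving internal gaps as-is."""
    core = seq.strip(GAP)
    if not core:  # the sequence consists only of gaps
        return "N" * len(seq)
    lead = len(seq) - len(seq.lstrip(GAP))
    trail = len(seq) - len(seq.rstrip(GAP))
    return "N" * lead + core + "N" * trail


if __name__ == "__main__":
    raw = "--ac?tg-r--"          # hypothetical aligned sequence
    clean = normalize_missing(raw)
    print(clean)
    print(invalid_symbols(clean))
    print(mask_end_gaps(clean))
```

Internal gaps are left untouched because they may represent real indels, whereas gaps at the ends of an alignment usually reflect only a shorter read. Whether that distinction is right for a given data set is a judgment call, so check ?haplotype.DNAbin before relying on it.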

jcgrenier commented 4 years ago

Great! Thanks for the clear reply!

Have a great day! JC