evogytis / ebolaGuinea2014

Phylogenetic analysis of 2014 Ebolavirus outbreak in Guinea.

Why are there so many "U"s in the sequence files? #1

Closed: ghost closed this 6 years ago

ghost commented 6 years ago

Why are there so many "U"s in the sequence files? @evogytis, these should be "T"s unless you are sequencing the RNA directly.

evogytis commented 6 years ago

Phylogenetic packages usually don't care if sequences use Us or Ts.

rambaut commented 6 years ago

Some groups use Us for RNA viruses because they have RNA genomes. It doesn't really make sense for negative-sense single-stranded RNA viruses, because the sequence given is the +ve strand, which will only exist as mRNA, but hey, why not? As Gytis says, phylogenetics programs generally treat U and T as synonymous (be careful, though: some older versions of BEAST2 treated Us as missing data).

ghost commented 6 years ago

@evogytis @rambaut, is it not only BEAST2? Does BEAST1 also treat Us and Ts differently? For example, is the nucleotide substitution model defined only for Ts and not for Us? And what about sequence files that contain both Ts and Us?

rambaut commented 6 years ago

No, BEAST1 (and the latest BEAST2) treat Us and Ts as identical (when they are loaded into memory, all Us are changed to Ts).
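If you would rather not depend on how each program (older BEAST2 versions, or PhyML/MrBayes, whose behaviour isn't confirmed in this thread) handles Us, one option is to do the same U-to-T conversion yourself before analysis. Below is a minimal sketch, assuming a plain FASTA file; the function names are hypothetical and this is not BEAST's own parsing code. It only rewrites sequence lines, leaves headers (which may legitimately contain a "u") untouched, and keeps gaps, N and other IUPAC ambiguity codes as they are, so it also works on files that mix Ts and Us.

```python
def rna_to_dna(seq: str) -> str:
    """Hypothetical helper: replace U/u with T/t, leaving every other character as-is."""
    return seq.translate(str.maketrans("Uu", "Tt"))


def normalize_fasta(text: str) -> str:
    """Hypothetical helper: apply rna_to_dna to sequence lines only.

    Header lines (starting with '>') are kept verbatim, and line endings are
    preserved because splitlines(keepends=True) keeps each newline attached.
    """
    lines = text.splitlines(keepends=True)
    return "".join(line if line.startswith(">") else rna_to_dna(line) for line in lines)


if __name__ == "__main__":
    # A mixed file: one sequence written with Us, one with Ts plus a U, an N and a gap.
    fasta = ">seq1|Guinea\nACGUUAG\n>seq2\nacgtUUN-\n"
    print(normalize_fasta(fasta), end="")
```

Converting up front means every program sees the same alphabet, whatever its own U handling happens to be.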

ghost commented 6 years ago

Thanks for your kind reply. What about other software? I saw you also used PhyML and MrBayes; do they also automatically change Us to Ts?

rambaut commented 6 years ago

I have no idea - these are not my programs. But I suspect so.

ghost commented 6 years ago

Thanks, I will ask the authors of the programs. :) BTW, nowadays no one uses BSSVS for geographic analysis and people have switched to MTT. Why is BSSVS so biased?

ghost commented 6 years ago

@rambaut, how does BEAST deal with ambiguous bases? Will it delete all the columns that contain ambiguities? Thanks.