brettc / partitionfinder

PartitionFinder discovers optimal partitioning schemes for DNA sequences.

add support for amino acid alignments with iterative k-means #37

Closed pbfrandsen closed 9 years ago

pbfrandsen commented 9 years ago

It might be a good idea to add support for amino acid alignments using iterative k-means. I think most of it works already; we just need to add support for estimating amino acid site rates. @cmayer pointed out that, since amino acids have 20 character states (rather than 4), there will likely be a greater amount of conflict, leading to some very small rates. I'm not sure what effect this will have, and it will have to be tested to see if it works. I still think it would be worth adding, since we should be able to implement it with minimal extra effort.

roblanf commented 9 years ago

Sounds good.

Note that if there's too much conflict, we can reduce the alignment into classes of amino acid rather than actual amino acids. There are some nice papers from Galtier's lab describing how one can do this. But let's try it on raw amino acids first.
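For illustration, here is a minimal sketch of what reducing an alignment to amino acid classes could look like. It uses the widely known Dayhoff six-class grouping as an assumption; that is not necessarily the scheme from the papers mentioned above, and the names `DAYHOFF6`, `RECODE`, and `recode` are hypothetical, not part of PartitionFinder.

```python
# Assumed grouping: the common Dayhoff six-class recoding.
# Other class schemes could be swapped in by editing this table.
DAYHOFF6 = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}

# Flatten to a per-residue lookup, e.g. {"A": "0", "G": "0", ...}.
RECODE = {aa: label for group, label in DAYHOFF6.items() for aa in group}


def recode(seq, table=RECODE, unknown="-"):
    """Map each residue to its class label; anything else becomes `unknown`."""
    return "".join(table.get(c.upper(), unknown) for c in seq)


if __name__ == "__main__":
    for seq in ["MKVLA", "MRILS", "MK-LT"]:
        print(seq, "->", recode(seq))
```

With fewer states per site, columns that differ only by substitutions within a class become identical, which should reduce the amount of conflict between sites.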


Rob Lanfear School of Biological Sciences, Macquarie University, Sydney

pbfrandsen commented 9 years ago

Completed using site entropies rather than TIGER site rates. This is for two reasons:

  1. It is a lot faster.
  2. There were legitimate concerns as to how useful TIGER site rates would be for amino acids.
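As a rough sketch of the idea (not the code that was actually committed), per-site Shannon entropy for an amino acid alignment can be computed in a single pass over the columns. The function name `site_entropies`, the use of bits as the unit, and the choice to skip gap and unknown characters are all assumptions made here for illustration.

```python
import math
from collections import Counter


def site_entropies(alignment, ignore="-?X"):
    """Return the Shannon entropy (in bits) of each alignment column.

    `alignment` is a list of equal-length sequences. Characters listed in
    `ignore` (gaps/unknowns, an assumption of this sketch) are skipped.
    """
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")

    entropies = []
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c.upper() not in ignore)
        total = sum(counts.values())
        h = 0.0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        # A column with no usable characters gets entropy 0.0.
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    aln = ["MKVLA", "MRILS", "MK-LT", "MKVLG"]
    print(site_entropies(aln))
```

Each column needs only character counts, so the cost is linear in the size of the alignment. The resulting values could then be fed to k-means in place of site rates.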