Acellera / htmd

HTMD: Programming Environment for Molecular Discovery
https://software.acellera.com/docs/latest/htmd/index.html

TODO enhancements #247

Closed giadefa closed 7 years ago

giadefa commented 7 years ago
  1. Implement the kinetic distance in TICA at 95% variance, so that we don't have to choose a parameter; the result is also better.
  2. Wrap our own k-means. The recommended number of clusters should be the square root of the number of frames (also in adaptive?).
  3. Use kcenters as the default in adaptive.
stefdoerr commented 7 years ago
  1. is already done. If you don't pass a dimension to tica.project() it does that.
stefdoerr commented 7 years ago

But from my tests it worked quite badly. It gave hundreds of dimensions for my systems which made a very bad model.
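
For illustration, a minimal sketch of how a 95% cutoff can end up selecting hundreds of dimensions. It assumes the cutoff is taken on the cumulative sum of squared TICA eigenvalues (the kinetic variance); `dims_for_kinetic_variance` and the eigenvalue arrays are hypothetical, not HTMD code or real data.

```python
import numpy as np

def dims_for_kinetic_variance(eigenvalues, fraction=0.95):
    """Hypothetical helper: smallest number of leading components whose
    squared eigenvalues reach `fraction` of the total kinetic variance."""
    # Sort by magnitude, largest first
    lam = np.sort(np.abs(np.asarray(eigenvalues, dtype=float)))[::-1]
    cumulative = np.cumsum(lam ** 2) / np.sum(lam ** 2)
    # Index of the first entry >= fraction, converted to a count
    n = int(np.searchsorted(cumulative, fraction, side="left")) + 1
    return min(n, lam.size)

# Made-up spectra: one decays quickly, one is nearly flat
fast_decay = [0.9, 0.5, 0.3, 0.1]
slow_decay = np.linspace(0.6, 0.4, 300)

print(dims_for_kinetic_variance(fast_decay))  # 3
print(dims_for_kinetic_variance(slow_decay))  # most of the 300 components
```

When the eigenvalue spectrum is nearly flat, no small set of components dominates the variance, so the cutoff keeps almost everything.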

j3mdamas commented 7 years ago

I just printed the corresponding paper. The ITS plots with kinetic maps look like tICA with all dimensions, but slightly improved? For the model systems it looks very good, but for the molecular systems I need to read more. @stefdoerr bad in what way?

j3mdamas commented 7 years ago

@giadefa which paper proposes the square root of the number of frames? Is it this one?

stefdoerr commented 7 years ago

It's an old heuristic

https://www.quora.com/How-can-we-choose-a-good-K-for-K-means-clustering

ctrl+f square root

or google kmeans square root


j3mdamas commented 7 years ago

Interesting. It reminds me of the rules for choosing the bin size when making histograms, which also include a square-root rule: https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width.

stefdoerr commented 7 years ago

@giadefa For point 3, do we change the default from Kmeans to Kcenters and issue a warning (or even no warning)? Otherwise, unless we decide to do adaptive versions like the protocols, I am running out of names for every change in adaptive.

Is point 2 really necessary? What for exactly?

giadefa commented 7 years ago

Just change the default.

Point 2 is not required, but we might also want to compare the current heuristic for the number of clusters with the sqrt one and eventually change that as well.

stefdoerr commented 7 years ago

1 and 3 are done. For 2, I will make a modification to allow arbitrary numClusters functions to be passed, and leave it to Adria to test on NTL9.
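
A rough sketch of what passing a numClusters function could look like, with the square-root rule as one option. The names `sqrt_rule` and `choose_num_clusters` are hypothetical and do not reflect the actual HTMD interface.

```python
import math
from typing import Callable

def sqrt_rule(num_frames: int) -> int:
    """Old heuristic: number of clusters ~ sqrt(number of frames)."""
    return max(1, round(math.sqrt(num_frames)))

def choose_num_clusters(num_frames: int,
                        rule: Callable[[int], int] = sqrt_rule) -> int:
    """Hypothetical helper: apply any user-supplied rule, then clamp the
    result so it is at least 1 and never exceeds the number of frames."""
    k = int(rule(num_frames))
    return max(1, min(k, num_frames))

print(choose_num_clusters(10000))                         # 100
print(choose_num_clusters(10000, rule=lambda n: n // 50)) # 200
```

Taking the rule as a callable makes it easy to compare heuristics on the same data without changing the clustering code.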