giadefa closed this issue 7 years ago
tica.project() does that. But in my tests it worked quite badly: it gave hundreds of dimensions for my systems, which made a very bad model.
I just printed the corresponding paper. The ITS plots with kinetic maps look like tICA with all dimensions, but a bit improved? For the model systems it looks very good, but for the molecular systems I need to read more. @stefdoerr bad how?
@giadefa which paper is the one of the square root of the number of frames? is it this one?
It's an old heuristic
https://www.quora.com/How-can-we-choose-a-good-K-for-K-means-clustering
ctrl+f square root
or google kmeans square root
On Thu, Feb 2, 2017 at 11:39 AM, João M. Damas wrote:
> @giadefa which paper is the one of the square root of the number of frames? is it this one?
interesting. reminds me of the rules to choose bin-size when making histograms, where there's also the square-root rule: https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width.
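For reference, the square-root rule mentioned here is easy to write down. This is only a minimal sketch: `sqrt_rule` is a made-up name, not something from HTMD, and it assumes the rule is taken literally as the ceiling of the square root of the number of samples (bins for a histogram, or clusters if applied to the number of frames).

```python
import math


def sqrt_rule(n_samples: int) -> int:
    """Return ceil(sqrt(n_samples)), the square-root rule of thumb.

    Hypothetical helper for illustration only; it is not part of HTMD.
    Uses integer arithmetic so large inputs are not affected by
    floating-point rounding.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    # For n >= 1, ceil(sqrt(n)) equals isqrt(n - 1) + 1.
    return math.isqrt(n_samples - 1) + 1


if __name__ == "__main__":
    for n in (10, 1000, 10000):
        print(n, sqrt_rule(n))
```

Some statements of the k-means heuristic divide the number of samples by two before taking the root, so the exact constant is a matter of convention.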
@giadefa For point 3, do we change the default from Kmeans to Kcenters and issue a warning (or even no warning)? Otherwise, unless we decide to do adaptive versions like the protocols, I am running out of names for every change in adaptive.
Point 2 is it really necessary? What for exactly?
Just change the default.
Point 2 is not required, but we might also want to compare the current heuristic for the number of clusters with the sqrt one and eventually change that as well.
1 and 3 are done. For 2, I will make a modification that allows arbitrary numClusters functions to be passed, and leave it to Adria to test on NTL9.
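To make the idea of passing an arbitrary numClusters function concrete, here is a rough sketch. It is not the actual HTMD change: `choose_num_clusters` and `sqrt_num_clusters` are hypothetical names, and the only assumption is that the function receives the number of frames and returns a cluster count.

```python
import math
from typing import Callable


def sqrt_num_clusters(num_frames: int) -> int:
    """Hypothetical default rule: floor(sqrt(num_frames)), at least 1."""
    return max(1, math.isqrt(num_frames))


def choose_num_clusters(
    num_frames: int,
    rule: Callable[[int], int] = sqrt_num_clusters,
) -> int:
    """Apply a user-supplied rule and keep the result in a usable range.

    Illustrative only; the real adaptive code may validate differently.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be a positive integer")
    k = int(rule(num_frames))
    if k < 1:
        raise ValueError("the rule must return at least one cluster")
    # There cannot be more clusters than frames.
    return min(k, num_frames)


if __name__ == "__main__":
    print(choose_num_clusters(10000))
    print(choose_num_clusters(10000, rule=lambda n: n // 50))
```

With something of this shape, comparing the current heuristic against the sqrt one would only require swapping the `rule` argument.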