annoviko / pyclustering

pyclustering is a Python, C++ data mining library.
https://pyclustering.github.io/
BSD 3-Clause "New" or "Revised" License

xmeans does not agree with the paper? #692

Open kno10 opened 1 year ago

kno10 commented 1 year ago

The last term, p * 0.5 * log(N), should be in the sum only once, IMHO. In the paper (https://web.cs.dal.ca/~shepherd/courses/csci6403/clustering/xmeans.pdf) it appears in the top BIC equation (where j is the model index, not the cluster index), not in the l(Dn) equation (where n is the cluster index). No guarantees that everything else is fine.
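To make the difference concrete, here is a minimal sketch of the score as I read the paper, with the penalty applied once to the summed log-likelihood. The function name `xmeans_bic`, its arguments, and the `penalty_per_cluster` flag are hypothetical and only serve to compare the two variants; this is not the library's code.

```python
import math

def xmeans_bic(points, clusters, centers, penalty_per_cluster=False):
    """Hypothetical helper: BIC of a k-means model under the spherical
    Gaussian assumption, following my reading of the X-means paper.

    points   -- list of coordinate tuples
    clusters -- list of lists of point indices
    centers  -- list of centroid tuples, one per cluster
    """
    K = len(centers)
    dimension = len(centers[0])
    N = sum(len(cluster) for cluster in clusters)
    if N - K <= 0:
        raise ValueError("need more points than clusters")

    # Pooled variance estimate: squared distances to the own centroid / (N - K).
    sse = 0.0
    for k, cluster in enumerate(clusters):
        for i in cluster:
            sse += sum((a - b) ** 2 for a, b in zip(points[i], centers[k]))
    sigma_sq = sse / (N - K)
    if sigma_sq <= 0.0:
        # Degenerate case is deliberately not handled in this sketch.
        raise ValueError("zero variance")

    # Free parameters: K - 1 class priors, K * dimension centroid coordinates, 1 variance.
    p = (K - 1) + dimension * K + 1
    penalty = p * 0.5 * math.log(N)

    # Sum the per-cluster log-likelihood l(Dn) over all non-empty clusters.
    log_likelihood = 0.0
    non_empty = 0
    for cluster in clusters:
        n = len(cluster)
        if n == 0:
            continue
        non_empty += 1
        log_likelihood += (n * math.log(n) - n * math.log(N)
                           - n * 0.5 * math.log(2.0 * math.pi)
                           - n * dimension * 0.5 * math.log(sigma_sq)
                           - (n - K) * 0.5)

    if penalty_per_cluster:
        # Variant questioned in this issue: penalty counted once per cluster.
        return log_likelihood - non_empty * penalty
    # Variant from the top BIC equation: penalty counted once per model.
    return log_likelihood - penalty
```

With this sketch the two variants differ by exactly (number of non-empty clusters - 1) * p * 0.5 * log(N), so the per-cluster version penalizes models with more clusters more heavily than the paper's equation does.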

I also renamed sigma_sqrt to sigma_sq because it is supposed to be sigma squared, not a square root.

Note that if sigma_multiplier = float('-inf'), the result will always be infinity, won't it?
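A quick check of that concern, assuming the per-cluster term subtracts n * sigma_multiplier as in the expression discussed here. The function `cluster_log_likelihood` is a hypothetical stand-in written for this comment, not copied from the library.

```python
import math

def cluster_log_likelihood(n, N, K, sigma_multiplier):
    # Hypothetical stand-in for the per-cluster term; sigma_multiplier is
    # assumed to hold dimension * 0.5 * log(sigma_sq), or -inf as a sentinel.
    return (n * math.log(n) - n * math.log(N)
            - n * 0.5 * math.log(2.0 * math.pi)
            - n * sigma_multiplier
            - (n - K) * 0.5)

# Subtracting n * (-inf) adds +inf, so the term is +inf for any n > 0.
degenerate = cluster_log_likelihood(n=5, N=20, K=2, sigma_multiplier=float('-inf'))
```

Under that assumption every cluster term becomes positive infinity, and so does the total score, whatever the cluster sizes are.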