Closed bastian-wur closed 4 years ago
At 1e+300 I am not at all surprised that k-means fails to produce results. You simply exceed the range of floating-point arithmetic. K-means by definition minimizes squared errors. The square of 1e+300 is 1e+600, far beyond the largest double-precision value (about 1.8e+308), so it overflows to infinity, and further arithmetic on infinities likely causes these NaN and infinite values to appear.
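To illustrate the overflow, here is a minimal sketch in plain Python. It is not the library's own code: the helper `squared_distance` and the sample points are hypothetical, and it assumes ordinary IEEE 754 double-precision floats.

```python
import math
import sys

def squared_distance(a, b):
    """Sum of squared coordinate differences, as minimized by k-means."""
    total = 0.0
    for ai, bi in zip(a, b):
        diff = ai - bi
        # Multiplying two doubles of magnitude ~1e300 overflows to inf.
        total += diff * diff
    return total

# Hypothetical points with coordinates around 1e300.
p = [1e300, 2e300]
q = [3e300, 1e300]

d_pq = squared_distance(p, q)
d_qp = squared_distance(q, p)

# Any later computation that subtracts two infinities yields NaN.
difference = d_pq - d_qp

print(sys.float_info.max)  # largest finite double, about 1.8e308
print(d_pq)                # inf
print(difference)          # nan
```

Once a single squared term is infinite, every sum, mean, or variance built on it is infinite or NaN, which would explain why the evaluation measures return nothing useful.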
Closing as won't fix: supporting arbitrary precision would ruin performance, so this is not going to happen. Instead, rescale your data: either work in log space (with values at this magnitude, this may or may not be more meaningful), or simply multiply the data by a constant factor, e.g. 1e-300.
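Both workarounds can be sketched in a few lines of plain Python. The sample data, the `SCALE` constant, and the helper `squared_distance` are assumptions for illustration only; the log transform additionally assumes all values are strictly positive.

```python
import math

def squared_distance(a, b):
    """Sum of squared coordinate differences."""
    return sum((ai - bi) * (ai - bi) for ai, bi in zip(a, b))

# Hypothetical rows with very large, strictly positive values.
data = [[1e300, 2e300], [3e300, 1e300], [4e299, 5e298]]

# Option 1: multiply by a constant so values are of moderate magnitude.
# This keeps the relative geometry of the data unchanged.
SCALE = 1e-300
scaled = [[v * SCALE for v in row] for row in data]

# Option 2: work in log space. This changes the geometry:
# distances now reflect ratios rather than absolute differences.
logged = [[math.log10(v) for v in row] for row in data]

d_scaled = squared_distance(scaled[0], scaled[1])
d_logged = squared_distance(logged[0], logged[1])

print(d_scaled)  # finite, close to 5
print(d_logged)  # finite
```

Constant scaling leaves the cluster structure as it was, while the log transform can produce different clusters, so which one is appropriate depends on what the magnitudes mean in the data.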
Hi everyone,
I'm currently trying to cluster a matrix and have gone back and forth on how to do it. The values in the matrix are pretty big (the biggest is 10e+300), and the matrix is also pretty dense. I clustered it with k-means, which did produce results, but all internal cluster evaluation algorithms failed to produce anything. This is a result from k-means with k=4
In the meantime I am clustering only on the exponents (so 10e+300 converts to 300), and I now get useful output. So... I have no idea what is causing this, but I guess something should warn the user.