Open tremblap opened 2 years ago
I can't check the Max help file. Does it also happen in SC? Swapping SKMeans into the KMeans help file code works as expected for me: 4 clusters, but based on angle. In some runs you may end up with an empty cluster.
It does happen in SC too:
b = FluidDataSet(s);
b.read(FluidFilesPath() ++ "../Data/circles.json");
b.print;
c = FluidSKMeans(s, numClusters: 4);
d = FluidLabelSet(s);
// run the next two lines many times: always 2 classes full, 2 empty
c.clear;
c.fitPredict(b, d, {|x| x.postln});
// copy the labels (e) and the data (f) back to the language
d.dump{|i| e = i};
b.dump{|i| f = i};
// e.postln;
// f.postln;
// graphic evidence
~fp = FluidPlotter(bounds: Rect(200,200,600,600), dict: f).categories_(e);
The data in circles.json does not really have 4 clusters in the angular direction, so I would not expect the algorithm to find them. The code below generates a set of points clustered by angle. The algorithm may still fail because of its random initialization (especially with smaller datasets), but most of the time it finds them (substitute these points into the FluidKMeans help file code).
~points = (4.collect{|i|
    128.collect{
        // each cluster spans a narrow band of angles around i * pi / 2
        var angle = 0.1.rand + (i * pi / 2);
        var r = 1.0.rand;
        [r * angle.cos, r * angle.sin]
    }
}).flatten(1);
OK, it turns out that the data needs to be centred on 0 to behave well with our code. I'll add a note to the help file and an example that makes this clear.
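As a rough illustration of that centring step, here is a minimal language-side sketch. It assumes "centred on 0" means subtracting the per-column mean, and it works on a plain array of [x, y] pairs rather than on a FluidDataSet. The names ~offset, ~centre and ~centred are made up for this example and are not part of the toolkit.

```supercollider
(
// hypothetical test data: 4 angular clusters, deliberately shifted away from the origin
~offset = [2.0, 2.0];
~points = (4.collect { |i|
    128.collect {
        var angle = 0.1.rand + (i * pi / 2);
        var r = 1.0.rand;
        [r * angle.cos, r * angle.sin] + ~offset
    }
}).flatten(1);

// per-column mean: summing the rows adds them element-wise
~centre = ~points.sum / ~points.size;

// subtract the mean from every point so the data is centred on 0 (assumed meaning)
~centred = ~points.collect { |p| p - ~centre };

// the column means of the centred data should now be very close to [0, 0]
(~centred.sum / ~centred.size).postln;
)
```

The centred points can then be loaded into a dataset in place of the raw ones before fitting. Whether mean-centring is the right choice depends on the data: it only helps if the angular structure is organised around the mean.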
Even with 2D circles data like in the Max help file, we end up with only 2 labels in use. Is there a bug in the code, or in my expectations of what it should do (4 slices of data)?
KMeans does return 4 clusters as requested. @g-roma, can you look and see whether it behaves the way you understand the algorithm?