flucoma / flucoma-core

Core algorithms and objects for the Fluid Corpus Manipulation Library
BSD 3-Clause "New" or "Revised" License

SKMeans seems to only produce 2 clusters #181

[Open] tremblap opened this issue 2 years ago

tremblap commented 2 years ago

Even with the 2D circles data used in the Max help file, only 2 labels end up being used. Is there a bug in the code, or in my expectations of what it should do (4 slices of data)?

KMeans does return 4 clusters as requested. @g-roma, can you check whether it behaves the way you understand the algorithm?

g-roma commented 2 years ago

I can't check the Max help file; does it also happen in SC? Substituting SKMeans into the KMeans help file code works as expected for me: 4 clusters, but based on angle. In some runs you may end up with an empty cluster.
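To make "clusters based on angle" concrete, here is a minimal spherical k-means sketch in plain Python. It assumes, as the comment above indicates, that SKMeans groups points by their direction from the origin (cosine similarity between unit vectors); it is not the FluCoMa implementation. The helper names (`normalise`, `farthest_first`, `spherical_kmeans`) and the data are hypothetical. Seeding is farthest-first rather than random so the result is repeatable; with random seeding, two seeds can land in the same angular group, and the run may then split one group while merging others.

```python
import math


def normalise(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def farthest_first(units, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point whose best cosine similarity to the chosen seeds is lowest."""
    chosen = [units[0]]
    while len(chosen) < k:
        chosen.append(min(units, key=lambda u: max(dot(u, c) for c in chosen)))
    return chosen


def spherical_kmeans(points, k, iters=100):
    """Cluster points by direction only, using cosine similarity."""
    units = [normalise(p) for p in points]  # magnitude is discarded here
    centroids = farthest_first(units, k)
    labels = []
    for _ in range(iters):
        # Assignment step: most similar centroid by cosine similarity.
        new_labels = [max(range(k), key=lambda j: dot(u, centroids[j])) for u in units]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for j in range(k):
            members = [u for u, lab in zip(units, labels) if lab == j]
            if members:  # an empty cluster simply keeps its previous centroid
                centroids[j] = normalise([sum(col) for col in zip(*members)])
    return labels, centroids


# Hypothetical data: four groups separated by angle around the origin,
# each containing points at two different radii.
points = [
    [r * math.cos(i * math.pi / 2 + d), r * math.sin(i * math.pi / 2 + d)]
    for i in range(4)
    for d in (0.0, 0.05, 0.1)
    for r in (0.2, 1.0)
]
labels, centroids = spherical_kmeans(points, 4)
print(labels)
```

Points at radius 0.2 and 1.0 in the same direction receive the same label: only the angle matters, so concentric structure such as circles cannot be separated this way.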

tremblap commented 2 years ago

It does happen in SC too:

b = FluidDataSet(s)
b.read(FluidFilesPath() ++ "../Data/circles.json")
b.print

c = FluidSKMeans(s,numClusters: 4)

d = FluidLabelSet(s)

// run this many times: there are always 2 full classes and 2 empty ones
c.clear
c.fitPredict(b,d,{|x|x.postln})

d.dump{|i|e=i}
b.dump{|i|f=i}
// e.postln
// f.postln

// graphic evidence
~fp = FluidPlotter(bounds:Rect(200,200,600,600),dict:f).categories_(e);
g-roma commented 2 years ago

The data in circles.json does not really have 4 clusters in the angular direction, so I would not expect the algorithm to find them. The code below generates a set of points clustered by angle. The algorithm may still fail due to random initialization (especially with smaller datasets), but most of the time it finds them (substitute these points into the FluidKMeans help file).

~points = (4.collect{|i| 128.collect{ var angle = 0.1.rand + (i * pi / 2); var r = 1.0.rand; [r * angle.cos, r * angle.sin] }}).flatten(1);

tremblap commented 2 days ago

OK, it turns out the data needs to be centred on 0 for our implementation to behave as expected. I'll add a note to the help file and an example that makes this clear.
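Why centring matters can be sketched without FluCoMa at all, assuming SKMeans only looks at each point's direction from the origin: a dataset sitting far from 0 has all of its points pointing roughly the same way, so there is little angular structure left to cluster. The plain-Python sketch below uses hypothetical data and illustrative helpers (`centre`, `angles_deg`, not FluCoMa functions) to compare the spread of directions before and after subtracting the per-dimension mean.

```python
import math


def centre(points):
    """Subtract the per-dimension mean so the data cloud sits on the origin."""
    n = len(points)
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    return [[p[d] - mean[d] for d in range(dims)] for p in points]


def angles_deg(points):
    """Direction of each 2-D point as seen from the origin, in degrees."""
    return [math.degrees(math.atan2(y, x)) for x, y in points]


# Hypothetical data: four small blobs arranged around (10, 10), far from 0.
offsets = [(1, 0), (0, 1), (-1, 0), (0, -1)]
jitter = [(0.0, 0.0), (0.1, 0.05), (-0.05, 0.1)]
points = [[10 + ox + jx, 10 + oy + jy] for ox, oy in offsets for jx, jy in jitter]

raw = angles_deg(points)            # every point lies within a few degrees of 45
centred = angles_deg(centre(points))  # the blobs now point in four different directions
print(f"raw angle range:     {min(raw):7.1f} .. {max(raw):7.1f}")
print(f"centred angle range: {min(centred):7.1f} .. {max(centred):7.1f}")
```

Before centring, the four blobs differ in position but hardly at all in direction, so a direction-only clusterer has almost nothing to separate; after centring they are spread around the origin. Within FluCoMa, passing the data through FluidStandardize first should have a comparable centring effect (it also rescales each dimension to unit variance).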