covartech / PRT

Pattern Recognition Toolbox for MATLAB
http://covartech.github.io/
MIT License

Proposal - All clustering objects sort their cluster indices by increasing first dim #69

Closed peterTorrione closed 6 years ago

peterTorrione commented 6 years ago

Most clustering objects return unsorted cluster centers, which makes interpretation difficult.

For most clustering algorithms, there's no natural ordering of the clusters, but we can enforce a simple one, e.g., sort the cluster centers so that their first dimension is increasing.
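A minimal sketch of that sorting, using plain matrices rather than any PRT object. The names `centers`, `labels`, and `oldToNew` are hypothetical and chosen only for illustration; how each PRT clusterer stores its centers is not assumed here.

```matlab
% Hypothetical inputs (not PRT properties):
%   centers is nClusters x nDims, labels holds each observation's
%   cluster index in 1..nClusters.
centers = [ 2.0  0.5;
           -1.5  3.0;
            0.3 -2.0];
labels  = [1 1 2 3 3 2 1]';

% Sort the rows by the first column. sortInds(k) is the OLD index of
% the cluster that becomes NEW cluster k.
[~, sortInds] = sort(centers(:,1), 'ascend');
sortedCenters = centers(sortInds,:);

% Invert the permutation to get an old-index -> new-index map, then
% relabel the observations so they still point at the same centers.
oldToNew = zeros(size(sortInds));
oldToNew(sortInds) = 1:numel(sortInds);
sortedLabels = oldToNew(labels);
```

The relabeling step matters: reordering the centers without remapping the labels would silently attach each observation to a different cluster.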

If there are no complaints, I may implement this in some of the clustering algorithms.

kennethmorton commented 6 years ago

I have no complaints. It's not always straightforward to re-sort the internals of specific clustering algorithms, but for many it should be relatively easy. This isn't something we can do automatically, or something we can assert that all clusterers do, but we can adopt it as a best practice for commonly used clusterers like k-means and a few others.
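To illustrate why re-sorting internals can take some care, here is a rough sketch for a mixture-style model, where every per-component quantity has to be permuted with the same index vector. The struct `model` and its fields `means`, `covs`, and `weights` are made up for this example and are not PRT's internal property names.

```matlab
% Hypothetical mixture-model parameters; field names are illustrative
% only and do not correspond to PRT internals.
model.means   = [ 1.0  0.0;
                 -2.0  1.0];               % nComponents x nDims
model.covs    = cat(3, eye(2), 2*eye(2));  % nDims x nDims x nComponents
model.weights = [0.7 0.3];                 % 1 x nComponents

% Derive a single permutation from the first dimension of the means...
[~, sortInds] = sort(model.means(:,1), 'ascend');

% ...and apply it to every per-component quantity so they stay aligned.
model.means   = model.means(sortInds,:);
model.covs    = model.covs(:,:,sortInds);
model.weights = model.weights(sortInds);
```

Missing any one of these fields would leave the model internally inconsistent, which is presumably why this has to be handled per algorithm rather than automatically.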

peterTorrione commented 6 years ago

This is implemented in https://github.com/covartech/PRT/commit/00db823c231fc454f6b71659338b4d35d425449e for all the clustering algorithms we have that I can run (see the two recent issues, #70 and #71, for details).

It works:

%%
ds = prtDataGenBimodal;
ds = rt(prtPreProcZmuv,ds);
clusterers = {prtClusterDpMeans('lambda',1);
    prtClusterGmm;
    prtClusterKmeans;
%     prtClusterKmodes;
%     prtClusterMeanShift;
    prtClusterMeanShiftEuclidean;
    prtClusterSpectralKmeans;
    prtClusterSphericalKmeans;};

[mm,nn] = prtUtilGetSubplotDimensions(length(clusterers));

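% Train each clusterer and plot the result in its own subplot.
c = cell(size(clusterers));
for cEnum = cvrEnumerate(clusterers)
    c{cEnum.index} = cEnum.value.train(ds);
    subplot(nn,mm,cEnum.index)
    plot(c{cEnum.index});
end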

The clusters are in the right order (blue, red, green) along the first dimension.

[Image: clusterers_sorted]