Open pnorman opened 8 years ago
Having this function in the users' databases could be useful to create point overviews by clustering points spatially.
Like here: http://gis.stackexchange.com/questions/11567/spatial-clustering-with-postgis
Did a little clustering experiment here and sadly it seems to slow for this purpose.
Clustering isn't something you'd normally run interactively for display, but for analysis. With 4.1m points I get times of 19s for 10 buckets and 137s for 100 buckets.
The common heuristic for kmeans is O(nkdi), where n is the number of d-dimensional vectors, k the number of clusters and i the number of iterations needed until convergence.
It's not a tool for everything, but all clustering except simple grid clustering will have the same problems
Jenk's natural breaks are a special case of kmeans clustering in 1d. The kmeans-postgresql is a C implementation of kmeans.
Benchmarking shows kmeans-postgresql to be orders of magnitude faster than the plpgsql impelementation currently in cartodb-postgresql
Using kmeans internally for the math would be a substantial speed increase.