HughCraig / GHAP


Spatial K-Means Clustering #336

Open BillPascoe opened 7 months ago

BillPascoe commented 7 months ago

Call PostGIS KMeans clustering functions for spatial clustering on Layers and on Search. Enable results as:

IanMcCrabb commented 7 months ago

Whole package on this one

BillPascoe commented 6 months ago

It looks like the 'Place ID' is the unique id in the database. It should be the TLCMap ID. Eg: for Poignnbah Point (1826) the id should be reported as t2652 not 9810.

BillPascoe commented 6 months ago

Include all the data for each record in the CSV download and on the map, the same as on the normal maps. Since JSON is already available to display the map, it should be easy to offer it as a download option too, so add that. Create the WSFeed button to provide these as a WS Feed. If it is not hard to add KML as an option for both the download and the WS Feed, do that as well.

BillPascoe commented 6 months ago

If there is a very large number of clusters, many of the cluster colours are just white, eg: https://test-views.tlcmap.org/dev/collection-3d.html?load=https%3A%2F%2Ftest-ghap.tlcmap.org%2Flayers%2F112%2Fclusteranalysis%2Fdbscan%2Fjson%3Fdistance%3D0.0059%26minPoints%3D0

I assume this is because we ran out of colours from the colourblind safe palette.

When the colourblind safe palette runs out, make the cluster colours either (you choose): a) a randomly selected hex code, or b) a hex code derived by some method that spreads them evenly across the colour spectrum. For example, if there are 60 clusters, divide the maximum hex value by 60, then assign a colour to each cluster by adding that much to the previous one. Eg: FFFFFF / 3C ≈ 44444, so the first colour is #044444, the second is #088888, etc.
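Option b) can be sketched in a few lines. This is only an illustration of the arithmetic described above, not code from GHAP; the function names `spread_colours` and `hue_colours` are hypothetical. The second function is a swapped-in alternative (stepping the hue rather than the raw hex value), included because raw integer stepping ends very close to white again.

```python
import colorsys


def spread_colours(n: int) -> list[str]:
    """Option b) as described: step evenly through 0x000000..0xFFFFFF."""
    if n <= 0:
        return []
    step = 0xFFFFFF // n  # e.g. 0xFFFFFF // 60 == 0x44444
    return [f"#{step * i:06X}" for i in range(1, n + 1)]


def hue_colours(n: int) -> list[str]:
    """Alternative (not from the thread): n evenly spaced hues at fixed
    saturation and brightness, so no colour comes out white."""
    colours = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        colours.append(
            f"#{int(round(r * 255)):02X}{int(round(g * 255)):02X}{int(round(b * 255)):02X}"
        )
    return colours
```

With 60 clusters, `spread_colours(60)[:2]` gives `['#044444', '#088888']`, matching the example above, while the last entry is `#FFFFF0`, which is nearly white. Stepping the hue avoids that, at the cost of neighbouring clusters having similar colours when the count is high.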

BillPascoe commented 6 months ago

I don't think the value for 'Within Radius (kms):' is working correctly. If I enter 5 clusters and set 'Within Radius (kms)' to a low value, such as 0.2 (200m), 0.02 (20m) or 0.002 (2m), I always get 5 clusters. I think I should be seeing more than 5. It should break into 5 clusters, but then also make sure that no cluster's radius is greater than, eg: 20m, and create more clusters if this is exceeded. Clearly the 5 clusters created have a radius bigger than 20m or 2m. The PostGIS manual says "max_radius, if set, will cause ST_ClusterKMeans to generate more clusters than k ensuring that no cluster in output has radius larger than max_radius. This is useful in reachability analysis." https://postgis.net/docs/ST_ClusterKMeans.html I think I am seeing a few issues with unit conversion, so perhaps it is a unit conversion problem? Maybe check whether running this query directly in the database gives different results. I may have misunderstood.
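To make the unit-conversion suspicion concrete, here is a rough sketch of converting a radius in kilometres to degrees. It assumes the layer's geometries are stored as longitude/latitude and that the radius ends up being compared in degrees; neither is confirmed in this thread. The helper names are hypothetical and the Earth is treated as a sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation
KM_PER_DEGREE = math.pi * EARTH_RADIUS_KM / 180  # about 111.195 km


def km_to_lat_degrees(km: float) -> float:
    """Approximate degrees of latitude spanned by a distance in km."""
    return km / KM_PER_DEGREE


def km_to_lon_degrees(km: float, latitude: float) -> float:
    """Approximate degrees of longitude spanned by km at a given latitude
    (degrees of longitude shrink towards the poles)."""
    return km / (KM_PER_DEGREE * math.cos(math.radians(latitude)))
```

Under that assumption, 0.02 km (20m) is about 0.00018 degrees of latitude, so a value passed through without conversion would be treated as a radius thousands of times larger than intended.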

BillPascoe commented 5 months ago

The "Within Radius (kms)" parameter is not working. If I put any value in there, it returns a screen with no results at all, just headings. Eg: using layer https://test-ghap.tlcmap.org/layers/112

MufengNiu commented 5 months ago

Hi @BillPascoe, can you send the parameters you are using?

The max_radius is no longer supported in the k-means PostGIS functions, so the calculation is implemented by ourselves. If the distance between a place and the centroid exceeds the max radius, then the current cluster will be dropped.
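One reading of the rule above can be sketched as follows. This is not the GHAP implementation; it is a minimal illustration that assumes clusters arrive as lists of (latitude, longitude) pairs, that the centroid is the plain mean of the coordinates, and that distance is great-circle distance in km. All names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def centroid(points):
    """Mean latitude and longitude (adequate for small, non-polar clusters)."""
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def filter_clusters(clusters, max_radius_km):
    """Keep a cluster only if every point lies within max_radius_km of its centroid."""
    kept = {}
    for cluster_id, points in clusters.items():
        centre = centroid(points)
        if all(haversine_km(p, centre) <= max_radius_km for p in points):
            kept[cluster_id] = points
    return kept


# Made-up coordinates: cluster 0 is tight, cluster 1 spans tens of km.
clusters = {
    0: [(-32.0, 152.0), (-32.001, 152.001)],
    1: [(-33.0, 151.0), (-33.5, 151.5)],
}
```

With these made-up points, a 1 km radius keeps only cluster 0, and a 0.01 km radius drops both. Under this reading, a small radius can leave nothing to display, which would be consistent with a results page that shows only headings.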

With 10 clusters and a radius of 4 km, data is returned.

BillPascoe commented 5 months ago

Parameters I'm using are 5 clusters and 1km. I think I had a different understanding of what 'max radius' means. If it is not in the PostGIS functionality we could leave it out, but since you have added this feature, we can keep it as it is and document how it works. However, generate an error message that says why it did not return results, and provide a link to the GHAP Guide. We can discuss how it works at the next meeting so I can put it in the documentation.

Do you mean that it:

a) finds the clusters
b) checks each cluster for any points outside the max_radius from the centroid of this cluster and removes them

or

a) removes all points outside the radius from the centroid of the whole layer
b) clusters all the remaining points?

The way I thought it would work is like this:

a) cluster all the points, but make sure the radius of no cluster is greater than max_radius. That means that if the user said '5' clusters, but that would make 5 big clusters with a radius greater than max_radius, then the algorithm will make more than 5 clusters, all with a radius smaller than max_radius.
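The behaviour expected in the last paragraph (more clusters than requested, none wider than max_radius) can be sketched as a post-processing step. This is not how PostGIS or GHAP implements it; it is one simple way to get that outcome, assuming the initial clusters are already computed and the points are in planar coordinates measured in km. All names are hypothetical.

```python
import math


def centroid(pts):
    """Mean x and y of a list of (x, y) points."""
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))


def radius(pts):
    """Largest distance from the centroid to any point in the cluster."""
    centre = centroid(pts)
    return max(math.dist(p, centre) for p in pts)


def bisect(pts, iters=10):
    """Split one cluster in two with a small 2-means, seeded from two far-apart points."""
    centre = centroid(pts)
    a = max(pts, key=lambda p: math.dist(p, centre))  # farthest from the centroid
    b = max(pts, key=lambda p: math.dist(p, a))       # farthest from a
    left, right = [], []
    for _ in range(iters):
        new_left = [p for p in pts if math.dist(p, a) <= math.dist(p, b)]
        new_right = [p for p in pts if math.dist(p, a) > math.dist(p, b)]
        if not new_left or not new_right:
            break  # keep the last split that had two non-empty halves
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right


def enforce_max_radius(clusters, max_radius):
    """Keep splitting any cluster wider than max_radius until none is."""
    todo, done = list(clusters), []
    while todo:
        pts = todo.pop()
        if radius(pts) <= max_radius:
            done.append(pts)
        else:
            todo.extend(bisect(pts))
    return done


# Made-up planar points in km: the first cluster is really two distant groups.
initial = [
    [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)],
    [(50.0, 50.0), (50.2, 50.0)],
]
result = enforce_max_radius(initial, max_radius=1.0)
```

In this made-up example, two requested clusters become three, because the first is split until each part fits within 1 km. No points are discarded, which is the main difference from dropping clusters that exceed the radius.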