biodiv / anycluster

Server-side clustering of map markers for (Geo)Django
MIT License

number of database queries #4

Closed (clime closed this 10 years ago)

clime commented 11 years ago

There is a significant performance hit from issuing a SELECT for each cell in gridCluster and kmeansCluster. Have you thought about reducing it to just one (or a few) queries? I am not completely sure that it is possible, but I feel it should be, and it would improve performance greatly (especially if you have lots of cells). I am looking for a way to do it, but I would like to hear what you think first.

biodiv commented 11 years ago

For the kmeans method, this should be possible and is an interesting idea. One would have to calculate the number of visible cells and then request k*cellcount clusters. After that, only one SELECT would be needed, targeting the current (grid) bounds instead of each grid cell. Furthermore, this would reduce the number of times the distance cluster has to be run. I will give that a try. Thank you for this input.
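To make the k*cellcount idea concrete, here is a minimal sketch of the arithmetic only. It assumes the viewport and the cell size are given in the same projected units; the function names (`visible_cell_count`, `snapped_bounds`, `total_clusters`) are hypothetical and are not part of anycluster.

```python
import math


def visible_cell_count(bounds, cell_size):
    """Count the grid cells that intersect the viewport.

    bounds is (xmin, ymin, xmax, ymax); the grid is assumed to be
    anchored at the origin with square cells of side cell_size.
    """
    xmin, ymin, xmax, ymax = bounds
    cols = math.ceil(xmax / cell_size) - math.floor(xmin / cell_size)
    rows = math.ceil(ymax / cell_size) - math.floor(ymin / cell_size)
    return cols * rows


def snapped_bounds(bounds, cell_size):
    """Expand the viewport outwards to the surrounding grid lines."""
    xmin, ymin, xmax, ymax = bounds
    return (
        math.floor(xmin / cell_size) * cell_size,
        math.floor(ymin / cell_size) * cell_size,
        math.ceil(xmax / cell_size) * cell_size,
        math.ceil(ymax / cell_size) * cell_size,
    )


def total_clusters(bounds, cell_size, k_per_cell):
    """Number of clusters to request in one query: k times the cell count."""
    return k_per_cell * visible_cell_count(bounds, cell_size)
```

The result of `total_clusters` and the rectangle from `snapped_bounds` would then be the two inputs to a single k-means SELECT over the whole viewport; that query itself is not shown here.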

For gridCluster I currently don't know how the number of SELECTs could be reduced, but that does not mean it is not possible. If you (or anyone else) know a solution, it would be highly appreciated.

biodiv commented 10 years ago

I might have found a way to query the database only once, by using a grid calculated by a PostGIS function: http://gis.stackexchange.com/questions/16374/how-to-create-a-regular-polygon-grid-in-postgis Hopefully I will find the time to test this.
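As an illustration of what "only once" can look like, here is a sketch of a related but simpler technique: instead of generating a polygon grid, each marker's cell index is computed arithmetically and a single GROUP BY does the rest. It runs against an in-memory SQLite database with plain x/y columns so that it is self-contained; the `markers` table, its columns, and the `grid_clusters` function are assumptions for this example and not anycluster's schema or API.

```python
import sqlite3


def grid_clusters(conn, bounds, cell_size):
    """Return (cell_col, cell_row, count, center_x, center_y) for every
    non-empty cell in the viewport, using one SELECT in total."""
    xmin, ymin, xmax, ymax = bounds
    sql = """
        SELECT CAST((x - :xmin) / :size AS INTEGER) AS cell_col,
               CAST((y - :ymin) / :size AS INTEGER) AS cell_row,
               COUNT(*) AS n,
               AVG(x) AS center_x,
               AVG(y) AS center_y
        FROM markers
        WHERE x >= :xmin AND x < :xmax
          AND y >= :ymin AND y < :ymax
        GROUP BY cell_col, cell_row
        ORDER BY cell_col, cell_row
    """
    params = {
        "xmin": float(xmin), "ymin": float(ymin),
        "xmax": float(xmax), "ymax": float(ymax),
        "size": float(cell_size),
    }
    return conn.execute(sql, params).fetchall()


# Hypothetical demo data: six markers spread over a 512 x 512 viewport.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE markers (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO markers (x, y) VALUES (?, ?)",
    [(10, 10), (20, 30), (300, 40), (310, 50), (320, 60), (100, 400)],
)

rows = grid_clusters(conn, (0, 0, 512, 512), 256)
for row in rows:
    print(row)
```

In PostGIS the same grouping could presumably be expressed on the geometry column, for example with `ST_SnapToGrid`, but that variant is untested here.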

biodiv commented 10 years ago

Query count reduced using temporary tables.
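The comment does not say how the temporary tables are used, so the following is only one possible reading, sketched with SQLite so that it runs on its own: the grid cells are written to a temporary table once, and a single JOIN then assigns all markers to cells. The table names `grid_cells` and `markers`, their columns, and the function `cluster_with_temp_grid` are hypothetical.

```python
import sqlite3


def cluster_with_temp_grid(conn, bounds, cell_size):
    """Fill a temporary table with the viewport's grid cells, then count
    the markers per cell with one JOIN instead of one SELECT per cell."""
    xmin, ymin, xmax, ymax = bounds
    conn.execute("DROP TABLE IF EXISTS temp.grid_cells")
    conn.execute(
        "CREATE TEMP TABLE grid_cells ("
        "cell_id INTEGER PRIMARY KEY, "
        "xmin REAL, ymin REAL, xmax REAL, ymax REAL)"
    )

    # Build the cell rectangles row by row, left to right.
    cells = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            cells.append((x, y, x + cell_size, y + cell_size))
            x += cell_size
        y += cell_size
    conn.executemany(
        "INSERT INTO grid_cells (xmin, ymin, xmax, ymax) VALUES (?, ?, ?, ?)",
        cells,
    )

    # One query for all cells; empty cells drop out of the inner join.
    return conn.execute(
        """
        SELECT g.cell_id, COUNT(*) AS n, AVG(m.x), AVG(m.y)
        FROM grid_cells AS g
        JOIN markers AS m
          ON m.x >= g.xmin AND m.x < g.xmax
         AND m.y >= g.ymin AND m.y < g.ymax
        GROUP BY g.cell_id
        ORDER BY g.cell_id
        """
    ).fetchall()


# Hypothetical demo data: six markers in a 512 x 512 viewport.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE markers (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO markers (x, y) VALUES (?, ?)",
    [(10, 10), (20, 30), (300, 40), (310, 50), (320, 60), (100, 400)],
)

clusters = cluster_with_temp_grid(conn, (0, 0, 512, 512), 256)
```

With this layout the number of statements stays constant no matter how many cells are visible; only the number of rows inserted into the temporary table grows.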

clime commented 10 years ago

Good job. I can't test because I am travelling, but good job.