Closed lin-d-hop closed 2 months ago
So I have done one bout of profiling already. See https://github.com/DigitalCommons/mykomap/issues/249#issuecomment-2154588535
Conclusion from that is mainly that a significant amount of the time goes adding pins to the map, even though they are clustered using the Leaflet.markercluster library, which reduces the amount of DOM twiddling that needs to do.
This article on optimising Leaflet suggests this could be speeded up by using the supercluster library instead. Which sounds like a good thing to try! But I would guess this will require a reasonable, possibly large amount of replumbing - knowing for sure requires just trying it.
On top of this, there is the work in #254. We noticed that the time to load the data itself is pretty quick - order of hundreds to thousands of milliseconds (AKA seconds). Then there's a long delay whilst the map's progress spinner twirls. Then the map appears. However, with 500k pins, the map is so slow to pan and zoom as to be unusable.
Which means, the clustering is not the final word - it's possible the map will be unusably slow even if we make the pin creation/clustering step fast. And there are currently no solutions for that, except perhaps more profiling and optimising. But if the slowness is entirely in Leaflet, we may need to try something like WebGL tiles or something more radically different.
I'm having a bash at wiring Supercluster in place of MarkerClusterGroup.
Supercluster's API is quite a lot simpler than MCG's, but this means some things don't map obviously (e.g. no way to trigger a pan and zoom to a marker, and spiderifying pins seems not to be an option)
Another difference is that MCG allows adding one marker at a time, whereas Supercluster has a single .load(...) method, after which there's no option to add more.
And having noticed that I can see clear that Mykomap is loading one marker at a time - which is presumably a slow way of using MCG. (If I recall right, that was also suggested by the profiling, which pointed to calls to the .addLayer() method of MCG as being a major percentage of the load time) So maybe we'd get better performance by adding all the markers once after the load. However, that's not a simple substitution refactor - it requires reworking the sequence of the data loading.
Furthermore, I can see that MM adds and removes markers from two MCG instances willy-nilly - one is for visible markers and one is for hidden markers. It does this when the filters or search changes the visible pins.
This is not going to work for Supercluster, as once created its index is immutable. So some other data structure will need to be used for tracking what's visible, and a Supercluster index created from that. Again, not a simple substitution refactor.
Profile of 50k pins. Probably too many other tabs open however: https://share.firefox.dev/3W31M6Z
Another one of 50k pins here, on my laptop with fewer other things going on. 100k pins kept crashing the profiler tab... https://share.firefox.dev/3zg6l4H
Serverside clustering in leaflet
@rogup found this article about Leaflet vs Mapbox: https://kuanbutts.com/2019/08/31/mbgl-vs-leaflet/
I did a bit of profiling in my work for #257 and found a couple of other low-hanging fruits
[comment in progress, will update later]
Investigating feasibilty of integrating supercluster, and inspecting the demo here:
https://github.com/AndrejGajdos/leaflet-markercluster-vs-supercluster/tree/main
If you run this demo, there's clearly a marked difference, in that the leaflet.markercluster implementation takes far longer than most people can tolerate looking at a blank screen, and the supercluster version loads nearly immediately (both use 500k pins).
But cutting down the pins to 200k so that the markercluster version loads, there are more differences. The markercluster version will re-cluster on zoom and panning, and will also spiderify co-incident markers when you click on them. The supercluster version does not - there is no implementation for this out of the box.
To use supercluster with leaflet really needs those functions implemented to reach parity.
So I ask myself the question: has anyone tried doing this before? Surely this is going to be the first thing people do when they attempt an integration.
Looking for answers, I find this, back in 2016, the Leaflet author commenting omn a request to integrate supercluster into leaflet.markercluster:
I believe it's very hard to integrate this into Leaflet.MarkerCluster — you get great performance at expense of not being able to animate clusters and add/remove points dynamically. https://github.com/mapbox/supercluster/issues/15
Then this says (also 2016):
Currently, if a point is added, removed, or moved, you have to recluster all the points, which can be expensive. [...] Hi. I'm considering Supercluster for my backend and I need to be able to add new points to the clusters. Is there a way to add points efficiently to the clusters without rebuilding it?
I'm expecting to have more than 500k points in my DB and I can't imagine loading them everytime a new one is added. Can you suggest an approach? [...] Supercluster uses kd bush to generate index, so it do not support dynamic indexing data https://github.com/mapbox/supercluster/issues/19
So I think a problem we will face is that when we do zoom and pan, we need to rebuild the clustering index, and this could be slow. Possibly we could cache all the 18 or so zoom levels? Possibly the rebuild will be fast enough?
The next question is, given that we decide to use supercluster, how do we hook it into leaflet?
This issue question suggests on how to pan with supercluster in leaflet points to a problem, although not one I think we'll have as we don't ever move markers. https://github.com/mapbox/supercluster/issues/164
But it has a minimal working example in the referred stackoverflow question here: https://stackoverflow.com/questions/63056141/leaflet-supercluster-l-markers-autopan-not-working
This is an article on using supercluster with Leaflet, and has some more examples. https://www.leighhalliday.com/leaflet-clustering
To conclude for now: seems it's being done, but I've not (yet) found anything fully packaged, which is curious.
On the topic of server side rendering with MarkerCluster:
Integration with a server-side clustering implementation [...] This is definitely needed. [...but...] This is sounding more like a fork of the markercluster package the more I think about it. [...] [leaflet.markercluster author] I'm going to say this is outside of the scope of L.MarkerCluster. Would love to see an implementation of it however! https://github.com/Leaflet/Leaflet.markercluster/issues/151
[From Element] One line of investigation I'm following is:
The premise being: MykoMap has a metric f-ton of baggage which makes it hard to see how well the components could do without all that.
Also tentatively: if you go much over 500 megapins, this gets unusable even with supercluster. [Later: I get better results with fewer windows and tabs open - perhaps unsurprisingly! I think 2 megapins worked okay]
Adapting the demos from Andrej Gajdos' article on "A Leaflet Developer's Guide to Hight-Performance ....", which are here, I've:
The spiderifying is adapted from the markercluster source, and is just a proof of concept (no legs on the spider, and the demo's unusually large clusters makes them in to big spirals).
The panning and zooming drops all the pins and re-adds them from the supercluster index query.
Yet for me, on a not very new desktop (2012 vintage) it loads in a small number of seconds, so long as I don't have my usual 20+ windows and 50+ tabs. (It can get much much slower if memory is constrained.) and panning and zooming is acceptably fast. An initial zoom sometimes seems to freeze for a few seconds - a profile suggests this is down to a massive garbage collect. Although I closed the profile and can't replicate this again for the time being...
So this does seem to suggest that lots of pins might be feasible in leaflet using supercluster. But it's very stripped down compared to Mykomap:
Searching millions of items might be slow too, but perhaps could be optimised to be fast, as the supercluster index queries are. This remains to be seen.
Source code at: https://github.com/DigitalCommons/leaflet-markercluster-vs-supercluster
Profile runs of supecluster demo. with markers (search for 'supercluster') showing the periods of instantiation, load, and query of the supercluster index. Instantiation and query are negligibly quick, <100ms, loading the index ~2 seconds.
Track this issue under Cooperative World Map Project
Description
To better understand areas for improving MM efficiency, we agreed in the last group MM call that we needed to do some profiling and analysis to identify slow/expensive components and processes. Now that we have some MM's created in #254 let's use these to profile.
Both @rogup and @wu-lee will have great ideas, so a good first step on this will be for the two of you to have a chat and come up with a systematic approach.
The following notes are taken from the last call. Including here as inspiration for what might be included in the profiling :)
Acceptance Criteria