[CWM] Profile MM existing performance using new test maps

lin-d-hop commented 3 months ago

Track this issue under Cooperative World Map Project

Description

To better understand areas for improving MM efficiency, we agreed in the last group MM call that we needed to do some profiling and analysis to identify slow/expensive components and processes. Now that we have some MM's created in #254 let's use these to profile.

Both @rogup and @wu-lee will have great ideas, so a good first step on this will be for the two of you to have a chat and come up with a systematic approach.

The following notes are taken from the last call. Including here as inspiration for what might be included in the profiling :)

How long does loading this much data take right now
Focus on the things that we dont know eg leaflet, components not related to data loading
Use another (faster) data exchange format - e.g. JSON, BSON, compressed versions thereof
Avoid using Javascript Objects and strings to represent organisations - use arrays / integers (using Typescript to ease usage)
Drop library components such as D3 if they can be dispensed with
Cut out all the UI except for Leaflet map (or possibly Mapbox?), add in just what we need for CWM as necessary (aiming to massively simplify the code)
Measure user's bandwidth and decide on appropriate amount of data to load to the front-end

Acceptance Criteria

@wu-lee and @rogup to meet and create a plan for systematically profiling MM performance, potentially with different test maps to compare at different scales.
Document the plan on this issue
@wu-lee then to go through and conduct the profiling as per the plan
Document raw results on this issue. The ideal will be to have an much information as possible
Undoubtedly recommendations will come, but more importantly is having as much information as we can ready for the in person meeting June 24-25

wu-lee commented 3 months ago

So I have done one bout of profiling already. See https://github.com/DigitalCommons/mykomap/issues/249#issuecomment-2154588535

Conclusion from that is mainly that a significant amount of the time goes adding pins to the map, even though they are clustered using the Leaflet.markercluster library, which reduces the amount of DOM twiddling that needs to do.

This article on optimising Leaflet suggests this could be speeded up by using the supercluster library instead. Which sounds like a good thing to try! But I would guess this will require a reasonable, possibly large amount of replumbing - knowing for sure requires just trying it.

On top of this, there is the work in #254. We noticed that the time to load the data itself is pretty quick - order of hundreds to thousands of milliseconds (AKA seconds). Then there's a long delay whilst the map's progress spinner twirls. Then the map appears. However, with 500k pins, the map is so slow to pan and zoom as to be unusable.

Which means, the clustering is not the final word - it's possible the map will be unusably slow even if we make the pin creation/clustering step fast. And there are currently no solutions for that, except perhaps more profiling and optimising. But if the slowness is entirely in Leaflet, we may need to try something like WebGL tiles or something more radically different.

wu-lee commented 3 months ago

I'm having a bash at wiring Supercluster in place of MarkerClusterGroup.

Supercluster's API is quite a lot simpler than MCG's, but this means some things don't map obviously (e.g. no way to trigger a pan and zoom to a marker, and spiderifying pins seems not to be an option)

Another difference is that MCG allows adding one marker at a time, whereas Supercluster has a single .load(...) method, after which there's no option to add more.

And having noticed that I can see clear that Mykomap is loading one marker at a time - which is presumably a slow way of using MCG. (If I recall right, that was also suggested by the profiling, which pointed to calls to the .addLayer() method of MCG as being a major percentage of the load time) So maybe we'd get better performance by adding all the markers once after the load. However, that's not a simple substitution refactor - it requires reworking the sequence of the data loading.

wu-lee commented 3 months ago

Furthermore, I can see that MM adds and removes markers from two MCG instances willy-nilly - one is for visible markers and one is for hidden markers. It does this when the filters or search changes the visible pins.

This is not going to work for Supercluster, as once created its index is immutable. So some other data structure will need to be used for tracking what's visible, and a Supercluster index created from that. Again, not a simple substitution refactor.

wu-lee commented 3 months ago

Profile of 50k pins. Probably too many other tabs open however: https://share.firefox.dev/3W31M6Z

wu-lee commented 3 months ago

Another one of 50k pins here, on my laptop with fewer other things going on. 100k pins kept crashing the profiler tab... https://share.firefox.dev/3zg6l4H

ms0ur1s commented 3 months ago

Serverside clustering in leaflet

https://alfiankan.medium.com/handle-millions-of-location-points-with-leaflet-without-breaking-the-browser-f69709a50861

wu-lee commented 3 months ago

@rogup found this article about Leaflet vs Mapbox: https://kuanbutts.com/2019/08/31/mbgl-vs-leaflet/

rogup commented 3 months ago

I did a bit of profiling in my work for #257 and found a couple of other low-hanging fruits

wu-lee commented 3 months ago

[comment in progress, will update later]

Investigating feasibilty of integrating supercluster, and inspecting the demo here:

https://github.com/AndrejGajdos/leaflet-markercluster-vs-supercluster/tree/main

If you run this demo, there's clearly a marked difference, in that the leaflet.markercluster implementation takes far longer than most people can tolerate looking at a blank screen, and the supercluster version loads nearly immediately (both use 500k pins).

But cutting down the pins to 200k so that the markercluster version loads, there are more differences. The markercluster version will re-cluster on zoom and panning, and will also spiderify co-incident markers when you click on them. The supercluster version does not - there is no implementation for this out of the box.

To use supercluster with leaflet really needs those functions implemented to reach parity.

So I ask myself the question: has anyone tried doing this before? Surely this is going to be the first thing people do when they attempt an integration.

Looking for answers, I find this, back in 2016, the Leaflet author commenting omn a request to integrate supercluster into leaflet.markercluster:

I believe it's very hard to integrate this into Leaflet.MarkerCluster — you get great performance at expense of not being able to animate clusters and add/remove points dynamically. https://github.com/mapbox/supercluster/issues/15

Then this says (also 2016):

Currently, if a point is added, removed, or moved, you have to recluster all the points, which can be expensive. [...] Hi. I'm considering Supercluster for my backend and I need to be able to add new points to the clusters. Is there a way to add points efficiently to the clusters without rebuilding it?

I'm expecting to have more than 500k points in my DB and I can't imagine loading them everytime a new one is added. Can you suggest an approach? [...] Supercluster uses kd bush to generate index, so it do not support dynamic indexing data https://github.com/mapbox/supercluster/issues/19

So I think a problem we will face is that when we do zoom and pan, we need to rebuild the clustering index, and this could be slow. Possibly we could cache all the 18 or so zoom levels? Possibly the rebuild will be fast enough?

The next question is, given that we decide to use supercluster, how do we hook it into leaflet?

This issue question suggests on how to pan with supercluster in leaflet points to a problem, although not one I think we'll have as we don't ever move markers. https://github.com/mapbox/supercluster/issues/164

But it has a minimal working example in the referred stackoverflow question here: https://stackoverflow.com/questions/63056141/leaflet-supercluster-l-markers-autopan-not-working

This is an article on using supercluster with Leaflet, and has some more examples. https://www.leighhalliday.com/leaflet-clustering

To conclude for now: seems it's being done, but I've not (yet) found anything fully packaged, which is curious.

On the topic of server side rendering with MarkerCluster:

Integration with a server-side clustering implementation [...] This is definitely needed. [...but...] This is sounding more like a fork of the markercluster package the more I think about it. [...] [leaflet.markercluster author] I'm going to say this is outside of the scope of L.MarkerCluster. Would love to see an implementation of it however! https://github.com/Leaflet/Leaflet.markercluster/issues/151

wu-lee commented 3 months ago

[From Element] One line of investigation I'm following is:

supercluster has been around for nearly a decade, is there any prior art we can steal (tentative answer, maybe a little, but no one seems to have melded supercluster and leaflet in nice package, either ominously (it's too hard) or suspiciously (MapBox won't allow it); perhaps a third option is: it creates bad tradeoffs for smaller datasets?)
can bare-bones supercluster be hooked up with a leaflet demo to pan, zoom, spiderify adequately with 500 megapins without too much effort? (tentative answer, maybe yes, see below.)
after which might be, can a similar mapbox demo do better?

The premise being: MykoMap has a metric f-ton of baggage which makes it hard to see how well the components could do without all that.

Also tentatively: if you go much over 500 megapins, this gets unusable even with supercluster. [Later: I get better results with fewer windows and tabs open - perhaps unsurprisingly! I think 2 megapins worked okay]

wu-lee commented 3 months ago

Adapting the demos from Andrej Gajdos' article on "A Leaflet Developer's Guide to Hight-Performance ....", which are here, I've:

Reduced the pins in the leaflet.markercluster version from 500k until it actually loads in a reasonable time - shows the functionality which comes with this package that doesn't with supercluster (reclustering is done on panning and zooming; co-incident pins in a top-level cluster get spiderified on clicking; Voronoi boundaries are shown around clusters on hover). See https://jsfiddle.net/waf9og2s/
Implemented these missing things (except Voronoi, which we probably don't care about) in the supercluster demo, keeping 500k pins. See https://jsfiddle.net/edno3pyw/

The spiderifying is adapted from the markercluster source, and is just a proof of concept (no legs on the spider, and the demo's unusually large clusters makes them in to big spirals).

The panning and zooming drops all the pins and re-adds them from the supercluster index query.

Yet for me, on a not very new desktop (2012 vintage) it loads in a small number of seconds, so long as I don't have my usual 20+ windows and 50+ tabs. (It can get much much slower if memory is constrained.) and panning and zooming is acceptably fast. An initial zoom sometimes seems to freeze for a few seconds - a profile suggests this is down to a massive garbage collect. Although I closed the profile and can't replicate this again for the time being...

So this does seem to suggest that lots of pins might be feasible in leaflet using supercluster. But it's very stripped down compared to Mykomap:

data is generated on the fly, not loaded
data is much simpler, just points
no searching or filtering is done

Searching millions of items might be slow too, but perhaps could be optimised to be fast, as the supercluster index queries are. This remains to be seen.

Source code at: https://github.com/DigitalCommons/leaflet-markercluster-vs-supercluster

Profile runs of supecluster demo. with markers (search for 'supercluster') showing the periods of instantiation, load, and query of the supercluster index. Instantiation and query are negligibly quick, <100ms, loading the index ~2 seconds.

1 megapins: https://share.firefox.dev/3XKFPL5
2 megapins: https://share.firefox.dev/4btNqRp

DigitalCommons / mykomap

[CWM] Profile MM existing performance using new test maps #256

Description

Acceptance Criteria