emilhe / dash-leaflet

MIT License

Performance issues with dl.MarkerClusterGroup (I think) #168

Closed: perfectly-preserved-pie closed this issue 1 year ago

perfectly-preserved-pie commented 1 year ago

Hi, I've made a Dash Leaflet app here. The whole source code (including the dataset, stored as a Python pickle) is there if you wanna take a look.

My host is

so I think I can safely rule out hardware causes.

Basically, when the Dash callback fires because the user has adjusted an option, the resulting new markers on the map are generated SO slowly. Try it out yourself: https://wheretolive.LA

Adjust any of the options on the left (move some sliders, tick or untick some checkboxes, etc.). You'll see that the markers don't actually update until 2-7 seconds later. For example, changing the Price (Monthly) slider range from 1100-2000 to 1100-5000 took 3.4 seconds for me.

I have a feeling this piece of code in the Dash callback is the bottleneck:

```python
import uuid
import dash_leaflet as dl

# Create markers & associated popups from dataframe
markers = [
    dl.Marker(children=dl.Popup(row.popup_html, closeButton=True), position=[row.Latitude, row.Longitude])
    for row in df_filtered[df_filtered.Latitude.notnull()].itertuples()
]
...
# Generate the map
return dl.MarkerClusterGroup(id=str(uuid.uuid4()), children=markers)
```

Specifically, dl.MarkerClusterGroup(id=str(uuid.uuid4()), children=markers) is the slow part I think.

The Pandas dataframe itself is about 1,200 rows and roughly 17 columns. I don't ever see it exceeding 10,000 rows and 20 columns. It's pretty small in the grand scheme of things, so I don't really suspect this is a Pandas query performance issue.

I also did some (admittedly basic) execution timing tests on each of the Pandas operations that make up df_filtered and they don't amount to much even put all together. The queries execute in a few milliseconds or nanoseconds.
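One way to pin down where the time goes is to time each stage of the callback separately with `time.perf_counter`. This is a minimal sketch: the stages `filter_rows` and `build_features` and the synthetic `rows` are hypothetical stand-ins, not the app's real code. Note that it only measures server-side Python; time the browser spends rendering the returned components will not show up here.

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

# Hypothetical stand-ins for the real callback steps (synthetic data).
rows = [{"lat": 34.0 + i * 1e-4, "lon": -118.2, "price": 1000 + i} for i in range(1200)]

def filter_rows(rows, max_price):
    return [r for r in rows if r["price"] <= max_price]

def build_features(rows):
    return [{"type": "Feature",
             "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
             "properties": {}} for r in rows]

with timed("filter"):
    filtered = filter_rows(rows, max_price=2000)
with timed("build"):
    features = build_features(filtered)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.2f} ms")
```

If every server-side stage is fast but the map still lags, the cost is likely on the client side (creating and rendering the components), which is a different problem from query speed.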

I've seen what other people have done with Dash & Dash Leaflet, and I've seen their datasets. They're massive, much bigger than my paltry 1200 rows, and yet their apps perform well and are fast.

What am I missing here? How can I make this faster? :(

emilhe commented 1 year ago

The marker cluster group component is only suited for small datasets (say, 100 rows). For larger datasets, the GeoJSON component should be used:

https://dash-leaflet-docs.onrender.com/#super_cluster
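As a rough sketch of what switching involves, the helper below turns row dicts into a GeoJSON `FeatureCollection` and skips rows without coordinates (similar to the `df_filtered.Latitude.notnull()` filter above). The `Latitude`/`Longitude` field names come from the original callback; the helper name `rows_to_geojson` is hypothetical. GeoJSON orders coordinates as `[longitude, latitude]`, the reverse of Leaflet's `[lat, lng]`.

```python
import math

def _is_missing(value):
    """Treat None and NaN (pandas' marker for missing floats) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of Point features from row dicts."""
    features = []
    for row in rows:
        lat, lon = row.get("Latitude"), row.get("Longitude")
        if _is_missing(lat) or _is_missing(lon):
            continue  # drop rows that cannot be placed on the map
        # Keep every other column as a feature property.
        props = {k: v for k, v in row.items() if k not in ("Latitude", "Longitude")}
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}

rows = [
    {"Latitude": 34.05, "Longitude": -118.24, "price": 2500},
    {"Latitude": float("nan"), "Longitude": -118.30, "price": 1800},
]
geojson = rows_to_geojson(rows)
print(len(geojson["features"]))  # 1
```

With pandas, `df.to_dict("records")` produces row dicts in this shape. The result would then go into the `data` property, e.g. `dl.GeoJSON(data=geojson, cluster=True, zoomToBoundsOnClick=True)` as in the linked superclustering example.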

perfectly-preserved-pie commented 1 year ago

Oh ok, I had a feeling GeoJSON was the new hotness but it looked so confusing I didn't pursue it. From the example it looks like GeoJSON doesn't support things like Dash HTML; it seems like I can only feed it coordinates and nothing else. I just tried it myself and while the markers appear, they don't have any of the associated HTML on the popups. Am I supposed to use dl.Marker still?

```python
def update_map(subtypes_chosen, pets_chosen, terms_chosen, garage_spaces, rental_price,
               bedrooms_chosen, bathrooms_chosen, sqft_chosen, years_chosen,
               sqft_missing_radio_choice, yrbuilt_missing_radio_choice, garage_missing_radio_choice,
               ppsqft_chosen, ppsqft_missing_radio_choice, furnished_choice,
               security_deposit_chosen, security_deposit_radio_choice,
               pet_deposit_chosen, pet_deposit_radio_choice,
               key_deposit_chosen, key_deposit_radio_choice,
               other_deposit_chosen, other_deposit_radio_choice,
               listed_date_datepicker_start, listed_date_datepicker_end, listed_date_radio):
    ...
    # Create markers & associated popups from dataframe
    markers = [dl.Marker(children=dl.Popup(row.popup_html, closeButton=True), position=[row.Latitude, row.Longitude])
               for row in df_filtered[df_filtered.Latitude.notnull()].itertuples()]

    # Generate the map
    return dl.GeoJSON(id=str(uuid.uuid4()), children=markers)

# Launch the Flask app
if __name__ == '__main__':
    app.run_server(debug=True)
```

My popups contain HTML components like images, table rows, etc.; does that still work with GeoJSON? That's why I was using dl.Marker and MarkerClusterGroup: it was pretty easy to see how I could fit that in there. I'm kind of at a loss as to how to do that with GeoJSON.

emilhe commented 1 year ago

Simple tooltips are supported out of the box: just add a property named 'tooltip' to each feature in the GeoJSON data, and it will show. (You should not use Marker components as children; pass the GeoJSON data via the data property instead.)

https://dash-leaflet-docs.onrender.com/#tutorials
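To make the distinction concrete: instead of `dl.Marker` children, each feature carries a `tooltip` entry in its `properties`, and the whole collection is passed via `data`. A minimal sketch; `add_tooltips` and the `price` field are hypothetical.

```python
def add_tooltips(geojson, make_text):
    """Attach a 'tooltip' property to every feature, computed from its properties."""
    for feature in geojson["features"]:
        props = feature.setdefault("properties", {})
        props["tooltip"] = make_text(props)
    return geojson

geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [-118.24, 34.05]},
         "properties": {"price": 2500}},
    ],
}
add_tooltips(geojson, lambda p: f"${p['price']:,}/month")
print(geojson["features"][0]["properties"]["tooltip"])  # $2,500/month
```

In the callback this would replace `children=markers`, giving something like `dl.GeoJSON(data=geojson)`.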

Rendering more complex stuff should be possible too, but you might need to implement a custom JS rendering function, depending on your specific needs.

BTW: For the MWE, I would recommend uploading sample data in JSON/CSV or a similar format. Loading a pickle file can execute arbitrary (potentially malicious) code, so loading such a file from an unverified source is not recommended.
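To illustrate why: an object can define `__reduce__` so that unpickling calls an arbitrary callable with arbitrary arguments. This harmless sketch has it call `eval` on a trivial expression; a malicious file could just as easily call something like `os.system`.

```python
import pickle

class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('1 + 1', {})".
        # Any importable callable and arguments could be substituted here.
        return (eval, ("1 + 1", {}))

blob = pickle.dumps(Payload())
obj = pickle.loads(blob)  # eval runs during loading
print(obj)  # 2  (not a Payload instance at all)
```

CSV or JSON, by contrast, is parsed as plain data, which makes it the safer format for sharing sample datasets.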

perfectly-preserved-pie commented 1 year ago

Excellent, thank you. That example helped a lot. I also fed it into ChatGPT to have it give me some more examples that I could study. I learned that while I can't use Dash HTML components in the popup, the popup property does accept raw HTML. After converting my Dash HTML objects into raw HTML, I was able to render all 1200 points using dl.GeoJSON and have the popups styled to my liking; of course, it's also much faster now (almost instant when triggering the callback!) which is just amazing and totally solves the performance issue 🎉
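For anyone taking the same route, here is a sketch of building the popup as a raw HTML string and storing it in each feature's `popup` property. The field names (`address`, `price`, `image_url`) and the helper `make_popup_html` are hypothetical. Values are escaped with `html.escape` because the string is rendered as HTML in the browser.

```python
from html import escape

def make_popup_html(listing):
    """Render a listing dict as an HTML snippet: an image plus a small table."""
    fields = [("Address", listing["address"]), ("Price", f"${listing['price']:,}")]
    rows = "".join(
        f"<tr><th>{escape(label)}</th><td>{escape(str(value))}</td></tr>"
        for label, value in fields
    )
    img = f'<img src="{escape(listing["image_url"])}" width="200">'
    return f"{img}<table>{rows}</table>"

feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-118.24, 34.05]},
    "properties": {"address": "123 Main St", "price": 2500,
                   "image_url": "https://example.com/photo.jpg"},
}
feature["properties"]["popup"] = make_popup_html(feature["properties"])
print(feature["properties"]["popup"])
```

Escaping matters if any listing text comes from scraped or user-supplied data. Otherwise a stray `<` or a script tag would be interpreted by the browser rather than shown as text.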

I do have some other questions regarding clustering --

> BTW: For the MWE, I would recommend uploading sample data in JSON/CSV or a similar format. Loading a pickle file can execute arbitrary (potentially malicious) code, so loading such a file from an unverified source is not recommended.

Ah, sorry, I'm still new to Python and wasn't aware of that. I'll upload a CSV to my repo.

emilhe commented 1 year ago

Great to hear that your performance issues have been resolved. Regarding your questions,

perfectly-preserved-pie commented 1 year ago

> I guess the popups are closed because the marker/cluster view is re-rendered on pan/zoom. Hence, to achieve the behavior you want, changes to the dash-leaflet code are probably needed. I'll be happy to look at a PR if you choose to go down this route.

Bummer! Is there any other alternative for the re-rendering issue? I don't know how to write JavaScript so submitting a PR is out of my wheelhouse unfortunately.

> It should be possible to achieve similarly sized clusters, and the superClusterOptions is indeed the property that you need to tune. The default clustering radius for the marker cluster component is 80 pixels. I am not sure the underlying clustering algorithms yield equivalent results, so you might need to do some tuning.

Thank you! I started with 80 pixels and tweaked it up a bit. You're right about it not being equivalent, but after some trial and error I almost have it looking the same as dl.MarkerClusterGroup, so I'll chalk that up as a success.
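For reference, the clustering knobs discussed here can be collected in one place. This sketch assumes dash-leaflet's `GeoJSON` props `cluster`, `zoomToBoundsOnClick` and `superClusterOptions` (as used in the superclustering docs example). `radius` is Supercluster's cluster radius in pixels, and the value 90 is only an illustrative result of starting from 80 and tweaking. The helper name `clustered_geojson_props` and the id `"listings"` are hypothetical.

```python
def clustered_geojson_props(geojson, radius=80):
    """Keyword arguments for a clustered dl.GeoJSON component.

    `radius` is forwarded to Supercluster via superClusterOptions;
    tune it until clusters resemble the old MarkerClusterGroup output.
    """
    return {
        "data": geojson,
        "cluster": True,
        "zoomToBoundsOnClick": True,
        "superClusterOptions": {"radius": radius},
    }

props = clustered_geojson_props({"type": "FeatureCollection", "features": []}, radius=90)
# In the app (hypothetical id): dl.GeoJSON(id="listings", **props)
print(props["superClusterOptions"])  # {'radius': 90}
```

The two components presumably rely on different clustering libraries (Leaflet.markercluster vs. Supercluster), so a given pixel radius won't necessarily produce the same clusters in both. That is consistent with the trial and error described above.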

> If you want to change the rendering of the cluster itself (e.g. adding a polygon), you can add a custom rendering function via the clusterToLayer property. For inspiration on what such a function looks like, you can take a look at the default implementation.

Awesome, that's very helpful. Let's say I make my own custom function; how can I pass that into Dash Leaflet? Through a property in dl.GeoJSON or somewhere else?

perfectly-preserved-pie commented 1 year ago

Ok, I think I've gotten the custom JavaScript function nailed down. I still need to do more testing though.

Since the original issue has been resolved (slow performance) I'll close out this issue now and maybe make a new one regarding the custom function if I still can't get it to work later on. Thanks for all your help!!