ResidentMario / geoplot

High-level geospatial data visualization library for Python.
https://residentmario.github.io/geoplot/index.html
MIT License
1.14k stars, 95 forks

geoplot.kdeplot --> overlapping isolines #266

Open patsaylor opened 2 years ago

patsaylor commented 2 years ago

When using kdeplot with a single dataset, the isolines appear to overlap instead of merging, producing two local maxima instead of one consistent isoline around all features.

[Screenshot, 2022-02-25: kdeplot output showing the overlapping isolines]
ResidentMario commented 2 years ago

Can you please provide a copy of the dataset, a copy of the code, and the output of pip list?

patsaylor commented 2 years ago

Hello! Thanks for looking into the issue. Here are the sample data and the pip file: geoplot_sample_data.csv and gis_pip_list.txt.

patsaylor commented 2 years ago

Sorry, the comment above got cut off. Please find the files attached, and thanks for looking into the issue!

ResidentMario commented 2 years ago

The sample data you provided is in some unknown non-CSV format:

[Screenshot: the uploaded sample data file, which is not in CSV format]

Can you reupload in CSV format please?

Also, can you please provide the code? In order to look into this, I need a minimal reproducible example.

patsaylor commented 2 years ago

Hello, please let me know if this will work. The attached geoplot_sample.zip includes the input file, the run script, and a sample output figure.

Thanks again for looking into this!

ResidentMario commented 2 years ago

Alright, I was able to repro.

The underlying issue is an interesting one. Longitudes live on a circular [-180, 180] axis that wraps around: 180 degrees and -180 degrees are the same meridian, so the next degree after you reach 180 degrees longitude is -179 degrees longitude. (Latitudes run from -90 to 90 and do not wrap.) The kernel density estimation algorithm is naive to this; it expects a smooth, non-wrapping axis running from -inf to +inf in both directions.
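To make the wrap-around concrete, here is a minimal sketch in plain Python (the helper names are hypothetical, not part of geoplot) comparing the naive distance a non-wrapping algorithm would use with the true angular separation between two longitudes on either side of 180 degrees:

```python
def naive_lon_diff(a, b):
    """Absolute difference between two longitudes, ignoring wrap-around."""
    return abs(a - b)


def wrapped_lon_diff(a, b):
    """Shortest angular separation between two longitudes, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


print(naive_lon_diff(179.0, -179.0))    # 358.0
print(wrapped_lon_diff(179.0, -179.0))  # 2.0
```

Two points that are really 2 degrees apart look 358 degrees apart to anything that treats longitude as an ordinary number line.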

Because your data straddles the ±180 degree meridian (the antimeridian, which the international date line roughly follows), the KDE algorithm generates density contours that (1) don't connect across the boundary and (2) extend past the maximum extent of the coordinate system, i.e. past 180 degrees and -180 degrees. If you remove the projection, this becomes obvious:

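[Screenshot: the same KDE plotted without a projection; the contours extend past 180 and -180 degrees longitude]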
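Both symptoms can be reproduced with a hand-rolled 1-D Gaussian KDE along the longitude axis. This is a minimal sketch, not geoplot's or seaborn's actual estimator; the sample longitudes and the bandwidth are made up for illustration:

```python
import math


def naive_kde_1d(samples, bandwidth):
    """1-D Gaussian KDE that treats longitude as an ordinary, non-wrapping axis."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical longitudes clustered on both sides of the antimeridian.
lons = [177.5, 178.0, 179.0, 179.5, -179.5, -179.0, -178.0, -177.5]
kde = naive_kde_1d(lons, bandwidth=1.0)

# Two separate bumps instead of one peak at +/-180:
print(kde(178.5) > kde(180.0))    # True
print(kde(-178.5) > kde(-180.0))  # True
# Non-zero density beyond the edge of the coordinate system:
print(kde(181.0) > 0.0)           # True
```

The naive estimator sees two clusters 357 degrees apart, so it produces two local maxima and spills density past 180 degrees, which matches what the unprojected plot shows.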

You are using a projection, so cartopy appears to "wrap" coordinate values that are past the maximum back onto the coordinate grid. I think it just translates e.g. 182 to -178 degrees, 185 to -175 degrees, etcetera. This causes the wrapped-around contour lines to overlap the contours on the other side of the boundary, as you are seeing here.
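A sketch of that kind of wrap, assuming a simple modular normalization (cartopy's actual internals may do something different; `wrap_lon` is a hypothetical helper):

```python
def wrap_lon(lon):
    """Fold any longitude onto the interval [-180, 180)."""
    return ((lon + 180.0) % 360.0) - 180.0


for lon in (182.0, 185.0, -181.0):
    print(lon, "->", wrap_lon(lon))
# 182.0 -> -178.0
# 185.0 -> -175.0
# -181.0 -> 179.0
```

A contour vertex that the naive KDE placed at 182 degrees lands just east of -180, on top of the contours drawn from the points on that side.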

So that explains what's going wrong. Now, how do we solve it? I'm actually not sure. Fixing this boundary issue would require writing a custom kernel for the kernel density estimator, and seaborn (surprisingly, IMO) recently removed support for custom kernels (see here). I might just have to put a disclaimer in the documentation telling people the plot won't work for data straddling coordinate boundaries.
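One way such a custom kernel could look, as a hypothetical sketch rather than a seaborn or geoplot API: measure each sample's distance along the circle, so the Gaussian wraps around the antimeridian. This only approximates a wrapped normal kernel and is accurate only when the bandwidth is small compared to 360 degrees. It also handles the longitude axis alone; a 2-D version would additionally need an ordinary kernel along latitude.

```python
import math


def wrapped_diff(a, b):
    """Signed longitude difference a - b, folded into [-180, 180)."""
    return ((a - b + 180.0) % 360.0) - 180.0


def wrapped_kde_1d(samples, bandwidth):
    """1-D Gaussian KDE whose kernel wraps around the +/-180 meridian."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * (wrapped_diff(x, s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical longitudes clustered on both sides of the antimeridian.
lons = [177.5, 178.0, 179.0, 179.5, -179.5, -179.0, -178.0, -177.5]
kde = wrapped_kde_1d(lons, bandwidth=1.0)

print(kde(180.0) > kde(178.5))                # True: one peak on the antimeridian
print(math.isclose(kde(180.0), kde(-180.0)))  # True: same meridian, same density
```

With the wrap-aware distance, the two clusters merge into a single peak at ±180 and the density agrees at 180 and -180.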

Now that I think about it, I suspect all of the other analytical plot types (specifically voronoi and quadtree) share this problem. 😓

patsaylor commented 2 years ago

Thanks for the thorough investigation and clear explanation of the issue. Thinking about next steps: did you happen to look into separating the analysis step from the projection? Perhaps convert the longitude coordinates to 0-360 before the KDE analysis, and then reproject for mapping?
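A minimal sketch of that idea, using hypothetical helpers and a hand-rolled 1-D KDE rather than geoplot's internals. Shifting to [0, 360) makes points on both sides of the antimeridian contiguous, so an ordinary KDE finds a single peak. Note that the shift only moves the seam to the prime meridian, so it helps for data that straddles 180 degrees but not for data that straddles 0 degrees.

```python
import math


def to_0_360(lon):
    """Map a longitude in [-180, 180] onto [0, 360)."""
    return lon % 360.0


def to_pm180(lon):
    """Map a longitude back onto [-180, 180) for plotting."""
    return ((lon + 180.0) % 360.0) - 180.0


def gaussian_kde_1d(samples, bandwidth):
    """Ordinary (non-wrapping) 1-D Gaussian KDE."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical longitudes straddling the antimeridian.
lons = [177.5, 178.0, 179.0, 179.5, -179.5, -179.0, -178.0, -177.5]
shifted = [to_0_360(lon) for lon in lons]
print(shifted)  # [177.5, 178.0, 179.0, 179.5, 180.5, 181.0, 182.0, 182.5]

kde = gaussian_kde_1d(shifted, bandwidth=1.0)
grid = [i / 10.0 for i in range(1700, 1901)]  # 170.0 .. 190.0 in 0.1 steps
peak = max(grid, key=kde)
print(peak, "->", to_pm180(peak))  # 180.0 -> -180.0 (the same meridian)
```

Contours computed in the shifted space would still need to be drawn with a projection or CRS that doesn't split them at ±180, otherwise the wrapping problem reappears at plot time.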

ResidentMario commented 2 years ago

Not sure, but maybe.