Open patsaylor opened 2 years ago
Can you provide a copy of the dataset, a copy of the code, and the output of pip list, please?
Hello! Thanks for looking into the issue. Here are the sample data and the pip list file (gis_pip_list.txt):
Sorry, the comment above got cut off. Please find the files attached, and thanks for looking into the issue!
The sample data you provided is in some unknown non-CSV format:
Can you reupload in CSV format please?
Also, can you provide the code please? In order to look into this, I need a minimum reproducible example.
geoplot_sample.zip
Hello, please let me know if this will work. Included: input file, run script, and sample output figure.
thanks again for looking into this!
Alright, I was able to repro.
The underlying issue is an interesting one. Longitudes are on a cyclic [-180, 180] axis: the next degree after you reach 180 degrees longitude is -179 degrees longitude. (Latitudes run over [-90, 90] and do not wrap.) The kernel density estimation algorithm is naive to this; it expects a smooth, continuous axis running over (-inf, +inf) in both directions.
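To make the mismatch concrete, here is a minimal sketch (plain Python, not code from geoplot or seaborn) comparing the gap between two longitudes as a flat axis sees it with the gap on the actual cyclic axis. The function names and sample values are illustrative assumptions.

```python
def naive_gap(lon_a, lon_b):
    """Separation in degrees if longitude is treated as a flat, unbounded axis."""
    return abs(lon_a - lon_b)


def wrapped_gap(lon_a, lon_b):
    """Separation in degrees on the cyclic longitude axis (shortest way around)."""
    d = abs(lon_a - lon_b) % 360.0
    return min(d, 360.0 - d)


# Two hypothetical points one degree either side of the date line.
flat = naive_gap(179.0, -179.0)       # what a flat-axis KDE effectively sees
cyclic = wrapped_gap(179.0, -179.0)   # the real separation on the globe
print(flat, cyclic)
```

A kernel centred on one of these points gives almost no weight to the other, even though they are neighbours on the globe.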
Because your data straddles the longitudinal boundary (the international date line), the KDE algorithm is generating kernel density boundaries that (1) don't connect at the boundary and (2) extend past the maximum extent of the coordinate system, e.g. past 180 degrees and -180 degrees. If you remove the projection, this becomes obvious:
You are using a projection, and so cartopy appears to "wrap" the coordinate values that are past the maximum coordinate value back onto the coordinate grid. I think it just translates e.g. 182 to -178 degrees, 185 to -175 degrees, etcetera. This is causing the boundary lines to overlap themselves, as you are seeing here.
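For reference, the usual modular arithmetic for folding an out-of-range longitude back into [-180, 180) can be sketched as follows. This is only the standard formula, assumed here for illustration; it is not taken from cartopy's source, and `wrap_longitude` is a hypothetical helper name.

```python
def wrap_longitude(lon):
    """Fold any longitude in degrees into the half-open range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


# Contour vertices the KDE placed past the edge of the coordinate system.
for lon in (182.0, 185.0, -182.0):
    print(lon, "->", wrap_longitude(lon))
```

Under this mapping a contour that runs past 180 degrees reappears at the far western edge of the map, which is consistent with the overlapping boundary lines described above.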
So that explains what's going wrong; now how to solve it? I'm not sure, actually. Fixing this boundary issue would require writing a custom kernel density estimator kernel, which seaborn actually (surprisingly, IMO) removed support for recently (see here). I might just have to put a disclaimer in the documentation telling people the plot won't work for data straddling coordinate boundaries.
Now that I think about it, I think all of the other analytical plot types (specifically voronoi and quadtree) share this problem. 😓
Thanks for the thorough investigation and clear explanation of the issue. Thinking about next steps: did you happen to look into separating the analysis step from the projection? Perhaps converting the longitude coordinates to 0-360 prior to the KDE analysis, and then reprojecting for mapping?
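A rough sketch of that idea, assuming a hand-rolled one-dimensional Gaussian KDE over longitude only (not geoplot's or seaborn's estimator) and made-up sample points near the date line. All names, the bandwidth, and the data are illustrative assumptions.

```python
import math


def to_0_360(lon):
    """Shift a longitude from [-180, 180] to [0, 360)."""
    return lon % 360.0


def to_pm180(lon):
    """Shift a longitude back to [-180, 180) for mapping."""
    return (lon + 180.0) % 360.0 - 180.0


def gaussian_kde_1d(samples, x, bandwidth=3.0):
    """Plain Gaussian kernel density estimate at x on a flat axis."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


# Hypothetical longitudes straddling the date line.
lons = [178.0, 179.5, -179.5, -178.0]
shifted = [to_0_360(v) for v in lons]  # now contiguous around 180

# Density at the date line: original axis versus shifted axis.
density_original = gaussian_kde_1d(lons, 180.0)
density_shifted = gaussian_kde_1d(shifted, 180.0)

# Locate the density peak on the shifted axis, then map it back.
grid = [170.0 + 0.5 * i for i in range(41)]  # 170 to 190 in half-degree steps
peak_shifted = max(grid, key=lambda x: gaussian_kde_1d(shifted, x))
peak_lon = to_pm180(peak_shifted)
print(density_original, density_shifted, peak_shifted, peak_lon)
```

On the shifted axis the four points form a single cluster, so the estimate has one peak at the date line instead of two half-weight peaks at opposite map edges. Note that this only moves the seam to the prime meridian (0/360), so it helps data clustered near the date line but not data that spans the whole globe.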
Not sure, but maybe.
Using the kdeplot, with only one dataset, it appears that isolines are overlapping instead of merging, producing two local maxima instead of a consistent isoline around all features.