WZBSocialScienceCenter / geovoronoi

a package to create and plot Voronoi regions within geographic boundaries
https://datascience.blog.wzb.eu/2018/03/16/creating-and-plotting-voronoi-regions-for-geographic-data-with-geovoronoi/
Apache License 2.0
135 stars 28 forks source link

Very big efficiency improvement #11

Open zegres opened 3 years ago

zegres commented 3 years ago

I have been using geovoronoi for a while. Recently I had to make voronoi polygons for 30K or more points and geovoronoi got way too slow for my purposes. I tried PyGEOS, which has a voronoi_polygons function that is actually faster than scipy's, but there is no way that I found to refer back to the original points without doing a spatial join. So I decided to try my hand at using scipy's scipy.spatial.Voronoi myself (like geovoronoi). I quickly ran into the same place where geovoronoi gets all its runtime and complexity - dealing with infinite ridges/polygons. Now, I had a big breakthrough. I just add four more points to the voronoi far away - this way the infinte polygons are all oustide the bounding geometry. Then every polygon I care about is finite and I just ignore the last 4. I get the polygons now in 300ms vs 5 seconds using geovoronoi.

I'll clean up my code and put it here, but the idea is simple.

example

internaut commented 3 years ago

Well, you may try but I'm a bit skeptical about such "hacks". They often enough only work in certain circumstances and break with edge cases. Make sure to run all tests and examples in the project.

zegres commented 3 years ago

Completely agree with you in principal, hacks like this can break with edge cases. And this method will introduce a small amount of error to the angles of infinite ridges. Three things in defense of this strategy:

internaut commented 3 years ago

You're of course very welcome to contribute this approach and I think it would be a great benefit if it works!

As I said, the tests in the project must pass and examples must run. If that's the case, I'm willing to consider to include it into the package.

At the moment, I think "phase 2" of the current algorithm is often slow as it tries to "fill up" even very tiny portions of polygons. One could also include an amount of "acceptable error" here, which would also make it faster. But that would be another approach -- maybe you can test that too if you like, I currently don't have the time for this. Would that reduce your computation time for your project?

The main problem I'm seeing with your approach is that it can more easily lead to numerical issues, e.g. when you have one cluster of points, but then one more point that is very far from the cluster. Your approach needs to add the four points on the containing circle with a certain radius as additional parameter (which is also a bit unfortunate since I'd like to use as few parameters as possible). If there's this one point very far away, the circle may have a very large radius which may cause numerical overflows. If you test your approach, please consider this and add some appropriate tests.