emaildipen opened 4 weeks ago
Do you need dask-geopandas? If vanilla geopandas is enough for you, it will be much easier, and 200k polygons should be perfectly fine.
You need to identify connected components and dissolve by a component label. That is tricky in a distributed setting, but in a single GeoDataFrame it is easy with the help of libpysal (or scipy alone).
```python
from libpysal import graph

# Queen contiguity: polygons sharing an edge or a vertex are neighbours.
comp_label = graph.Graph.build_contiguity(gdf, rook=False).component_labels
gdf.dissolve(comp_label)
```
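The reply also mentions a scipy-only route. A minimal sketch of that alternative, assuming a self spatial join to find touching pairs (the toy `gdf` below is made up for illustration):

```python
import geopandas as gpd
import numpy as np
from scipy.sparse import coo_array
from scipy.sparse.csgraph import connected_components
from shapely.geometry import box

# Hypothetical data: two touching boxes plus one isolated box.
gdf = gpd.GeoDataFrame(
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)]
)

# Pairs of polygons that touch or overlap, via a bulk spatial index query.
left, right = gdf.sindex.query(gdf.geometry, predicate="intersects")

# Build a sparse adjacency matrix and label its connected components.
n = len(gdf)
adj = coo_array((np.ones(len(left)), (left, right)), shape=(n, n))
n_comp, labels = connected_components(adj, directed=False)

# Dissolve by component label, as in the libpysal version.
dissolved = gdf.dissolve(labels)
```

Here the two touching boxes collapse into one polygon and the isolated box stays as its own component.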
If you know that you have a valid polygonal coverage (no overlaps, no gaps along shared edges), you can use the much faster coverage union.
```python
gdf.dissolve(comp_label, method="coverage")
```
Thanks! Yes, I do need Dask since I’ll be processing millions of polygons. I added `map_partitions` to my function and it worked. However, the problem now is that converting the result back to a GeoPandas DataFrame is taking a long time.
`map_partitions` will work only if you ensure that a single component is always contained within a single partition. If a component stretches across multiple partitions, the approach will not work.
I have around 200k polygons in a shapefile, and I want to dissolve the polygons that are connected to each other. ArcGIS offers simple tools to achieve this, but I was wondering if there are quicker ways to do it. I’ve tried the following, but it took ages to execute.