Open emaildipen opened 3 hours ago
Please post the code you have used, not only its description.
from shapely.geometry import Polygon
from shapely.ops import unary_union

def fill_holes(geometry, min_hole_size):
    """
    Fill holes in a geometry (Polygon or MultiPolygon) if they are
    smaller than min_hole_size.
    """
    if geometry.geom_type == 'Polygon':
        if geometry.interiors:
            new_interiors = [interior for interior in geometry.interiors
                             if Polygon(interior).area >= min_hole_size]
            return Polygon(geometry.exterior, new_interiors)
        return geometry
    elif geometry.geom_type == 'MultiPolygon':
        # .geoms is required to iterate a MultiPolygon in Shapely >= 2.0
        return unary_union([fill_holes(poly, min_hole_size)
                            for poly in geometry.geoms])
    else:
        return geometry

# Apply fill_holes to each partition in parallel (ddf is the Dask GeoDataFrame)
filled = ddf.map_partitions(
    lambda df: df.geometry.apply(lambda geom: fill_holes(geom, min_hole_size))
)
filled_ser = filled.compute()
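To check that the hole-filling logic itself is correct (and cheap) independently of Dask, here is a minimal self-contained sketch of the same function run on a toy polygon with plain Shapely. The 10x10 square below has two holes, one of area 1 and one of area 9; with min_hole_size=2, only the small hole should be filled. The polygon coordinates are made up for illustration.

```python
from shapely.geometry import Polygon
from shapely.ops import unary_union

def fill_holes(geometry, min_hole_size):
    """Fill holes smaller than min_hole_size in a (Multi)Polygon."""
    if geometry.geom_type == 'Polygon':
        if geometry.interiors:
            new_interiors = [i for i in geometry.interiors
                             if Polygon(i).area >= min_hole_size]
            return Polygon(geometry.exterior, new_interiors)
        return geometry
    elif geometry.geom_type == 'MultiPolygon':
        # .geoms is needed in Shapely >= 2.0
        return unary_union([fill_holes(p, min_hole_size)
                            for p in geometry.geoms])
    return geometry

# 10x10 square with a 1x1 hole (filled) and a 3x3 hole (kept)
square = Polygon(
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    holes=[
        [(1, 1), (2, 1), (2, 2), (1, 2)],  # area 1 -> below threshold, filled
        [(5, 5), (8, 5), (8, 8), (5, 8)],  # area 9 -> kept
    ],
)
filled = fill_holes(square, min_hole_size=2)
print(len(filled.interiors))  # 1 (only the large hole remains)
print(filled.area)            # 91.0 (100 minus the kept 3x3 hole)
```

If this runs instantly on representative geometries, the per-geometry work is not the bottleneck; the cost is more likely in loading the partitions and materializing the full result when .compute() is called.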
I have a Dask GeoDataFrame from which I extracted the geometry and filled small holes using Shapely: I used geometry.interiors with an area threshold to decide which holes to fill, then built a new geometry series from the result. What I don't understand is why converting the Dask GeoSeries back into a regular GeoSeries takes so long. Whenever I call .compute(), it takes ages, more than 12 hours, so I suspect something is wrong with my approach.