Open asinghvi17 opened 2 days ago
Yes this will be most of the performance improvement we need for clipping
Here's an example of how big of difference can be, ogr2ogr takes 3 seconds to clip and reproject, I gave up on timing GeometryOps :
using Shapefile
import GeometryOps as GO
import GeoInterface as GI
import GeoFormatTypes as GFT
using DataFrames
using GDAL
# link to Shapefile used in this issue
# https://drive.google.com/file/d/1NeKbo_tiaiyAnd15QLN2mqzt-oPuWan3/view?usp=sharing
# update once you've downloaded the file locally
path2shp_local = "/mnt/bylot-r3/data/vector_files/not_ocean.shp"
# define some helper functions
function clip_or_empty(polygon, clipper)
if GO.contains(polygon, clipper) # early return if polygon is completely inside clipper
return GO.tuples(polygon)::Union{GI.Polygon,GI.MultiPolygon}
end
if GO.disjoint(polygon, clipper) # early return if polygon is completely outside clipper
null_point = GI.is3d(polygon) ? (GI.ismeasured(polygon) ? (NaN, NaN, NaN, NaN) : (NaN, NaN, NaN)) : (NaN, NaN)
contents = [GI.LinearRing([null_point, null_point, null_point])]
return GO.tuples(GI.Polygon{GI.is3d(polygon),GI.ismeasured(polygon),typeof(contents),Nothing, typeof(GI.crs(polygon))}(contents, nothing, GI.crs(polygon)))
end
result = GO.intersection(polygon, clipper; target = GI.PolygonTrait())
if isempty(result)
null_point = GI.is3d(polygon) ? (GI.ismeasured(polygon) ? (NaN, NaN, NaN, NaN) : (NaN, NaN, NaN)) : (NaN, NaN)
contents = [GI.LinearRing([null_point, null_point, null_point])]
return GO.tuples(GI.Polygon{GI.is3d(polygon),GI.ismeasured(polygon),typeof(contents),Nothing, typeof(GI.crs(polygon))}(contents, nothing, GI.crs(polygon)))
else
return (GI.MultiPolygon(result; crs = GI.crs(polygon)))
end
# return GO.intersection(GO.GEOS(), polygon, clipper)
end
function bounds2rectangle(xbounds, ybounds)
rectangle = GI.Wrappers.Polygon([[(xbounds[1], ybounds[1]), (xbounds[1], ybounds[2]), (xbounds[2], ybounds[2]), (xbounds[2], ybounds[1]), (xbounds[1], ybounds[1])]])
return rectangle
end
path2shp_local = "/mnt/bylot-r3/data/vector_files/not_ocean.shp"
shpfile_out = replace(path2shp_local, ".shp" => "_nh.shp")
xmin = -179.99940500651084
ymin = -65.51457469986777
xmax = 179.98590403156888
ymax = -51.16025312702513
clipper = bounds2rectangle((xmin, xmax), (ymin, ymax))
@time GDAL.ogr2ogr_path() do ogr2ogr # 3s
run(`$ogr2ogr -clipsrc $xmin $ymin $xmax $ymax -t_srs EPSG:3031 $shpfile_out $path2shp_local`)
feature = DataFrame(Shapefile.Table(shpfile_out));
end
@time begin # this takes too long to time
feature = DataFrame(Shapefile.Table(path2shp_local));
# clipping along equator is MUCH faster than clipping box
feature.geometry = clip_or_empty.(feature.geometry, (clipper,));
feature = feature[.!isnothing.(feature.geometry),:];
feature.geometry = GO.reproject(feature.geometry; target_crs = GFT.EPSG(3031), source_crs = GFT.EPSG(4326));
end;
It occurs to me that we don't actually have these yet, would be good to do. They are implemented for DE-9IM ops (contains, within, disjoint, intersects, etc) but not for intersection, difference, and union.