Open RaczeQ opened 3 months ago
If you can access the indices of each geometry type, then you can do something like in https://github.com/developmentseed/lonboard/issues/491 with pyarrow.Table.take instead of DataFrame.iloc.
You're definitely right that something like geoarrow.pyarrow.geometry_type(x) (returning something the same length as x) would be a very helpful compute function for a lot of reasons. We clearly have the ability to do this more efficiently and generically (since we can already compute the unique geometry types); it's just not wired up yet. Until then, it's possible to do this using purely pyarrow compute:
import geoarrow.pyarrow as ga
import pyarrow as pa
import pyarrow.compute as pc
wkbs = ga.as_wkb(["POINT (0 1)", "LINESTRING Z (0 0 1, 1 1 2)", "MULTIPOINT (0 0, 1 1)"])
# Doesn't work with nulls
assert wkbs.null_count == 0
# Only works with little-endian WKB
endian_byte = pc.binary_slice(wkbs.storage, 0, 1)
endian = pa.Array.from_buffers(pa.int8(), len(endian_byte), [endian_byte.buffers()[0], endian_byte.buffers()[2]])
assert pc.all(pc.equal(endian, 1)).as_py()
wkb_type_bytes = pc.binary_slice(wkbs.storage, 1, 5)
geometry_type = pa.Array.from_buffers(pa.uint32(), len(wkb_type_bytes), [wkb_type_bytes.buffers()[0], wkb_type_bytes.buffers()[2]])
# If you're expecting EWKB, mask off the high flag bits first;
# otherwise the modulo below gives the wrong type code
mask = pa.scalar(0x00FFFFFF, pa.uint32())
geometry_type = pc.bit_wise_and(geometry_type, mask)
# ISO WKB encodes Z/M/ZM by adding 1000/2000/3000, so take the type modulo 1000
one_thousand = pa.scalar(1000, pa.uint32())
geometry_type = pc.subtract(geometry_type, pc.multiply(pc.divide(geometry_type, one_thousand), one_thousand))
geometry_type
#> <pyarrow.lib.UInt32Array object at 0x1135c5de0>
#> [
#> 1,
#> 2,
#> 4
#> ]
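For reference, here is what the byte slicing above is reading, written as a minimal standard-library sketch for a single WKB value. It is not part of geoarrow: the function name wkb_geometry_type and the hand-built sample headers are made up for illustration. Unlike the pyarrow version, it reads both byte orders, and it masks the EWKB flag bits before removing the ISO Z/M offset so the two conventions don't interfere.

```python
import struct

# The base type lives in the low bits; EWKB sets Z/M/SRID flags in the top byte.
EWKB_FLAG_MASK = 0x00FFFFFF


def wkb_geometry_type(wkb: bytes) -> int:
    """Return the base geometry type code (1 = Point, 2 = LineString, ...)."""
    # Byte 0 is the byte order marker: 1 = little-endian, 0 = big-endian.
    if wkb[0] == 1:
        byte_order = "<"
    elif wkb[0] == 0:
        byte_order = ">"
    else:
        raise ValueError(f"unexpected byte order marker: {wkb[0]}")
    # Bytes 1 to 4 hold the geometry type as an unsigned 32-bit integer.
    (type_code,) = struct.unpack_from(byte_order + "I", wkb, 1)
    # Drop EWKB flags first, then the ISO offset (Z = +1000, M = +2000, ZM = +3000).
    return (type_code & EWKB_FLAG_MASK) % 1000


# Hand-built headers only (hypothetical sample data); the coordinates are
# omitted because the function reads just the first five bytes.
point_le = struct.pack("<BI", 1, 1)
linestring_z_iso = struct.pack("<BI", 1, 1002)
multipoint_be = struct.pack(">BI", 0, 4)
point_z_ewkb = struct.pack("<BI", 1, 0x80000001)

codes = [
    wkb_geometry_type(b)
    for b in (point_le, linestring_z_iso, multipoint_be, point_z_ewkb)
]
```

Applying this per row in Python will be much slower than the vectorized pyarrow version, so treat it as a description of the header layout rather than a replacement.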
Hi, I'm wondering if it would be possible to have a WkbType column and filter out geometries based on a given type (Point, LineString, Polygon, etc.). There are some compute functions available, and there is even unique_geometry_types, but I'm not sure if any of those could help me in my use case.