geoarrow / geoarrow-rs

GeoArrow in Rust, Python, and JavaScript (WebAssembly) with vectorized geometry operations
http://geoarrow.org/geoarrow-rs/
Apache License 2.0
264 stars 18 forks source link

Dying in specialization & generics #801

Open kylebarron opened 2 months ago

kylebarron commented 2 months ago

I'm currently dying in generics.

It's making it harder to move fast, and gives a ton of development overhead. If there's one lesson I should learn from DuckDB Spatial, it's that I'm not moving fast enough.

Any algorithms that return float values, such as area, should only take in a &dyn NativeArray + NativeArrayAccessor<2>.

So we want something like an NativeArrayAccessor<const D: usize>, so that we can implement algorithms on

pub fn intersection(lhs: &dyn NativeArray + NativeArrayAccessor<2>, rhs: &dyn NativeArray + NativeArrayAccessor<2>) -> GeometryArray

This also means we can greatly simplify our IndexedArray implementation too. We should switch to IndexedNativeArray (or maybe IndexedNativeArray<const D: usize>) which holds only an `Arc<>

Also, anything like area that is implemented on geo::Geometry (or geos::Geometry) should be implemented solely on &dyn NativeArray + NativeArrayAccessor<2> (or maybe allow 3d as well there) and not on every array type.

The other issue here is that the Geometry scalar is generic over both dimension and the offset size of the underlying NativeArray. This is especially painful as the dimension matters for the Geometry but the offset size does not. It's only an artifact of the underlying storage.

I'm getting closer to removing support internally for i64 offset arrays. We can support ingesting that data but only represent i32 data internally. To max out i32 offsets, we'd need to have 2^31 + 1 (2,147,483,649) coordinates, which would be 34,359,738,384 bytes, or 32GB.

Especially the binary operations are a total disaster right now. This file is 750 lines of code just to implement intersections, when we aren't even doing any of the core implementation ourselves! This should be 10 lines of code, to take in a &dyn NativeArray + NativeArrayAccessor<2>, iterate over its geoarrow::scalar::Geometry<2> objects, and call intersects on each one.

kylebarron commented 2 months ago

https://github.com/geoarrow/geoarrow-rs/pull/803 removed i64 support. So we're making progress towards simplification