georust / geozero

Zero-Copy reading and writing of geospatial data.
Apache License 2.0
321 stars 30 forks source link

Remove Arrow mod, point to geoarrow crate #186

Closed kylebarron closed 4 months ago

kylebarron commented 5 months ago

This is marked as WIP until 0.1 release of the geoarrow crate, expected to be this week (I need arrow-rs to release a new version).

michaelkirk commented 5 months ago

which implements geozero conversions to and from each of the array types.

In light of #187, do you think it's desirable to have the geozero integrations live externally? TBH I haven't really figured out which way is "least worst".

kylebarron commented 5 months ago

I agree it's a difficult question. My first thought is: the dependency should go "from the crate that is changing more to the crate that is changing less". Right now, geoarrow is changing heavily while geozero is relatively stable, so it seems like having the geozero implementation in the geoarrow crate makes for easier and faster development iteration.

In relation to #187 and shapefile, maybe it's the other way around; if the core part of parsing shapefiles is more stable than geozero's release cycle, then maybe geozero itself should depend on geozero-shp's public APIs?

Just my initial thoughts; what do you think?

michaelkirk commented 5 months ago

from the crate that is changing more to the crate that is changing less

It's unfortunately not a very objective or stable criteria to apply, but I do think it's a pragmatic approach. 👍

One small thing to clarify: I don't think we care about non-breaking changes, only semver incompatible changes that can complicate cargo's dependency resolution.

In that sense, the shapefile code might have a better chance of being stable, just by virtue of the format itself being stable. But I suppose that's not necessarily so.

kylebarron commented 5 months ago

It's unfortunately not a very objective or stable criteria to apply, but I do think it's a pragmatic approach. 👍

I agree it's not objective, but it feels like Rust's strict version resolution pushes towards this in practice.

With geoarrow specifically, the geozero bindings are pretty significant in size. This entire directory is just for geozero bindings. It seems like it would be more invasive to add all of that to geozero and for maintenance to happen here.

Edit: one more note is that geoarrow's geozero bindings are implemented using geo-traits (e.g. here), which isn't yet stable. Maybe in a future where geo-traits become stable, geozero can have a source that reads from Iterator<Item = impl GeometryTrait<T=f64>>, and then that can decouple a bit from geoarrow

kylebarron commented 5 months ago

FWIW: I published v0.1 of geoarrow and its geozero integration is here: https://docs.rs/geoarrow/0.1.0/geoarrow/io/geozero/index.html

pka commented 4 months ago

I'm +1 for merging. If there is an external arrow based implementation, we can drop the arrow2 based implementation.

kylebarron commented 4 months ago

Is this good to merge?

michaelkirk commented 4 months ago

Just to make sure I understand, the change notes for this might read like:

Moved the geozero/geoarrow integration from the geozero crate to the geoarrow crate

And the motivation for the change is because the geo-arrow crate is undergoing more frequent breaking changes than geozero (currently anyway).

If so, it's all fine with me.

kylebarron commented 4 months ago

Moved the geozero/geoarrow integration from the geozero crate to the geoarrow crate

And the motivation for the change is because the geo-arrow crate is undergoing more frequent breaking changes than geozero (currently anyway).

Moved, yes, but more importantly updated and expanded the geozero/geoarrow integration:

The geoarrow crate is indeed likely to change more frequently; I hope to support 3D and 4D coordinates (even if they can't be interop'ed with geo-types) which would likely incur a bunch of breaking changes.

I am considering breaking geoarrow into sub-crates, where the geozero integration would live in geoarrow-io. Maybe at that point the geoarrow-io sub-crate would be stable enough for geozero to depend on it 🤷 .

Another possible consideration for "in which repo should code live" is the current flatgeobuf situation where IIUC there's a circular dependency between the flatgeobuf and geozero crates which makes it hard to make new releases? I'm not sure the right

kylebarron commented 4 months ago

It's my first time trying the merge queue and I'm excited!

michaelkirk commented 4 months ago

Thanks for the explanation - I feel like you've explained it before, but it's kind of complicated for someone not steeped in the day to day of arrow, so I appreciate the recap. 👍