Closed kylebarron closed 4 months ago
which implements geozero conversions to and from each of the array types.
In light of #187, do you think it's desirable to have the geozero integrations live externally? TBH I haven't really figured out which way is "least worst".
I agree it's a difficult question. My first thought is: the dependency should go "from the crate that is changing more to the crate that is changing less". Right now, geoarrow
is changing heavily while geozero
is relatively stable, so it seems like having the geozero implementation in the geoarrow
crate makes for easier and faster development iteration.
In relation to #187 and shapefile, maybe it's the other way around; if the core part of parsing shapefiles is more stable than geozero's release cycle, then maybe geozero itself should depend on geozero-shp
's public APIs?
Just my initial thoughts; what do you think?
from the crate that is changing more to the crate that is changing less
It's unfortunately not a very objective or stable criteria to apply, but I do think it's a pragmatic approach. 👍
One small thing to clarify: I don't think we care about non-breaking changes, only semver incompatible changes that can complicate cargo's dependency resolution.
In that sense, the shapefile code might have a better chance of being stable, just by virtue of the format itself being stable. But I suppose that's not necessarily so.
It's unfortunately not a very objective or stable criteria to apply, but I do think it's a pragmatic approach. 👍
I agree it's not objective, but it feels like Rust's strict version resolution pushes towards this in practice.
With geoarrow specifically, the geozero bindings are pretty significant in size. This entire directory is just for geozero bindings. It seems like it would be more invasive to add all of that to geozero and for maintenance to happen here.
Edit: one more note is that geoarrow's geozero bindings are implemented using geo-traits (e.g. here), which isn't yet stable. Maybe in a future where geo-traits become stable, geozero can have a source that reads from Iterator<Item = impl GeometryTrait<T=f64>>
, and then that can decouple a bit from geoarrow
FWIW: I published v0.1 of geoarrow
and its geozero integration is here: https://docs.rs/geoarrow/0.1.0/geoarrow/io/geozero/index.html
I'm +1 for merging. If there is an external arrow
based implementation, we can drop the arrow2
based implementation.
Is this good to merge?
Just to make sure I understand, the change notes for this might read like:
Moved the geozero/geoarrow integration from the geozero crate to the geoarrow crate
And the motivation for the change is because the geo-arrow crate is undergoing more frequent breaking changes than geozero (currently anyway).
If so, it's all fine with me.
Moved the geozero/geoarrow integration from the geozero crate to the geoarrow crate
And the motivation for the change is because the geo-arrow crate is undergoing more frequent breaking changes than geozero (currently anyway).
Moved, yes, but more importantly updated and expanded the geozero/geoarrow integration:
arrow
and arrow2
. geozero's existing integration used arrow2
, which is now defunct. The geoarrow
crate uses the maintained arrow
crate.O(N)
to decode; native encodings are always O(1)
access to any coordinate. The existing geozero code supported only a tiny fraction of that: read-only from the WKB serialized encoding. The current geoarrow
crate supports read-write from/to a GeoArrow WKB array but also to/from all the native encodings.The geoarrow
crate is indeed likely to change more frequently; I hope to support 3D and 4D coordinates (even if they can't be interop'ed with geo-types
) which would likely incur a bunch of breaking changes.
I am considering breaking geoarrow
into sub-crates, where the geozero integration would live in geoarrow-io
. Maybe at that point the geoarrow-io
sub-crate would be stable enough for geozero to depend on it 🤷 .
Another possible consideration for "in which repo should code live" is the current flatgeobuf situation where IIUC there's a circular dependency between the flatgeobuf and geozero crates which makes it hard to make new releases? I'm not sure the right
It's my first time trying the merge queue and I'm excited!
Thanks for the explanation - I feel like you've explained it before, but it's kind of complicated for someone not steeped in the day to day of arrow, so I appreciate the recap. 👍
This is marked as WIP until 0.1 release of the geoarrow crate, expected to be this week (I need arrow-rs to release a new version).
arrow
mod in favor of the https://github.com/geoarrow/geoarrow-rs crate, which implements geozero conversions to and from each of the array types.