Open urschrei opened 6 months ago
I've thought about this too. It might be a good idea, but it would be a pretty big change, so I wanted to flesh out the pros and cons a bit more.
The axiom is that it's useful to provide a way for geospatial crates like https://github.com/tmontaigu/shapefile-rs/blob/master/Cargo.toml and https://github.com/georust/geojson to interop with each other and the algorithms in geo.
So I'm assuming we still see value in providing interop, but it would now live in geo instead of geo_types.
I see two primary benefits to maintaining the current separation of geo-types and geo (cons against merging).
Keeping geo-types minimal encourages crate developers to integrate with it, without greatly increasing their own dependency chain (and thus their users' build times and binary sizes).
Mitigation proposal: Put all the algorithms behind a new --features="algorithms" and have --no-default-features build only the types. Encourage third party crate integrators to use --no-default-features when possible.
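To make the mitigation concrete, here's a minimal sketch of how the gating could look inside a merged crate. The feature name comes from the proposal above; the module layout, the simplified Point, and the euclidean_distance helper are assumptions for illustration, not geo's actual code.

```rust
// Hypothetical layout of a merged `geo` crate. The matching Cargo.toml
// (an assumption, not the crate's real manifest) would declare:
//
//   [features]
//   default = ["algorithms"]
//   algorithms = []

/// Always compiled: the plain geometry type used for interop.
/// (Simplified stand-in for the real `Point`.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Compiled only when the `algorithms` feature is enabled, so a
/// `--no-default-features` build contains the types and nothing else.
#[cfg(feature = "algorithms")]
pub mod algorithms {
    use super::Point;

    /// Straight-line distance between two points (illustrative helper).
    pub fn euclidean_distance(a: Point, b: Point) -> f64 {
        (a.x - b.x).hypot(a.y - b.y)
    }
}
```

A types-only integrator would then set default-features = false on its geo dependency, while everyone else gets the algorithms by default.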
Keeping geo-types relatively semver-stable keeps crate authors from having to spend too much time on the upgrade treadmill. If we merged the two, geo-types would break as often as geo.
For context, the last breaking release of geo-types was Jan 2021. We've had 11 breaking releases of geo in that timeframe. That metric is a bit exaggerated. I think because we have geo-types separated, we are a bit fearless about breaking geo semver. We could be a bit more conservative if we saw value in it, but I think the point stands. (Personally, this is my biggest concern with merging the two.)
I've concluded that we have no evidence that there are users of geo-types who don't also use geo,
There are definitely plenty of libraries that use geo-types without using geo (geozero, geojson, wkt, shapefile-rs), but I think what you mean is, at the end of the day, users of those libraries are all (or mostly all) using geo somewhere else in their codebase, so there's no net reduction in dependencies for the end user. Is that right?
I'd wager that's true for most users. I'm not sure how to measure the significance of that remainder though.
Keep the separate crates, and add newtype wrappers for the geo_types geometries into geo. Then we could easily implement the traits (e.g. rstar) on the new types in geo with access to the algorithms. The problem though is we now have geo::Point and geo_types::Point which are not actually the same thing and will likely lead to hard-to-understand compilation errors.
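A rough sketch of the wrapper option and where it bites. Everything here is a stand-in: the two modules mimic the two crates, and the Envelope trait is a hypothetical placeholder for a third-party trait like the ones rstar provides, not its real API.

```rust
// Stand-in for the `geo_types` crate (simplified, not the real definitions).
pub mod geo_types {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Point {
        pub x: f64,
        pub y: f64,
    }
}

// Hypothetical stand-in for a third-party indexing trait.
pub trait Envelope {
    /// Bounding box as [min_x, min_y, max_x, max_y].
    fn envelope(&self) -> [f64; 4];
}

// Stand-in for the `geo` crate, which wraps the interop type.
pub mod geo {
    /// Newtype wrapper: a distinct type from `geo_types::Point`.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Point(pub super::geo_types::Point);

    impl From<super::geo_types::Point> for Point {
        fn from(p: super::geo_types::Point) -> Self {
            Point(p)
        }
    }

    // The wrapper lives next to the algorithms, so the trait can be
    // implemented here.
    impl super::Envelope for Point {
        fn envelope(&self) -> [f64; 4] {
            [self.0.x, self.0.y, self.0.x, self.0.y]
        }
    }
}

/// Accepts only the wrapper. Passing a `geo_types::Point` directly is a
/// mismatched-types error; callers have to convert with `.into()` first.
pub fn index(p: geo::Point) -> [f64; 4] {
    p.envelope()
}
```

Anything an interop crate hands back as geo_types::Point needs an explicit conversion before it can be indexed, which is exactly the kind of mismatch described above.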
We can continue to move shared algorithm code needed for third party trait integration from geo into geo_types::private_utils, but it starts to sabotage the purported benefits of the separation.
Wild idea that would maintain the current low number of breaking releases for crates that only need the types for interop and don't care about the algorithms:
What if we:
Put the algorithms behind an algorithms feature (enabled by default), such that cargo build geo --no-default-features builds only the types (no algorithms).
Made the geo_types crate a simple re-export of those types.
It's kind of a funny inversion - whereas currently geo gets its geometries from geo_types, now geo_types would get its geometries from geo.
Then I think we could continue to freely break the semver for geo, and even update geo-types's geo dependency to the latest geo release, but we'd only need to actually break geo-types semver if we actually change the geometry format, since that's the only thing it re-exports.
I haven't thought about it too hard... but it seems like it'd address my concern about keeping a relatively stable interop format while allowing third party integrations to take advantage of algorithms.
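Here's a small sketch of what the inversion might look like, with two modules standing in for the two crates. The names parse_point and euclidean_distance and the simplified Point are assumptions for illustration only.

```rust
// Stand-in for the merged `geo` crate: it now owns the geometry types.
pub mod geo {
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Point {
        pub x: f64,
        pub y: f64,
    }

    /// Illustrative algorithm. Under the proposal this would sit behind
    /// the `algorithms` feature.
    pub fn euclidean_distance(a: Point, b: Point) -> f64 {
        (a.x - b.x).hypot(a.y - b.y)
    }
}

// Stand-in for `geo_types` after the inversion: nothing but a re-export.
pub mod geo_types {
    pub use super::geo::Point;
}

/// A hypothetical interop crate that only knows about `geo_types`.
pub fn parse_point(x: f64, y: f64) -> geo_types::Point {
    geo_types::Point { x, y }
}
```

Because geo_types::Point is just another path to geo::Point, a value produced by the interop side can be passed straight to the algorithms with no conversion.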
Mitigation proposal: Put all the algorithms behind a new --features="algorithms" and have --no-default-features build only the types. Encourage third party crate integrators to use --no-default-features when possible.
This is a great idea!
For context, the last breaking release of geo-types was Jan 2021. We've had 11 breaking releases of geo in that timeframe. That metric is a bit exaggerated. I think because we have geo-types separated, we are a bit fearless about breaking geo semver. We could be a bit more conservative if we saw value in it, but I think the point stands. (Personally, this is my biggest concern with merging the two.)
These are semver-breaking, but in practice only require a small amount of work (if any) to upgrade (I realise that's more than "none", but I think it's important to note)
There are definitely plenty of libraries that use geo-types without using geo (geozero, geojson, wkt, shapefile-rs), but I think what you mean is, at the end of the day, users of those libraries are all (or mostly all) using geo somewhere else in their codebase, so there's no net reduction in dependencies for the end user. Is that right?
Yep, I meant that there are very few libraries that only use -types.
Keep the separate crates, and add newtype wrappers for the geo_types geometries into geo. Then we could easily implement the traits (e.g. rstar) on the new types in geo with access to the algorithms. The problem though is we now have geo::Point and geo_types::Point which are not actually the same thing and will likely lead to hard-to-understand compilation errors.
I think this would be a big source of annoyance for crate consumers – in my / our experience this kind of incompatibility trips people up often, and the compiler error messages don't help you to fix it. Of course it's easy for us to point people in the right direction, but that means they have to tell us, someone has to look at the code etc.
We can continue to move shared algorithm code needed for third party trait integration from geo into geo_types::private_utils, but it starts to sabotage the purported benefits of the separation.
One of the reasons I opened this issue is that I don't think we can, particularly in the case of the various euclidean distance measures. There's way too much scaffolding code that makes it work; I think we've hit the limit of what can be practically duplicated, and the duplication we have now is already too extensive in my opinion – you have to know the geo codebase well to even know that it exists, which is a barrier to people who want to help with any features that are implemented on geo-types geometries but require geo code.
Made the geo_types crate a simple re-export of those types. It's kind of a funny inversion - whereas currently geo gets its geometries from geo_types, now geo_types would get its geometries from geo.
This would be amazing if we could make it work!
These are semver-breaking, but in practice only require a small amount of work (if any) to upgrade (I realise that's more than "none", but I think it's important to note)
Agreed, the changes required are typically trivial. I think for actively maintained libraries, it's a rounding error.
My bigger concern is for the long tail of less actively developed libraries that we'll be fracturing off from the ecosystem every time we do a geo-types release. Multiple times a year is a lot!
Splitting the types and algorithms into separate crates made sense at one point, but is now an ongoing source of annoyance, bad DX, technical debt, and frustrating papercuts for crate users and geo's developers because implementing methods from geo on geo-types structs in third-party crates such as rstar is not possible due to circular dependencies. This has led to the proliferation of things like secret modules and the complete impossibility of implementing full spatial indexing on e.g. Polygons: https://github.com/georust/geo/pull/984.
I've thought about it, and I've concluded that we have no evidence that there are users of geo-types who don't also use geo, and that the proliferating need for spatial indexing in geo (for prepared geometries, unary unions to name but a few) is important to the crate and ecosystem's ongoing growth.
I propose putting a plan and schedule in place to merge the crates as a priority.