georust / geo

Rust geospatial primitives & algorithms
https://crates.io/crates/geo
Other
1.57k stars 199 forks source link

Deprecate public geometry fields #816

Open michaelkirk opened 2 years ago

michaelkirk commented 2 years ago

Excepting Rects and Polygons, all of our geometry structs have public fields.

This is going to make releasing #797 not only a breaking change, but a breaking change without any deprecation notice. Downstream crates, when updating will not be able to build because of this error:

error[E0063]: missing fields `m` and `z` in initializer of `Coordinate<_, _, _>`
  --> lib/src/boolean/mod.rs:97:14
   |
97 |         max: Coordinate {
   |              ^^^^^^^^^^ missing `m` and `z`

That error message implies that the user should add an m and z argument, which is probably not what they want. The probable fix is for them to use the coord! macro instead.

But other than hoping that they track down the release notes or find the right docs on their own, there isn't much we can do to help them.

I know @nyurik went around and fixed some crates to use the coord! macro, but there seem to be plenty of other usages: https://github.com/search?l=rust&q=geo+Coordinate+%7B&type=Code (some false positives in there, but plenty of legit ones)

In an ideal world, we'd precede any breaking change with a non-breaking release with helpful deprecation messages that guide the user to the right solution, such that if the consumer fixes all the deprecations, they can expect to update to the next breaking release without issue. This isn't always possible, but it is in this case.

Specifically, I'm proposing we do something like this for at least Coordinate (if not all the Geometry enum members) in a non-breaking geo-types release to precede the breaking geo-types changes in #797.

#[derive(Eq, PartialEq, Clone, Copy, Debug, Hash, Default)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct Coordinate<T: CoordNum> {
    #[deprecated(since = "0.7.5", note ="Private field access is deprecated. For construction use `coord!(x: 1, y: 2)` or `Coordinate::new(x, y)`, for getters, use `coord.x()` or `coord.x_mut()`")]
    pub x: T,
    #[deprecated(since = "0.7.5", note ="Private field access is deprecated. For construction use `coord!(x: 1, y: 2)` or `Coordinate::new(x, y)`, for getters, use `coord.y()` or `coord.y_mut()`")]
    pub y: T,
}

impl<T: CoordNum> Coordinate<T> {
    pub fn new(x: T, y: T) -> Self {
        // we can delete these `allow(deprecated)`s upon the next breaking release, when the fields are made private.
        #[allow(deprecated)]
        Self { x, y }
    }

    pub fn x(&self) -> T {
        #[allow(deprecated)]
        self.x
    }

    pub fn x_mut(&mut self) -> &mut T {
        #[allow(deprecated)]
        &mut self.x
    }

    pub fn y(&self) -> T {
        #[allow(deprecated)]
        self.y
    }

    pub fn y_mut(&mut self) -> &mut T {
        #[allow(deprecated)]
        &mut self.y
    }
}

The downsides are that field access is easier to type. I bet some people appreciate the conciseness of being able to write coord.x rather than coord.x().

WDYT? Is it worth making this transition now? Are there other advantages to these fields being public?

michaelkirk commented 2 years ago

BTW if people are into this idea, I'm willing to do the grunt work. I should have time to complete it in the next several days.

urschrei commented 2 years ago

Yeah, I think we should do this as quickly as possible to give downstream crates as much time as possible to make the change, and maybe add a pinned issue about how to make the change?

nyurik commented 2 years ago

Agree, I am not a big fan of the public fields without accessors either, especially when we declare that values must not be NaN or inf without asserting that (possibly behind a feature flag). Let's make this happen asap to give a proper heads up. We should also mark the geometrycollection::new_from as depr too (?).

michaelkirk commented 2 years ago

Hiya @georust/core - this proposal to deprecate direct access to geo-type fields is likely to affect you.

This is being done to soften the blow of an imminent breaking change to our struct fields in #797.

Specifically, we're looking at deprecating this kind of code:

# to be deprecated
let c = geo_types::Cordinate { x: 1, y: 2};

# You'd do it like this instead
let c = geo_types::coord! { x: 1, y: 2};
# or do this
let c = geo_types::Coordinate::new(1, 2);

# to be deprecated
let x = c.x;

# You'd do it like this instead
let x = c.x();

# to be deprecated
c.x = 5;

# You'd do it like this instead
*c.x_mut() = 5;
# or do this
*c.set_x(5);

The other geometry fields would be similarly deprecated:

point.0
line.start/ line.end
triangle.0/triangle.1/triangle.2
line_string.0
multi_point.0
multi_line_string.0
multi_polygon.0
geometry_collection.0

Please weigh in if you have thoughts or concerns. Or consider giving this issue a :+1: as a sign of life if it seems fine.

You can see the full PR at #818

rmanoka commented 2 years ago

Why does field access like coord.x need to change? Does that not work with the ZM generics? It feels simpler to say coord.x = 10 than the mut methods

michaelkirk commented 2 years ago

Does that not work with the ZM generics? It feels simpler to say coord.x = 10 than the mut methods

I agree cood.x = 10 is more concise. And it would work with ZM generics. But without deprecating field access, as far as I know, there is no way to deprecate the public field initializers - e.g. Coordinate { x y }.

My concern is that the struct initializers are widely used, and the error people will see is not helpful.

error[E0063]: missing fields `m` and `z` in initializer of `Coordinate<_, _, _>`
  --> lib/src/boolean/mod.rs:97:14
   |
97 |         max: Coordinate {
   |    

But to be clear, we don't need to change field access. If the cure of private fields is worse than the disease of confusing build errors, an alternative would be to just let code like Coordinate { x, y } stop compiling one day, without a deprecation warning after we merge #797.

rmanoka commented 2 years ago

Understood. IMHO ZM coords is such a big and breaking change so it should be easy to spot / fix the error on upgrade. Field access (not incl. intialization) seems so natural for Coord.

For instance, nalgebra which also does a lot of generics magic still manages to expose field access (via some sophisticated DerefMut impls): So you get vec3.z, mat3.m11, vec2.x etc.

nyurik commented 2 years ago

@rmanoka i believe you are referring to nalgebra's coordinates.rs code. Seems magical, and I couldn't easily follow the logic there, but i'm all for doing that approach if it would allow seamless migration.

michaelkirk commented 2 years ago

IMHO ZM coords is such a big and breaking change so it should be easy to spot / fix the error on upgrade

I agree that it should be easy to spot - the code will not compile! And actually I think all the fixes are pretty easy if you already know what to do — just use coord! {x: x, y: y} not Coordinate { x, y }.

The reason I opened this issue was because I saw there was pretty wide use of the Coordinate struct initializer in the wild, and, unless you're following this discussion, I don't think it's very obvious what the "proper fix" is. Maybe we can trust people to read the changelog when things don't work rather than try to go down the doomed path of doing what the compiler error suggests (add the "missing fields m and z in initializer").

I've never used nalgebra - but I'm looking at it now. It seems really cool!

I see some examples with field access, but I haven't seen any usage of the field-wise initializers. e.g.

    /// # use nalgebra::Vector3;
    /// let mut vec = Vector3::new(1.0, 2.0, 3.0);
    /// vec.x = 10.0;
    /// vec.y += 30.0;
    /// assert_eq!(vec.x, 10.0);
    /// assert_eq!(vec.y + 100.0, 132.0);

I'm still digging in and trying to understand how it works - it might be a good compromise!

michaelkirk commented 2 years ago

nalgebra which also does a lot of generics magic still manages to expose field access (via some sophisticated DerefMut impls): So you get vec3.z, mat3.m11, vec2.x etc.

I've been experimenting with this a bit. Deref is pretty wild!

Here's a branch with private Coordinate fields which will still allow direct field access via Deref. Note that, like nalgebra, this relies on unsafe and repr(c): https://github.com/georust/geo/compare/mkirk/deref-minimal?expand=1

With that, when someone tries to build a field wise Coordinate initializer, they'll fail with this error:

error[E0451]: field `x` of struct `geo_types::Coordinate` is private
   --> geo/src/lib.rs:309:41
    |
309 |         let legacy_coord = Coordinate { x: 1, y: 2 };
    |                                         ^^^^ private field

error[E0451]: field `y` of struct `geo_types::Coordinate` is private
   --> geo/src/lib.rs:309:47
    |
309 |         let legacy_coord = Coordinate { x: 1, y: 2 };
    |                                               ^^^^ private field

But they'll still be able to:

let coord = coord!(x: 1, y: 2);
assert_eq!(2, coord.y);

Is that conceptually what you were envisioning @rmanoka?

WRT legacy users, it seems like a slightly better error than #797: "missing fields m and z in initializer", but it still doesn't really point you to the proper fix. Presumably at this point it'd be clear that field wise initialization is impossible, and you might look for a Coordinate::new* or track down the Coordinate docs to find valid init examples.

What I failed to find was any way to actually gracefully deprecate (without error) the field wise initializer while still retaining undeprecated field access. I can't imagine how that would work.

An incremental improvement maybe: We could take this opportunity to address #540, and have a helpful deprecation warning along with the error:

- pub struct Coordinate<T: CoordNum> {
+ pub struct Coord<T: CoordNum> {
     x: T,
     y: T,
 }

+ #[deprecated(
+     note = "use the new Coord struct with initializer: coord!(x: 1.0, y: 2.0) or Coord::new_xy(1.0, 2.0)"
+ )]
+ pub type Coordinate<T> = Coord<T>;

# rename _all_ internal usages to avoid deprecation warnings
- impl<T: CoordNum> Coordinate<T> {
+ impl<T: CoordNum> Coord<T> {
    pub fn new_xy(x: T, y: T) -> Self {
        Self { x, y }
    }
}

(fully applied here: https://github.com/georust/geo/compare/mkirk/deref-error-and-coord-deprecation?expand=1#diff-6ff837506e614f9a8a7c162d78e60b81e6020202c551fcb680b2b8867ba16e17R45)

That'll still fail to compile legacy code, but the deprecation warning can at least reference the valid init methods:

warning: use of deprecated type alias `geo_types::Coordinate`: use the new Coord struct with initializer: coord!(x: 1.0, y: 2.0) or Coord::new_xy(1.0, 2.0)
   --> geo/src/lib.rs:303:35
    |
303 |     use geo_types::{coord, Coord, Coordinate};
    |                                   ^^^^^^^^^^
    |
    = note: `#[warn(deprecated)]` on by default

warning: use of deprecated type alias `geo_types::Coordinate`: use the new Coord struct with initializer: coord!(x: 1.0, y: 2.0) or Coord::new_xy(1.0, 2.0)
   --> geo/src/lib.rs:308:28
    |
308 |         let legacy_coord = Coordinate { x: 1, y: 2 };
    |                            ^^^^^^^^^^

error[E0451]: field `x` of struct `geo_types::Coord` is private
   --> geo/src/lib.rs:308:41
    |
308 |         let legacy_coord = Coordinate { x: 1, y: 2 };
    |                                         ^^^^ private field

error[E0451]: field `y` of struct `geo_types::Coord` is private
   --> geo/src/lib.rs:308:47
    |
308 |         let legacy_coord = Coordinate { x: 1, y: 2 };
    |                                               ^^^^ private field
rmanoka commented 2 years ago

This is very cool @michaelkirk ; thanks for trying out the ideas. The last approach to rename to Coord seems good to me. The field access for Coord alone could be done via some unsafety, and move back to vanilla pub fields (but all 4 coords) in the next breaking release. Is that what your suggesting?

nyurik commented 2 years ago

I also like the renaming approach. My biggest concern is the possibility (unconfirmed) of the tech debt we will have with the repr(C) + unsafe. repr(c) seems fairly straightforward for the simple usecases (numeric values + NoValue), but it could represent some unexpected gotchas if someone tries some creative custom types - a fairly remote possibility to be fair. I don't know if there are any performance implications with repr(c) (i.e. field ordering).

So basically it is a "dumb vanilla Rust approach" that forces .x() / direct public field usage, or the "creative repr(c) + unsafe approach" that is much cleaner to use, but could cause some unknown issues down the line. </rant>

michaelkirk commented 2 years ago

move back to vanilla pub fields (but all 4 coords) in the next breaking release. Is that what your suggesting?

I would actually suggest to keep the fields on Coord private. Via the impl Deref, we already have access to field setter/getter. So what's gained by pub? Seems like only the ability to use the field wise initializer is gained. But that field wise initialize requires specifying all the fields, so it'd really only be useful for people doing 4D.

People doing 2D coords would need to do something unpleasant like:

let coord = Coord { x, y, z: NoValue, m: NoValue};

Otherwise they'd get the same unhelpful error:

error[E0063]: missing fields `m` and `z` in initializer of `geo_types::Coord<_>`
   --> geo/src/lib.rs:308:28
    |
308 |         let legacy_coord = Coord { x: 1, y: 2 };
    |                            ^^^^^ missing `m` and `z`

So my suggestion is to keep them private, and double down on macro usage: coord!(x: 1, y: 2) / coord!(x: 1, y: 2, z: 3)), and perhaps some unambiguous constructors like Coord::new_xy/ Coord::new_xyz / Coord::new_xym / Coord::new_xyzm if we are to maintain our sanity in a post 2D world.

And another thing to clarify - the breaking change would occur when making these fields private (because it would break people's ability to use field wise initializers). But then, from there, adding 3D would not entail another break. Am I thinking about that right?

michaelkirk commented 2 years ago

I don't know if there are any performance implications with repr(c)

This is worth calling out. So long as all of our fields are the same size my understanding is there's nothing to be gained by the default repr over repr(c). But if you did something like coord! { x: u8, y: u8, z: u32, m: u16 }, then rust might choose to benefit from reordering our fields if we didn't have the repr(c).

rmanoka commented 2 years ago

I don't think repr(C) is needed for the transmute. Only repr(transparent). And make Coord a wrapper around the inner fields we will expose via the deref-mut.

rmanoka commented 2 years ago

move back to vanilla pub fields (but all 4 coords) in the next breaking release. Is that what your suggesting?

I would actually suggest to keep the fields on Coord private. Via the impl Deref, we already have access to field setter/getter. So what's gained by pub? Seems like only the ability to use the field wise initializer is gained. But that field wise initialize requires specifying all the fields, so it'd really only be useful for people doing 4D.

People doing 2D coords would need to do something unpleasant like:

let coord = Coord { x, y, z: NoValue, m: NoValue};

Otherwise they'd get the same unhelpful error:

error[E0063]: missing fields `m` and `z` in initializer of `geo_types::Coord<_>`
   --> geo/src/lib.rs:308:28
    |
308 |         let legacy_coord = Coord { x: 1, y: 2 };
    |                            ^^^^^ missing `m` and `z`

So my suggestion is to keep them private, and double down on macro usage: coord!(x: 1, y: 2) / coord!(x: 1, y: 2, z: 3)), and perhaps some unambiguous constructors like Coord::new_xy/ Coord::new_xyz / Coord::new_xym / Coord::new_xyzm if we are to maintain our sanity in a post 2D world.

And another thing to clarify - the breaking change would occur when making these fields private (because it would break people's ability to use field wise initializers). But then, from there, adding 3D would not entail another break. Am I thinking about that right?

I meant switch back if we want to clamp down on unsafe code or similar. I'm happy with keeping the deref-mut impl. too.

rmanoka commented 2 years ago

Any idea why the repr(c) is needed? Here's a sample code where transmute seems to work as long as z and m are zsts.

nyurik commented 2 years ago

I think this is more of a "compiler guarantee" question, i.e. I think it is bad to rely on "happens to work", we should rely on the explicit guarantees by the compiler that our usage of unsafe is "safe". Do we know that it is a valid usage?

A straw man example - Rust could decide to sort fields in a struct based on how often each field is used, putting the most used one first so that access is potentially faster or because of a higher caching rate.

michaelkirk commented 2 years ago

Here's a sample code where transmute seems to work as long as z and m are zsts.

The nomicon definitively lays this out as (currently) undefined: https://doc.rust-lang.org/nomicon/repr-rust.html

struct A {
    a: i32,
    b: u64,
}

struct B {
    a: i32,
    b: u64,
}

Rust does guarantee that two instances of A have their data laid out in exactly the same way. However Rust does not currently guarantee that an instance of A has the same field ordering or padding as an instance of B.

You raise a good apoint about repr(transparent) though! It'd be nice to avoid repr(c), and I can see how transparent should be sufficient for the current case, with only XY coords. I need to think through the implications on a generic hyper dimensional coord for a while though. đŸ˜”â€đŸ’«

rmanoka commented 2 years ago

How about something like this: https://gist.github.com/rmanoka/23cd9237abff92f41759e44222ebb301 The gist, well shorter version, is to declare Coordinate like:

pub struct Coord<T, Z = (), M = ()> {
    xy: Coord2<T>,
    z: Z,
    m: M,
}

Then we can use Coord2 without any extra unsafety to impl. deref*. Then, we can use like:

// Does not work
// let coord: Coord<_> = Coord2 { x: 0., y: 0. };

let mut coord = Coord::new(10., 20.);
coord.x -= 5.0;
assert_eq!(coord.x, 5.0);
michaelkirk commented 2 years ago

Oh right - no need for the unsafe transmute at all! I don't know why I immediately escalated to that. I guess because it felt "serious" 😆. Thanks for checking me @rmanoka!

One last thing I noticed that I kind of liked about nalgebra, is how they only allowing access to the available fields. e.g. with #797 you can access NoValue on pt2d.z and pt2d.y, whereas in nalgebra vec2.z it is an error because there is no deref impl with a Z field for the 2d coord. My instinct is that it would be better to disallow people from accessing this NoValue's at all.

I wonder if we can achieve something similar without too much knock on effect. Here's a little POC of what I mean: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=967ae52ba1570fdeb0483801e5d486ff

I'm sure applying it to the actual geo-types would be harder. But is it a good goal to remove access of z/m fields from 2d coords?

nyurik commented 2 years ago

This approach seem to break all functions that pass around Coordinate<T>. Instead it is Coord<CoordXY<T>>, which is also a problem because generic code could use Coord<T,Z,M> -- which includes both the value and nonvalued variants.

michaelkirk commented 2 years ago

This approach seem to break all functions that pass around Coordinate

Do you mean what was linked in my playground? That's not intended to be dropped into geo-types, but rather a simplified version to show what I mean by having a generic container.

In geo-types, I think it'd have to look more like:

#[derive(Eq, PartialEq, Clone, Copy, Debug, Hash, Default)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
pub struct Coord<T: CoordNum, C = CoordXY<T>> {
    c: C,
    // we need this to keep the type signature nice, so that the first param (which is required) is
    // the numeric type of the coord fields - the coord itself (e.g. CoordXY) need not be specified
    // manually. Which is good, because it's quite verbose!
    _marker: PhantomData<T>,
}
nyurik commented 2 years ago

thx @michaelkirk , i hacked together a working example with your suggestion. Non-2D types can still be defined with the type alias.

fn make_xy() -> Coord<i32> {
    Coord::new_xy(1, 2)
}

fn make_xyz() -> Coord3D<i32> {
    Coord::new_xyz(1, 2, 3)
}

With this approach, Is it possible to define an algorithm once without doing it for each case e.g. I don't see how to have an algorithm that shifts points around in 2D space while preserving z/m values - regardless if z/m values are present or not?

michaelkirk commented 2 years ago

That's a good point. I'm not sure how it'd be possible without using either: