georust / wkt

Rust read/write support for well-known text (WKT)
https://crates.io/crates/wkt
Apache License 2.0
50 stars 25 forks source link

`POINT EMPTY` vis-à-vis `geo_types` #61

Closed michaelkirk closed 3 years ago

michaelkirk commented 3 years ago

One point of friction when mapping WKT to geo_types is POINT EMPTY.

Similar EMPTY types present no such problem: GEOMETRYCOLLECTION EMPTY, LINESTRING EMPTY, POLYGON EMPTY all have intuitive corollaries in geo_types. (GeometryCollection(vec![]), LineString(vec![]), Polygon { exterior: vec![], interiors: vec![] }

You might say:

Obviously POINT EMPTY maps to Option::<geo_types::Point>::None!

To which I say:

GEOMETRYCOLLECTION(POINT EMPTY) or GEOMETRYCOLLECTION(GEOMETRYCOLLECTION(POINT EMPTY)) GEOMETRYCOLLECTION(POINT 1 1, GEOMETRYCOLLECTION(POINT EMPTY))

The conversion of which is not obvious to me.

A few options:

  1. Do nothing. Whenever converting to geo-types, continue to error whenever we encounter a POINT EMPTY, force the caller to reconcile any confusion. This is great because we wash our hands of the problem, but I think we'd be doing a disservice to, as policy, force every user to re-invent this essential wheel from scratch.

  2. add a deserialize_point -> Option<geo_types::Point> method to complement deserialize_geometry.

    #[cfg(feature = "geo-types")]
    pub fn deserialize_point<'de, D, T>(deserializer: D) -> Result<Option<geo_types::Point<T>>, D::Error>
    where
        D: Deserializer<'de>,
        T: FromStr + Default + WktFloat,
    {
    use serde::Deserialize;
    Wkt::deserialize(deserializer).and_then(|wkt: Wkt<T>| {
        use std::convert::TryFrom;
        geo_types::Point::try_from(wkt).map(|p| Some(p))
            .or_else(|e| {
            if let crate::conversion::Error::PointConversionError = e {
                // map a WKT: 'POINT EMPTY' to an `Option<geo_types::Point>::None`
                return Ok(None);
            }
    
            Err(D::Error::custom(e))
        })
    })
    }

This would only solve the top level POINT EMPTY case, and would continue to error on GEOMETRYCOLLECTION(POINT EMPTY) case. Would this half measure be worthwhile or only amplify the confusion?

  1. deserialize POINT EMPTY to GEOMETRYCOLLECTION EMPTY. This is at first a confusing alternative, but neatly makes a bunch of stuff "Just work". That is, parsing can continue, and, as far as I know, there are no operations for which a GEOMETRYCOLLECTION EMPTY would behave differently from a POINT EMPTY. It just "feels weird" to lose that type information. FWIW, I think this is the approach postgis takes. You can read some rationale from https://gis.stackexchange.com/a/106942:

It's not a bug, more like a strategic laziness. 1.X versions of PostGIS only supported GEOMETRYCOLLECTION EMPTY, not other forms of empty. For 2.X I (perhaps foolishly) embraced the full variety of nothings. The result is a not-entirely-complete support of varieties of nothing, made slightly ill-formed by the fact that support libraries like GEOS have their own concepts of what kinds of nothing are worth preserving, and that standards like WKB cannot even represent some forms of it (POINT EMPTY is not representable in WKB).

Anyhow, a bunch of nothing is still nothing. Do you you need to retain fidelity of your collections of nothing?

UPDATE

Looking at the PostGIS code, I'm pretty sure you're seeing an effect of the "is empty" function. Your input is in fact being parsed into an internal representation that reflects the input, a collection of empty things. But on output to WKB, the first test is "is this thing empty? if so emit an empty representation". So then we get this situation: is a geometry collection of empty things itself empty? Philosophy 101. In terms of most things we might do with it (calculate area or length, intersect it with things, really any operation at all) a collection of empty is just the same as a single empty. Mucking with the definition of "is empty" has a lot of knock-on effects all over the code base, so we haven't touched it much.

  1. maybe some combination of 2 and 3 above: have a deserialize_point -> Option<Point> method, but when encountering an internal Point, as in GEOMETRYCOLLETION(POINT 1 1, POINT EMPTY) convert it to GEOMETRYCOLLETION(POINT 1 1, GEOMETRYCOLLETION EMPTY). That seems preferable to discarding it since GEOMETRYCOLLECTION(POINT 1 1) would have an observably different member count.
rmanoka commented 3 years ago

Some combination of 2 and 3 seems like a good approach to me too. It would be good to convert POINT EMPTY to geo_types::MultiPoint([]) wherever the context allows it. That way, when we convert back to wkt, the type of the object is reasonably maintained. So POINT EMPTY -> geo_types::MultiPoint([]) -> MULTIPOINT EMPTY instead of GEOMETRYCOLLECTION EMPTY.

michaelkirk commented 3 years ago

That way, when we convert back to wkt, the type of the object is reasonably maintained.

Good Point<T>! 😐