georust / geojson

Library for serializing the GeoJSON vector GIS file format
https://crates.io/crates/geojson
Apache License 2.0
272 stars 59 forks source link

more ergonomic property access #157

Open michaelkirk opened 3 years ago

michaelkirk commented 3 years ago

Here's some mostly real code I wrote today... I was surprised at how verbose the property accessing code needed to be.

let collection = geojson::FeatureCollection::try_from(geojson)?;
for mut feature in collection.features.into_iter() {
        // 1. getting a property off each feature
        // this got much better thanks to @rory in https://github.com/georust/geojson/pull/143
        let population = feature.property("population");

        // 2.a blech - unwrapping the type is awkward
        let total_population = match population {
            Some(serde_json::Value::Number(n)) => n
                    .as_u64()
                    .expect(&format!("unexpected total population number: {:?}", n))
                      as usize,
            _ => {
                return Err(format!("unexpected format for 'population': {:?}", population));
            }
        };

        let geometry = feature.geometry.expect("geojson feature missing geometry");
        let multi_poly = geo::MultiPolygon::<f64>::try_from(geometry.value)?;
        results.push((multi_poly, population))
    }
}

It's been a while since I've used this crate - is there some better way I should know about?

From what I can tell - we're basically plumbing along the interface of the json crate. But I was toying with an alternative:

let collection = geojson::FeatureCollection::try_from(geojson)?;
for mut feature in collection.features.into_iter() {
        let population = feature.property("population");

-       // 2.a blech - unwrapping the type is awkward
-       let total_population = match population {
-           Some(serde_json::Value::Number(n)) => n
-                   .as_u64()
-                   .expect(&format!("unexpected total population number: {:?}", n))
-                     as usize,
-           _ => {
-               return Err(format!("unexpected format for 'population': {:?}", population));
-           }
-       };
+       // 2.b - based off of #156 - but does a kind of "try into"
+       let total_population: u64 = feature.remove_property_into("population")?;
+
+       // 2.c presumably could also have a non-owning interface like:
+       let total_population: &u64 = feature.property_into("population")?;

        let geometry = feature.geometry.expect("geojson feature missing geometry");
        let multi_poly = geo::MultiPolygon::<f64>::try_from(geometry.value)?;
        results.push((multi_poly, population))
    }
}

My approach so far entails something like:

pub trait TryFromJsonValue: Sized {
    type Error;
    fn try_from_json_value(json_value: JsonValue) -> Result<Self, Self::Error>;
}

impl TryFromJsonValue for u64 {
    type Error = String;
    fn try_from_json_value(json_value: JsonValue) -> Result<Self, Self::Error> {
        match json_value {
            Value::Null => todo!(),
            Value::Bool(_) => todo!(),
            Value::Number(n) => {
                n.as_u64().ok_or(todo!())
            }
            Value::String(_) => todo!(),
            Value::Array(_) => todo!(),
            Value::Object(_) => todo!(),
        }
    }
}

// ... we'd need on for *every* Json type.
// impl TryFromJsonValue for String { ... }
// impl TryFromJsonValue for f64 { ... }
// impl TryFromJsonValue for i64 { ... }

impl Feature {
    pub fn remove_property_into<T: TryFromJsonValue>(&mut self, key: impl AsRef<str>) -> Result<T, foo> {
        self.properties.as_mut().and_then(|props| props.remove(key.as_ref())).and_then(|json_value| {
            T::try_from_json_value(json_value)
        })
    }
}

Other thoughts: we already have a nice quick_collection method for converting a geojson to a geo collection wholesale, but that doesn't allow you to access the properties if you want to build up some structs like:

struct Place {
    geometry: geo::Geometry,
    name: String, // <- pulled from geojson props
    year_established: u64 // <- pulled from geojson props
}

I'd love to be able to write something like:

let places: Vec<Place> = geo_json.geometry_prop_iter().map(|geometry: Geometry, properties: JsonObject| {
    Place { 
        geometry: geometry.into(), 
        name: properties.try_into_delete("name")?, 
        year_established: properties.try_into_delete("year")? 
    }
}).collect()

WDYT?

/cc @rory

michaelkirk commented 3 years ago

Thinking about it a bit more... though using generics is a neat trick that allows us to only add one method to impl Feature, I always find generics really hard to discover.

For example, in geojson, there's only a few numeric types supported "natively" - u64, i64, and f64.

I think it would be potentially surprising to folks if this worked:

let total_population: u64 = feature.remove_property_into("population")?;

While this gave an obscure error about TryFromJsonValue is not implemented for u32

let total_population: u32 = feature.remove_property_into("population")?;

Maybe it'd be better to just hardcode the type into the method name - without generics like:

let total_population = feature.remove_u64_property("population")?;
let foo = feature.remove_i64_property("foo")?;
let bar = feature.remove_f64_property("bar")?;

And for Copy types here, it's probably reasonable to have non-consuming accessors like:

let total_population: u64 = feature.u64_property("population")?;
let foo: i64 = feature.i64_property("foo")?;
let bar: f64 = feature.f64_property("bar")?;

While for non-copy types like Array, something like:

let places: &... = feature.array_property("places")?;

So, all in all this could be a dozen or so new accessors.

michaelkirk commented 3 years ago

Another awkward thing I just realized:

While this works:

let collection = geojson::FeatureCollection::try_from(geojson)?;
for mut feature in collection.features.into_iter() {
        // [1] getting a property off each feature 
        let population = feature.property("population");

        let total_population = match population {
            Some(serde_json::Value::Number(n)) => n
                    .as_u64()
                    .expect(&format!("unexpected total population number: {:?}", n))
                      as usize,
            _ => {
                return Err(format!("unexpected format for 'population': {:?}", population));
            }
        };

        // [2] Now get geometry
        let geometry = feature.geometry.expect("geojson feature missing geometry");
        let multi_poly = geo::MultiPolygon::<f64>::try_from(geometry.value)?;
        results.push((multi_poly, population))
    }
}

It does not work if we interact with the geometry first, since feature has potentially moved.

let map_area: geo::Rect = get_area_of_interest();
let collection = geojson::FeatureCollection::try_from(geojson)?;
for mut feature in collection.features.into_iter() {
        // [1] get geometry first
        let geometry = feature.geometry.expect("geojson feature missing geometry");
        let multi_poly = geo::MultiPolygon::<f64>::try_from(geometry.value)?;

        // we're only interested in geometries within our map area
        if !multi_poly.intersects(map_area) {
            continue;
        }

        // [2] then get a property 
        // 🚨this fails because: `feature.geometry` partially moved above
        let population = feature.property("population");

        let total_population = match population {
            Some(serde_json::Value::Number(n)) => n
                    .as_u64()
                    .expect(&format!("unexpected total population number: {:?}", n))
                      as usize,
            _ => {
                return Err(format!("unexpected format for 'population': {:?}", population));
            }
        };

        results.push((multi_poly, population))
    }
}

This makes me wonder if any ergonomic accessors should be on features.properties rather than features

e.g. features.properties.remove_into("foo") rather than features.remove_property_into("foo")

frewsxcv commented 3 years ago

I don't have anything substantive to add here, but I agree the ergonomics of property retrieval is difficult and needs some love. At the very least, the user should not have to pull in serde_json just to do basic operations with the geojson crate.

amandasaurus commented 3 years ago

What about Feature::property_as::<u64>("key")? I have a plan for how to implement this, which would require changing serde_json crate (cf. https://github.com/serde-rs/json/issues/738 )

michaelkirk commented 3 years ago

Thanks for weighing in!

Feature::property_as::("key")

Right, this looks nice, but there are a couple of complications that I'm hung up on which aren't clear from your example:

Given:

{
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [125.6, 10.1]
  },
  "properties": {
    "name": "Dinagat Islands",
    "favorite_numbers": [1, 3, 5]
  }
}

Complication 1: non 64 bit numbers

json crate, in keeping with javascript, only deals with 64 bit numbers. Do we do likewise so that:

This compiles: Feature::property_as::<u64>("key") And this fails to compile?: Feature::property_as::<u32>("key") due to missing impl on u32?

Or do we do an implicit coercion and silently change the precision assumptions from the json crate?

Without implicit coercion, we'd only implement this feature for 64bit numbers, and I find the discoverability of generic impls to be really hard, especially for people first picking up a crate.

This is why I was advocating for the more manual approach of implementing something like u64_property("key"), i64_property("key"), f64_property("key"). I think it's more discoverable and avoids implicit conversion. (edit: to clarify, there would be no u32_property, i32_property, f32_property. My theory is that by having the methods exist explicitly, rather than implicitly through a trait impl, the implications of this would be clearer to the user: you can only parse out 64 bit numbers)

If we were to go one of the two other directions, I think I prefer the downsides of implicit conversion rather than inscrutable compiler errors because someone is trying to parse out a f32.

Complication 2: non-copy types

Your example shows a copy type, which is pretty straight forward, but what about non-copy types?

I guess these would have to be references?

let name: &str = geojson.property_as("name")
let favorite_numbers: &Vec<u64> = geojson.property_as("name")

But it seems like we'd want a way to have non-reference types, and avoid unnecessary cloning. That's what I was getting at with the remove_* methods in my example, since I think we could uniformly move the underlying value regardless of type:

let name: String = geojson.remove_property_as("name")
let favorite_numbers: Vec<u64> = geojson.remove_property_as("favorite_numbers")
amandasaurus commented 3 years ago

On Fri, 01 Jan 2021 18:32, Michael Kirk notifications@github.com wrote:

This compiles: Feature::property_as::<u64>("key") And this fails to compile?: Feature::property_as::<u32>("key") due to missing impl on u32?

I planned to implemente it for u8/u16/etc using some sort of implicit conversion. But I'm unsure.

Or do we do an implicit coercion and silently change the precision assumptions from the json crate?

Without implicit coercion, we'd only implement this feature for 64bit numbers, and I find the discoverability of generic impls to be really hard, especially for people first picking up a crate.

“Discoverability” of a API is important. Could that be addressed with lots of doc comments showing all the options?

This is why I was advocating for the more manual approach of implementing something like u64_property("key"), i64_property("key"), f64_property("key"). I think it's more discoverable and avoids implicit conversion.

If we were to go one of the two other directions, I think I prefer the downsides of implicit conversion rather than inscrutable compiler errors because someone is trying to parse out a f32.

But it seems like we'd want a way to have non-reference types, and avoid unnecessary cloning. That's what I was getting at with the remove_* methods in my example, since I think we could uniformly move the underlying value regardless of type:

Yes, a non-copy alternative should exist too, for strings, arrays & other objects.

michaelkirk commented 3 years ago

“Discoverability” of a API is important. Could that be addressed with lots of doc comments showing all the options?

Examples would definitely help, but I'm less worried about discoverability if the trait is uniformly implemented for every type the user is likely to use.

I was more worried if we base our implementation on TryFrom and the serde_json crate doesn't want to implement TryFrom for non 64bit types.

amandasaurus commented 3 years ago

Unfortunately this idea was initially turned down on the json issue tracker

I have made a property_as function which serde's deserialize approach (recommended by serde_json dev) in this test branch. While it works fine for primitive types u16/u64/etc, and I think strings, and seems to work for extracting a Vec<_>, I don't think it's possible to get a references to a &[_].

I've asked on the original json bug report.