georust / geozero

Zero-Copy reading and writing of geospatial data.
Apache License 2.0

Documentation for parsing GeoJSON from a file #198

Open nk9 opened 4 months ago

nk9 commented 4 months ago

I am trying to read GeoJSON from a file and iterate through the features in the FeatureCollection with their geom and feature properties. I would have thought this would be a very common use case, but I don't see any documentation giving example code for how to do this. All the examples seem to assume you've got the string of a single geometry in memory already, or that you're using FlatGeobuf (*.fgb) files.

I've found two examples on GitHub, but:

Have I missed the sample code for this? I'd expect a simple example in the overview on the GeoJsonReader page, and probably in the main README as well.

kylebarron commented 4 months ago

The GeoJsonReader accepts any input that implements Read. So you can pass in a File or, preferably, a BufReader<File>.
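To make the point about `Read` and buffering concrete, here is a minimal sketch that uses only the standard library. It does not call geozero at all: `CountingReader`, `drain_bytewise`, and `compare_read_calls` are hypothetical helpers invented for this example. It assumes a parser that pulls its input in very small reads, and counts how many of those reads reach the underlying source with and without a `BufReader`.

```rust
use std::io::{self, BufReader, Read};

/// Wraps any reader and counts how many times `read` is called on it.
/// (Hypothetical helper for illustration; not part of geozero or std.)
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Consumes a reader one byte at a time, the way a streaming parser might,
/// and returns the number of bytes seen. It is generic over `Read`, so a
/// `File`, a `BufReader<File>`, or an in-memory `&[u8]` are all accepted.
fn drain_bytewise<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut byte = [0u8; 1];
    let mut total = 0;
    while reader.read(&mut byte)? == 1 {
        total += 1;
    }
    Ok(total)
}

/// Returns (reads reaching the source directly, reads reaching it via BufReader).
fn compare_read_calls(data: &[u8]) -> io::Result<(usize, usize)> {
    // Unbuffered: every one-byte read goes straight to the source.
    let mut direct = CountingReader { inner: data, calls: 0 };
    drain_bytewise(&mut direct)?;

    // Buffered: BufReader pulls large chunks and serves small reads from memory.
    let mut buffered =
        BufReader::with_capacity(4096, CountingReader { inner: data, calls: 0 });
    drain_bytewise(&mut buffered)?;

    Ok((direct.calls, buffered.get_ref().calls))
}
```

With a real file the same shape would be `BufReader::new(File::open(path)?)`. Each read that reaches a `File` is a system call, so cutting the number of reads is the reason buffering tends to help.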

nk9 commented 4 months ago

Thanks for the quick reply. I found this issue and adapted his code to do what I needed, along with this bit of their docs. I'm thinking that the answer to my question is "geojson is a higher-level library, and is probably what you want to use for simply iterating through features in a geojson file." Is that safe to say?

Update: Thanks for the note about BufReader. That is an order of magnitude faster than just passing a File directly!

Update 2: I've achieved another order of magnitude speed-up by using rayon and placing the entire load_geojson function inside par_iter(). Down to 2.2 seconds to read 280 files containing 175k features.
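The per-file parallelism described in Update 2 can be sketched without any external crate. This is not the rayon code nk9 used: standard-library scoped threads stand in for rayon's `par_iter()`, and `load_all` plus its `load_one` parameter are hypothetical names for this example. `load_one` is assumed to be a per-file loader along the lines of `load_geojson`.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Runs `load_one` on every path, spreading the work over up to `workers`
/// threads. Results come back in the same order as `paths`.
/// `load_one` is a placeholder for a per-file loader.
fn load_all<T, F>(paths: &[PathBuf], workers: usize, load_one: F) -> Vec<T>
where
    T: Send,
    F: Fn(&Path) -> T + Sync,
{
    if paths.is_empty() {
        return Vec::new();
    }
    // Ceiling division, so that at most `workers` chunks are created.
    let workers = workers.max(1);
    let chunk_len = (paths.len() + workers - 1) / workers;
    let load_one = &load_one;

    thread::scope(|scope| {
        // Spawn one thread per chunk of paths before joining any of them.
        let handles: Vec<_> = paths
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|p| load_one(p.as_path()))
                        .collect::<Vec<T>>()
                })
            })
            .collect();

        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<T>>()
    })
}
```

Splitting into fixed chunks is cruder than rayon's work stealing, so files of very uneven size would balance less well here; the sketch only shows that each file can be loaded independently of the others.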

For posterity, here's what I ended up with:

use geo::{MultiLineString, MultiPolygon};
use geojson::{Feature, GeoJson, Value};
use std::fs::File;
use std::io::BufReader;
use std::path::PathBuf;

fn load_geojson(path: &PathBuf) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);
    let geojson = GeoJson::from_reader(reader)?;

    match geojson {
        GeoJson::FeatureCollection(collection) => {
            for feature in collection.features {
                let _ = process_feature(&feature)?;
            }
        }
        _ => println!("Unsupported GeoJSON type"),
    }

    Ok(())
}

fn process_feature(feature: &Feature) -> Result<(), Box<dyn std::error::Error>> {
    // String value of the "name" property, or an empty string
    let name = feature
        .property("name")
        .and_then(|v| v.as_str())
        .map_or(String::from(""), |s| s.to_string());

    let geom = feature.geometry.as_ref().unwrap();

    match &geom.value {
        Value::MultiPolygon(_) => {
            let p: MultiPolygon<f64> = geom.value.clone().try_into().unwrap();
            println!("{p:?}");
        }
        Value::MultiLineString(_) => {
            let p: MultiLineString<f64> = geom.value.clone().try_into().unwrap();
            println!("{p:?}");
        }
        _ => panic!("not a recognized feature type"),
    };

    Ok(())
}

kylebarron commented 4 months ago

> I'm thinking that the answer to my question is "geojson is a higher-level library, and is probably what you want to use for simply iterating through features in a geojson file." Is that safe to say?

I'd say the opposite. geojson is lower-level in the sense that you have to handle specifics about GeoJSON to handle input data. geozero is higher-level in the sense that GeoJSON input is just one type of input, but can export to any consumer. For example in geoarrow-rs GeoJsonReader::process just works even though geozero has no knowledge of the GeoArrow output format.
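The "one input, any consumer" idea can be illustrated with a toy event-driven design. This is a sketch under stated assumptions, not geozero's API: `FeatureSink`, `PointSource`, `WktLines`, and `Bounds` are hypothetical names, and geozero's real processor traits (`FeatureProcessor`, `GeomProcessor`, `PropertyProcessor`) have more callbacks and different signatures.

```rust
/// Simplified, hypothetical stand-in for a geozero-style processor trait.
/// Every callback has a no-op default, so a consumer overrides only what it needs.
trait FeatureSink {
    fn feature_begin(&mut self) {}
    fn property(&mut self, _name: &str, _value: &str) {}
    fn point(&mut self, _x: f64, _y: f64) {}
    fn feature_end(&mut self) {}
}

/// A toy "reader": it holds point features and pushes them into any sink as events.
struct PointSource {
    features: Vec<(String, f64, f64)>, // (name, x, y)
}

impl PointSource {
    fn process<S: FeatureSink>(&self, sink: &mut S) {
        for (name, x, y) in &self.features {
            sink.feature_begin();
            sink.property("name", name);
            sink.point(*x, *y);
            sink.feature_end();
        }
    }
}

/// Consumer 1: renders each point as WKT-like text.
#[derive(Default)]
struct WktLines {
    lines: Vec<String>,
}

impl FeatureSink for WktLines {
    fn point(&mut self, x: f64, y: f64) {
        self.lines.push(format!("POINT({x} {y})"));
    }
}

/// Consumer 2: tracks a bounding box and ignores everything else.
#[derive(Default)]
struct Bounds {
    extent: Option<(f64, f64, f64, f64)>, // (min_x, min_y, max_x, max_y)
}

impl FeatureSink for Bounds {
    fn point(&mut self, x: f64, y: f64) {
        self.extent = Some(match self.extent {
            None => (x, y, x, y),
            Some((x0, y0, x1, y1)) => (x0.min(x), y0.min(y), x1.max(x), y1.max(y)),
        });
    }
}
```

The source knows nothing about either consumer, and neither consumer knows where the events come from. That separation is what lets a new output format be added without touching the reader.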

I'd say the real issue is that there's no "default" library in georust for handling geometries with attributes. You can parse to geo structs but then you lose the associated attributes. This is a main feature of geoarrow though; being really optimized about both the geometries and their attributes.

nk9 commented 4 months ago

OK, that's interesting to know. If this is the higher-level library, I think it's even more important to have some sample code of loading a GeoJSON file (ideally from disk, but at least from a string) and iterating its features. I never got that to work with geozero.

As for geometries and attributes, the geojson::Feature struct seems to handle that pretty well with feat.property("prop_key") and feat.geometry upon pulling them out of the file. But you're right, I had to create my own struct to store the converted geometry along with the properties I wanted. Providing a ready-made struct for that purpose would be a nice QoL improvement.
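A struct of the kind described above might look roughly like the following. This is only a sketch: `FeatureRecord` and its methods are hypothetical, not nk9's actual type and not something geozero, geojson, or geo provides. It is generic over the geometry type `G`, which could be a geo type such as `MultiPolygon<f64>`, and it assumes the properties of interest can be kept as strings.

```rust
use std::collections::HashMap;

/// Hypothetical container pairing an already-converted geometry with the
/// properties worth keeping from the original feature.
#[derive(Debug, Clone, PartialEq)]
struct FeatureRecord<G> {
    geometry: G,
    properties: HashMap<String, String>,
}

impl<G> FeatureRecord<G> {
    /// Starts a record with a geometry and no properties.
    fn new(geometry: G) -> Self {
        Self {
            geometry,
            properties: HashMap::new(),
        }
    }

    /// Builder-style helper that adds (or overwrites) one property.
    fn with_property(mut self, key: &str, value: &str) -> Self {
        self.properties.insert(key.to_string(), value.to_string());
        self
    }

    /// Looks up a property by key.
    fn property(&self, key: &str) -> Option<&str> {
        self.properties.get(key).map(String::as_str)
    }

    /// Converts the geometry to another type while carrying the properties along.
    fn map_geometry<H>(self, convert: impl FnOnce(G) -> H) -> FeatureRecord<H> {
        FeatureRecord {
            geometry: convert(self.geometry),
            properties: self.properties,
        }
    }
}
```

Keeping every value as a `String` is the simplest choice, but it is also the weakest: numbers and dates lose their types, so a real implementation would probably want a typed value enum instead.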

kylebarron commented 4 months ago

> I never got that to work with geozero.

You'd need to impl your own GeozeroDatasource. It's a bit of work, which is why it's not done often; people usually convert to an existing representation instead of creating their own.

> As for geometries and attributes, the geojson::Feature struct seems to handle that pretty well with feat.property("prop_key") and feat.geometry upon pulling them out of the file

Sure, but that's storing attributes in the GeoJSON model, which is quite restrictive. For example, you can't store a date time in GeoJSON; you can only store a string.

> Providing a ready-made struct for that purpose would be a nice QoL improvement.

It's not geozero's concern to provide those structs. Geozero focuses only on conversions between representations. I'm building a representation around Arrow, which enables storing properties quite efficiently, but does incur some large dependencies.

nk9 commented 4 months ago

OK, well I filed this bug to document that I wanted to do something I thought was very common, and yet could find no sample code on how to do it. I've solved my problem at this point. If you want people to actually use this library for parsing features and properties out of geojson files, especially given that it's nonobvious how to do that, then having some sample code would really help and I'd suggest this bug should stay open.

But if people looking for a simple way to iterate through features/properties in a GeoJSON file should just use geojson instead, which already has sample code for this, then that's fine too. I'd still suggest that it would be friendly to new devs to put a pointer about this in the docs, but that's up to you.

Thanks for engaging with me on this, just trying to make things a little easier for the next guy or gal. :-)