georust / geojson

Library for serializing the GeoJSON vector GIS file format
https://crates.io/crates/geojson
Apache License 2.0
276 stars 60 forks source link

Stack allocate 2d points using tiny_vec. #222

Open michaelkirk opened 1 year ago

michaelkirk commented 1 year ago

Position, which is the fundamental unit for all the various geojson geometries, is currently represented as a (heap allocated) Vec. This PR swaps it out to be a TinyVec which can be stack allocated up to a certain number of elements, before moving to the heap.

This PR also makes that fact private so we can make similar changes in the future in a non-breaking way.

Unlike #218, this branch maintains the ability to have n-dimensional points, building upon the branch I started in this comment.

This is a breaking change (maybe a substantial one for people relying on the Position type, but shouldn't affect people who are only using the high level APIs.). Switching to tiny_vec will be breaking for these people regardless, and the upside to making the struct opaque like this is that we can now more freely muck about with the internals, e.g. if we one day want to replace tiny_vec, or add a feature flag which allocates 3d stack allocations rather than 2, etc.

We could add a macro like geojson::position![1.0, 3.0] if people think the Position::from([1.0, 3.0]) is too verbose.

What do people think? I've wanted to do something like this for a long time, but kind of avoided it because it's a breaking change, and going to be annoying for anyone who was accessing the Position type directly.

I've tried to alleviate this a bit by implementing Index/Mut on Position. Are there other things I could do to make it break less? If you have a crate that uses geojson, I'd love to have some test cases to integrate this against to see how annoying it is.

bench improvements look similar, but slightly less than #218 ``` Benchmarking parse (countries.geojson): Collecting 100 samples in estimated 9.2678 s (5050 iter parse (countries.geojson) time: [1.8266 ms 1.8336 ms 1.8424 ms] change: [-9.1846% -8.6131% -8.0823%] (p = 0.00 < 0.05) Performance has improved. Benchmarking FeatureReader::features (countries.geojson): Collecting 100 samples in estimated 5 FeatureReader::features (countries.geojson) time: [5.7831 ms 5.7977 ms 5.8142 ms] change: [-2.3441% -1.9763% -1.6057%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe Benchmarking FeatureReader::deserialize (countries.geojson): Collecting 100 samples in estimate FeatureReader::deserialize (countries.geojson) time: [7.2027 ms 7.2062 ms 7.2100 ms] change: [-3.0806% -2.7459% -2.4422%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild Benchmarking FeatureReader::deserialize to geo-types (countries.geojson): Warming up for 3.0000 Benchmarking FeatureReader::deserialize to geo-types (countries.geojson): Collecting 100 sample FeatureReader::deserialize to geo-types (countries.geojson) time: [7.2114 ms 7.2147 ms 7.2181 ms] change: [-3.5735% -3.2389% -2.9182%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe Benchmarking parse (geometry_collection.geojson): Collecting 100 samples in estimated 5.0630 s parse (geometry_collection.geojson) time: [15.050 µs 15.061 µs 15.073 µs] change: [-5.1317% -4.8415% -4.5631%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild Running benches/serialize.rs (target/release/deps/serialize-dce81d4a785957aa) Benchmarking serialize geojson::FeatureCollection struct (countries.geojson): Warming up for 3. Benchmarking serialize geojson::FeatureCollection struct (countries.geojson): Collecting 100 sa serialize geojson::FeatureCollection struct (countries.geojson) time: [2.8550 ms 2.8581 ms 2.8619 ms] change: [-0.1569% +0.0908% +0.3323%] (p = 0.47 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe Benchmarking serialize custom struct (countries.geojson): Collecting 100 samples in estimated 5 serialize custom struct (countries.geojson) time: [2.2025 ms 2.2052 ms 2.2092 ms] change: [-0.9366% -0.5925% -0.2551%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe Benchmarking serialize custom struct to geo-types (countries.geojson): Collecting 100 samples i serialize custom struct to geo-types (countries.geojson) time: [2.2465 ms 2.2496 ms 2.2538 ms] change: [-6.4444% -6.1566% -5.8693%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe Running benches/to_geo_types.rs (target/release/deps/to_geo_types-e8e400be9c708cf6) quick_collection time: [65.569 µs 65.815 µs 66.131 µs] change: [-71.144% -71.004% -70.873%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) high mild 9 (9.00%) high severe ```
michaelkirk commented 1 year ago

Another note, though this greatly speeds up (change: [-71.144% -71.004% -70.873%]) converting already parsed geojson to other formats, this doesn't seem to affect the actual parsing speeds all that much.

I'm not 100%, but I think this is because our parsing is still mostly building up a serde_json::JsonValue per Position, which creates a Vec anyway. I think with these stack allocatable Positions, we'll unlock some good reason to revisit our parsing code.

urschrei commented 1 year ago

As we did before, we could have a look to see who might be relying on Position using GH Code Search and give them a heads up. I would be surprised if there are many explicit uses, but who knows. Given the perf win, I think it's worth the potential downstream inconvenience, especially with the (potential) macro and Index* traits.