georust / geozero

Zero-Copy reading and writing of geospatial data.
Apache License 2.0

Omit GeoJSON properties with null values #206

Closed by kylebarron 2 months ago

kylebarron commented 3 months ago

In https://github.com/geoarrow/geoarrow-rs/issues/588 I got a bug report about a GeoJSON file that failed to read. The cause is a column that is numeric for the first N rows but null in row N + 1. Currently the GeoJSON reader coerces a null value to the string "null": https://github.com/georust/geozero/blob/3378dda305ec88cabb092d458f8a61a140f60827/geozero/src/geojson/geojson_reader.rs#L220-L221

This is hard to work with because in my processor I'd have to check every input string value (from any geozero reader, for any column) against the string "null" and manage type conversions. E.g. if the first few rows of a column are the string "null" but then I see an f64 value, I need to be able to coerce that column to a Float64Array (which in Arrow has its own null bitmask).
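To make the burden concrete, here is a rough sketch of the check a processor would need. `Value` and `FloatColumn` are hypothetical stand-ins, not geozero's `ColumnValue` or Arrow's builder API; a `Vec<Option<f64>>` plays the role of a Float64Array with a null bitmask.

```rust
/// Hypothetical stand-in for the values a reader hands to a processor.
/// This is not geozero's actual `ColumnValue` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Double(f64),
    String(String),
}

/// Hypothetical float column: `None` marks a null slot, loosely mimicking
/// an Arrow Float64Array with its validity bitmask.
#[derive(Debug, Default)]
pub struct FloatColumn {
    values: Vec<Option<f64>>,
}

impl FloatColumn {
    /// Append one value, special-casing the string "null".
    pub fn push(&mut self, value: &Value) -> Result<(), String> {
        match value {
            Value::Double(v) => self.values.push(Some(*v)),
            // The awkward part: a JSON null arrives as the text "null", so it
            // cannot be told apart from a genuine string with that content.
            Value::String(s) if s.as_str() == "null" => self.values.push(None),
            Value::String(s) => {
                return Err(format!("expected a number, got string {s:?}"));
            }
        }
        Ok(())
    }

    /// Values collected so far.
    pub fn values(&self) -> &[Option<f64>] {
        &self.values
    }
}
```

Every column type would need the same special case, and a column that starts with "null" strings would have to stay untyped until the first real value shows up.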

I'd argue that the simplest approach in this specific case is to do a no-op for GeoJSON null values. That is, {"a": 1, "b": null} would be treated as equivalent to {"a": 1}. That seems like a much smaller change than, say, adding a ColumnValue::Null variant. But I'm interested to hear others' thoughts.
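A minimal sketch of the no-op idea, assuming simplified types: `Json`, `PropertyValue`, `convert_property`, and `emitted_properties` are all hypothetical names for illustration and do not reflect the actual reader code in geozero.

```rust
/// Hypothetical, simplified JSON scalar (stand-in for a JSON library's value type).
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
}

/// Hypothetical typed property value passed on to a processor.
/// Note there is deliberately no `Null` variant.
#[derive(Debug, Clone, PartialEq)]
pub enum PropertyValue {
    Bool(bool),
    Double(f64),
    String(String),
}

/// Convert one JSON property value; `None` means "emit nothing".
pub fn convert_property(value: &Json) -> Option<PropertyValue> {
    match value {
        // No-op for null instead of producing the string "null".
        Json::Null => None,
        Json::Bool(b) => Some(PropertyValue::Bool(*b)),
        Json::Number(n) => Some(PropertyValue::Double(*n)),
        Json::String(s) => Some(PropertyValue::String(s.clone())),
    }
}

/// Collect the properties that would reach the processor, skipping nulls.
pub fn emitted_properties(props: &[(&str, Json)]) -> Vec<(String, PropertyValue)> {
    props
        .iter()
        .filter_map(|(name, value)| convert_property(value).map(|v| (name.to_string(), v)))
        .collect()
}
```

With this shape, a feature whose properties are `a = 1, b = null` yields the same emitted list as one with only `a = 1`, and a downstream column builder can treat any property missing from a row as null.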