In https://github.com/geoarrow/geoarrow-rs/issues/588 I got a bug report about a GeoJSON file that failed to read. The issue here is a column that's numeric for the first N rows but then is `null` for some row N + 1. Currently the GeoJSON reader coerces a `null` value to the string `"null"`: https://github.com/georust/geozero/blob/3378dda305ec88cabb092d458f8a61a140f60827/geozero/src/geojson/geojson_reader.rs#L220-L221

This is hard to work with because in my processor I'd have to check every input string value (from any geozero reader, for any column) against the string `"null"` and manage type conversions. E.g. if the first few rows of a column are the string `"null"` but then I see an f64 value, I need to be able to coerce that column to a `Float64Array` (which in Arrow has its own null bitmask).

I'd argue that the simplest approach in this specific case is to do a no-op for GeoJSON `null` values. That is, `{"a": 1, "b": null}` would be treated as equivalent to `{"a": 1}`. That seems like a much smaller change than, say, adding a `ColumnValue::Null` variant. But interested to hear others' thoughts.