fiboa / specification

Field Boundaries for Agriculture (fiboa) - a specification that describes important properties of field boundaries
Apache License 2.0

Data types: reduce or explain better #19

Closed: cholmes closed this issue 2 months ago

cholmes commented 3 months ago

One piece of feedback that I didn't manage to give earlier is that it felt like a lot of cognitive overhead to figure out which datatype to use. There are a whole lot of options, but I'm not sure they provide a ton of value. If you pick the right size, the overall data will be smaller, but I'm not sure that really matters. Most tools convert fine between the different types. I think they just assume the worst case, but that seems fine.

So I was wondering if we could have the types map more closely to JSON and pick reasonable conversions to Parquet, instead of offering all the choices. It seems nicer to me for the meta language to have fewer options, instead of making people pick. And often pick wrong: https://github.com/fiboa/tillage-extension/issues/1 ;)

An alternative would be to have much better 'guidance' explaining when people should pick which type. For example: if you know the value will always be between 0 and 255, choose uint8; if it'll be between -128 and 127, choose int8; etc. But I'm not sure I see the utility of having so many options.
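To make that kind of range-based guidance concrete, here is a minimal sketch in Python. It assumes the usual fixed-width integer type names (following the uint8 naming used above) and their standard value ranges. The helper name `smallest_integer_type` is hypothetical and not part of fiboa.

```python
# Inclusive value ranges of the usual fixed-width integer types,
# ordered from smallest to largest so the first match is the most compact.
# The type names are an assumption modelled on the "uint8" naming above.
INTEGER_RANGES = [
    ("uint8", 0, 2**8 - 1),
    ("int8", -(2**7), 2**7 - 1),
    ("uint16", 0, 2**16 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("uint32", 0, 2**32 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("uint64", 0, 2**64 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def smallest_integer_type(minimum: int, maximum: int) -> str:
    """Return the first (smallest) type whose range covers [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    for name, low, high in INTEGER_RANGES:
        if low <= minimum and maximum <= high:
            return name
    raise ValueError("no fixed-width integer type covers this range")


print(smallest_integer_type(0, 255))     # values known to stay within 0..255
print(smallest_integer_type(-128, 127))  # values known to stay within -128..127
```

The lookup only works if the value range is known up front, which is exactly the knowledge the guidance asks people to have.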

m-mohr commented 3 months ago

Most of the datatypes that you are thinking about are numerical, but using only JSON Schema datatypes means we always need to fall back to the biggest variant unless the creation tool picks a type based on statistics. As that is not in our hands, I think we need to keep the datatypes. Otherwise you may need more space to keep the data in memory and in storage. While storage is usually not much of an issue, memory is usually limited. So I'd vote to keep the datatypes. But I guess we could link to an external page that describes the numerical datatypes and their ranges. The min/max values for most types are already included implicitly in the current page (via the GeoJSON schema). The datatypes page in general may need a bit of work to be less technical; currently it also includes the mapping to Parquet and GeoJSON.
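As a rough illustration of the memory argument, the following Python sketch compares the raw, uncompressed size of one million values stored as uint8 versus int64. It only counts bytes per value. Actual in-memory layouts and Parquet encodings or compression will give different numbers, and the helper name `raw_size_in_bytes` is hypothetical.

```python
import struct

# Bytes needed per value, using struct's standard (platform-independent) sizes.
BYTES_PER_VALUE = {
    "uint8": struct.calcsize("<B"),  # unsigned 8-bit integer
    "int64": struct.calcsize("<q"),  # signed 64-bit integer
}


def raw_size_in_bytes(type_name: str, value_count: int) -> int:
    """Raw size of value_count values of the given type, without any overhead."""
    return BYTES_PER_VALUE[type_name] * value_count


count = 1_000_000
small = raw_size_in_bytes("uint8", count)
large = raw_size_in_bytes("int64", count)
print(small, large, large // small)  # 1000000 8000000 8
```

Falling back to the biggest variant for a column that would fit into uint8 therefore costs eight times the raw space in this simplified picture.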

m-mohr commented 3 months ago

I cleaned this up in https://github.com/fiboa/specification/commit/f9e0c3794763e18d9cccddd5a359795e3b1aab78. Is this better, @cholmes?

Core: https://github.com/fiboa/specification/blob/main/core/datatypes.md (this is probably the important bit for "normal" users)

The following define the encoding specifics and are usually more for the developers:

GeoJSON: https://github.com/fiboa/specification/blob/main/geojson/datatypes.md

GeoParquet: https://github.com/fiboa/specification/blob/main/geoparquet/datatypes.md

m-mohr commented 2 months ago

This has actually been moved to the fiboa schema repo now.