fiboa / specification

Field Boundaries for Agriculture (fiboa) - a specification that describes important properties of field boundaries
Apache License 2.0

How to handle required fields in merged datasets #26

Open m-mohr opened 2 months ago

m-mohr commented 2 months ago

Originally posted by @m-mohr in https://github.com/fiboa/specification/issues/13#issuecomment-2051843988

Another issue that we need to discuss: What happens when files are merged that implement different extensions, each with its own required fields? The current "required" implementation assumes non-nullable fields, which fails after a merge because rows from one file have no values for the fields required by the other file's extensions.
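A minimal sketch of the failure mode described above, in plain Python. The extension names (`ext_a`, `ext_b`), the field names, and the `missing_required` helper are hypothetical and only illustrate the idea; this is not how the fiboa validator is implemented.

```python
# Hypothetical required fields per extension (not real fiboa extensions).
REQUIRED = {
    "ext_a": ["a:crop"],
    "ext_b": ["b:owner"],
}

def missing_required(rows, extensions):
    """Return (row id, field) pairs where a required field is absent or null."""
    problems = []
    for row in rows:
        for ext in extensions:
            for field in REQUIRED[ext]:
                if row.get(field) is None:
                    problems.append((row["id"], field))
    return problems

file_a = [{"id": "a1", "a:crop": "wheat"}]
file_b = [{"id": "b1", "b:owner": "farm-42"}]

# Each file is valid against the extension it implements.
assert missing_required(file_a, ["ext_a"]) == []
assert missing_required(file_b, ["ext_b"]) == []

# The merged dataset declares both extensions, so every row is now
# checked against required fields it never had.
merged = file_a + file_b
problems = missing_required(merged, ["ext_a", "ext_b"])
print(problems)  # [('a1', 'b:owner'), ('b1', 'a:crop')]
```

Both input files pass on their own, but the merged dataset reports one missing required field per row.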

@andyjenkinson wrote:

To be honest, I'd been treating multiple collections as out of scope for now because it's much more complex than the examples considered to date. After all, that's what our system actually does: we merge many separate collections (we call them "sources") and serve them out through the API in multiple ways:

  1. As sets of "boundary references", each of which is a single geometry + metadata "opinion" about a boundary.
  2. As a deduplicated and merged set, each feature being a single normalised geometry containing multiple metadata objects (one per unique reference) that hold things like determinationMethod, dates, IDs, etc.

There are many issues with the current fiboa data model for representing the latter "multiple collections in a single file" case, since the values of most fields can differ between collections.

Not sure what this has to do with this topic though, so it's probably best to talk it through separately.

@cholmes wrote:

For the 'required' assumption, it does seem like ideally people could choose, when they merge, whether they want the new collection to require everything. But yeah, it does get tricky, as you ideally still want the schema check on the parts of the merged file that do include the extension fields.

I wrote:

Either we declare extensions at the row level, or we stop requiring fields in extensions, or we add an option to the CLI to skip the check for required fields in extensions. Alternatively, files can only be merged if all requirements are met, and all other fields are omitted from the resulting dataset.
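To make the options above easier to compare, here is a rough sketch in plain Python. Everything in it is an assumption made for illustration: the per-row `extensions` column, the `check_required` switch, the `strict_merge` helper, and the extension and field names are hypothetical, not existing fiboa or CLI features. The second option (not requiring fields in extensions at all) needs no code; it would correspond to empty lists in `REQUIRED`. The last option is read here as "keep only extensions that every row satisfies and drop the fields of the rest", which is one possible interpretation.

```python
# Hypothetical required fields per extension (not real fiboa extensions).
REQUIRED = {"ext_a": ["a:crop"], "ext_b": ["b:owner"]}

def check_row_level(rows):
    """Row-level declaration: each row lists its own extensions in a
    hypothetical 'extensions' column; only those are checked."""
    return [
        (row["id"], field)
        for row in rows
        for ext in row.get("extensions", [])
        for field in REQUIRED[ext]
        if row.get(field) is None
    ]

def check_dataset_level(rows, extensions, check_required=True):
    """Dataset-level check that a hypothetical CLI switch can turn off."""
    if not check_required:
        return []
    return [
        (row["id"], field)
        for row in rows
        for ext in extensions
        for field in REQUIRED[ext]
        if row.get(field) is None
    ]

def strict_merge(rows, extensions):
    """Keep only extensions whose required fields are set on every row
    and drop the fields of all other extensions."""
    kept = [
        ext for ext in extensions
        if all(row.get(f) is not None for row in rows for f in REQUIRED[ext])
    ]
    dropped = {f for ext in extensions if ext not in kept for f in REQUIRED[ext]}
    cleaned = [{k: v for k, v in row.items() if k not in dropped} for row in rows]
    return kept, cleaned

merged = [
    {"id": "a1", "extensions": ["ext_a"], "a:crop": "wheat"},
    {"id": "b1", "extensions": ["ext_b"], "b:owner": "farm-42"},
]

row_level = check_row_level(merged)                       # no problems
dataset_level = check_dataset_level(merged, ["ext_a", "ext_b"])  # two problems
skipped = check_dataset_level(merged, ["ext_a", "ext_b"], check_required=False)
kept, cleaned = strict_merge(merged, ["ext_a", "ext_b"])  # nothing survives
```

In this toy example the row-level check passes, the dataset-level check fails unless it is switched off, and the strict merge keeps neither extension because no extension is satisfied by every row.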