OpenDataServices / flatten-tool

Tools for generating CSV and other flat versions of the structured data
http://flatten-tool.readthedocs.io/en/latest/
MIT License

Exploring support for anyOf / oneOf #182

Open timgdavies opened 6 years ago

timgdavies commented 6 years ago

We currently don't support the oneOf or anyOf constructs in JSON schema.

We are using these in the Beneficial Ownership Data Standard, and they crop up when trying to re-use other schema contents such as GeoJSON.

We should identify whether flatten-tool can support these, or whether we have to document these as unsupported, and tailor schema design accordingly.

Worked example for discussion

As a simple example schema to start the discussion:

{
  "properties": {
    "food": {
      "oneOf": [
        {
          "type": "object",
          "title": "3-Course Menu",
          "properties": {
            "firstCourse": {
              "type": "string"
            },
            "mainCourse": {
              "type": "string"
            },
            "desert": {
              "type": "string"
            }
          }
        },
        {
          "type": "string"
        }
      ]
    }
  }
}

The following are valid JSON objects:

{
  "food": {
    "firstCourse": "Soup",
    "mainCourse": "Nut Roast",
    "desert": "Ice cream"
  }
}

and

{
  "food": "Cheese"
}

These would flatten into the following tables:

| food |
| Cheese |

or

| food/firstCourse | food/mainCourse | food/desert |
| Soup | Nut Roast | Ice cream |
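To make the mapping from JSON to columns concrete, here is a minimal sketch of the slash-separated flattening shown in the tables above. It is an illustration only, not flatten-tool's actual implementation; the `flatten` function is a hypothetical helper and it ignores arrays entirely.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts into {"parent/child": scalar} pairs.

    Hypothetical helper for illustration; arrays are not handled.
    """
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            path = f"{prefix}/{key}" if prefix else key
            flat.update(flatten(child, path))
        return flat
    # A scalar becomes a single column named after its path.
    return {prefix: value}


menu = {
    "food": {
        "firstCourse": "Soup",
        "mainCourse": "Nut Roast",
        "desert": "Ice cream",
    }
}
cheese = {"food": "Cheese"}

flat_menu = flatten(menu)
flat_cheese = flatten(cheese)
print(list(flat_menu))    # column headings for the object form
print(list(flat_cheese))  # column heading for the string form
```

The two valid inputs produce disjoint sets of column headings, which is why a single template would need to carry both.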

So what should a template look like?

It would likely need to include both options:

| food | food/firstCourse | food/mainCourse | food/desert |
| | Soup | Nut Roast | Ice cream |
| Cheese | | | |

This would of course cause errors if a user incorrectly entered:

| food | food/firstCourse | food/mainCourse | food/desert |
| Cheese | Soup | Nut Roast | Ice cream |

as this would put both a string and an object in the same property.
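As a rough sketch of how that clash could be detected when turning a row back into JSON, the snippet below rebuilds a nested object from slash-separated columns and raises as soon as one property would have to be both a value and an object. The `unflatten` function and `PathConflict` exception are hypothetical names, not part of flatten-tool.

```python
class PathConflict(ValueError):
    """Raised when a column is used both as a value and as a parent."""


def unflatten(row):
    """Build a nested dict from {"a/b": value} pairs, skipping empty cells."""
    result = {}
    for path, value in row.items():
        if value in ("", None):
            continue  # an empty cell means the column was not used
        parts = path.split("/")
        node = result
        for depth, part in enumerate(parts[:-1], start=1):
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                parent = "/".join(parts[:depth])
                raise PathConflict(f"{parent} has a value and also sub-fields")
            node = child
        leaf = parts[-1]
        if isinstance(node.get(leaf), dict):
            raise PathConflict(f"{path} has a value and also sub-fields")
        node[leaf] = value
    return result


columns = ["food", "food/firstCourse", "food/mainCourse", "food/desert"]
good_row = dict(zip(columns, ["", "Soup", "Nut Roast", "Ice cream"]))
bad_row = dict(zip(columns, ["Cheese", "Soup", "Nut Roast", "Ice cream"]))

print(unflatten(good_row))
try:
    unflatten(bad_row)
except PathConflict as error:
    print("conflict:", error)
```

Failing early with a message that names the offending column is one option; another would be to build whatever can be built and leave the problem to schema validation.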

Other scenarios

The more common pattern we might encounter would be a property that can contain multiple kinds of objects (as in the case of BODS, where an interestedParty might be an entityStatement, a personStatement or a nullStatement), or an array that can contain multiple kinds of objects.

I think in these cases we could just write out all the objects to a template, but we would need to group them somehow (possibly an editorial job when curating templates). There would then still be the risk that, if a user enters properties from a mix of the potential objects, we end up building an invalid object. I'm not sure how big a problem this is, as long as validation reports can pick it up effectively.
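A minimal sketch of that idea follows: write out columns for every candidate object, grouped branch by branch, then check whether the properties a user filled in all belong to a single branch. The property names in `BRANCHES` are invented for illustration and are not taken from the BODS schema; both functions are hypothetical helpers.

```python
# Hypothetical oneOf branches; property names are made up, not from BODS.
BRANCHES = {
    "entityStatement": ["entityName", "jurisdiction"],
    "personStatement": ["personName", "birthDate"],
    "nullStatement": ["reason"],
}


def template_columns(parent, branches=BRANCHES):
    """All columns a template would need, grouped branch by branch."""
    return [
        f"{parent}/{prop}"
        for props in branches.values()
        for prop in props
    ]


def matching_branches(obj, branches=BRANCHES):
    """Titles of branches whose declared properties cover every key in obj."""
    keys = set(obj)
    return [title for title, props in branches.items() if keys <= set(props)]


print(template_columns("interestedParty"))

person = {"personName": "A. Person", "birthDate": "1980-01-01"}
mixed = {"personName": "A. Person", "entityName": "Example Ltd"}
print(matching_branches(person))
print(matching_branches(mixed))  # empty: properties come from two branches
```

An empty result flags a row that mixes branches. Branches that share property names would need a more careful rule than this simple subset test.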

kindly commented 6 years ago

@timgdavies another problem we have with anyOf and oneOf is the obscure validation error messages they produce, i.e. they just say something like "{this whole object} is not valid in any subschema", and the object can be huge.

I think we can support oneOf and anyOf, but only for a subset of all the possible subschemas they can contain, and I'm not sure we should. Your examples above cover "a string or an object" and "one object or another object", but they could contain many more types of subschema. For example, you could have a pathological case where an anyOf contains: "a list of strings or a list of objects or a different list of objects or a string or an object or a different object or a number or a list of different types". So I do not think it will be possible to cover all cases of anyOf or oneOf, and we need to decide on a subset if we are to support them.
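To illustrate what "deciding on a subset" might involve, here is a sketch that summarises the kinds of subschema inside a oneOf or anyOf and tests them against one possible rule. The rule used here (only strings and objects) is an assumption drawn from the two examples in this thread, not an agreed decision, and `branch_types` and `is_simple_case` are hypothetical helpers.

```python
def branch_types(subschemas):
    """Describe the type of each alternative in a oneOf/anyOf list."""
    kinds = []
    for sub in subschemas:
        kind = sub.get("type", "untyped")
        if kind == "array":
            items = sub.get("items", {})
            # A list of item schemas (tuple form) is reported as "mixed".
            item_kind = items.get("type", "untyped") if isinstance(items, dict) else "mixed"
            kind = f"array of {item_kind}"
        kinds.append(kind)
    return kinds


def is_simple_case(subschemas):
    """Assumed rule: every alternative is a plain string or an object."""
    return all(kind in ("string", "object") for kind in branch_types(subschemas))


food_one_of = [{"type": "object"}, {"type": "string"}]
pathological = [
    {"type": "array", "items": {"type": "string"}},
    {"type": "array", "items": {"type": "object"}},
    {"type": "string"},
    {"type": "object"},
    {"type": "number"},
    {"type": "array", "items": [{"type": "string"}, {"type": "number"}]},
]

print(branch_types(food_one_of), is_simple_case(food_one_of))
print(branch_types(pathological), is_simple_case(pathological))
```

A check like this could at least let a tool report clearly which oneOf or anyOf constructs it is declining to handle.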

My instinct is not to support them unless they do not contain types. I think it is generally bad data modelling to allow two different types under the same field name. It's hard to put into a database and hard to use as a data analyst.

I would prefer just two separate fields (with distinct names) and a way to check that only one of them is used, or that any of them is used (with a preference for one if both are). You could use this pattern to do the validation.
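As a rough illustration of the two-field approach in plain Python (a sketch of the checks themselves, not the schema pattern referred to above), the snippet below uses two hypothetical field names, `foodName` and `foodMenu`, and shows an "exactly one" check plus a preference order for when both are filled in.

```python
EMPTY = (None, "", {}, [])

# Hypothetical field names, listed in order of preference.
FIELDS = ["foodMenu", "foodName"]


def used_fields(obj, fields=FIELDS):
    """Names of the fields that carry a non-empty value."""
    return [name for name in fields if obj.get(name) not in EMPTY]


def check_exactly_one(obj, fields=FIELDS):
    """Return an error message, or None if exactly one field is used."""
    used = used_fields(obj, fields)
    if len(used) == 1:
        return None
    if not used:
        return "one of " + ", ".join(fields) + " is required"
    return "only one of " + ", ".join(fields) + " may be set, found " + ", ".join(used)


def pick_preferred(obj, fields=FIELDS):
    """First used field in preference order, or None if none is used."""
    used = used_fields(obj, fields)
    return used[0] if used else None


both = {"foodName": "Cheese", "foodMenu": {"firstCourse": "Soup"}}
print(check_exactly_one({"foodName": "Cheese"}))
print(check_exactly_one(both))
print(pick_preferred(both))
```

Because each field keeps a single type, the flattened template stays unambiguous and the error message can name the exact fields involved.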