frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense

Combining Schemas or External Field Type templates? #894

Open khusmann opened 3 months ago

khusmann commented 3 months ago

I'm creating this thread in response to @pschumm's comment here, so as not to pollute the original topic.

We are in the process of publishing a set of standardized table schemas on the HEAL Data Platform, each of which represents a specific, validated, commonly used measure (typically consisting of several items).

This isn't a fully-formed thought, but it strikes me that the issue of partial schemas is also related to the issue of permitting multiple (partial) table schemas per resource.

I haven't seen this elsewhere either, and I'm interested in similar functionality, so I wanted to brainstorm ways we could do this.

Borrowing @peterdesmet's partialSchema property, what about something like this?

{
  "partialSchema": [
    {
      "fields": [
        {
          "name": "participant_id",
          "type": "integer"
        }
      ]
    },
    "measure1.json",
    "measure2.json"
  ]
}

Here, partialSchema, if given an array, would simply take the union of the fields of all the schemas it was passed, whether they are written inline or referenced by path.
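As a rough sketch of what that union could look like in practice (Python, not part of any existing Data Package library): the function names `load_partial` and `union_partial_schemas` are made up for illustration, and raising an error on a duplicate field name is an assumption, since the proposal doesn't say how collisions should be handled.

```python
import json
from pathlib import Path


def load_partial(entry, base_dir="."):
    """Return a schema dict from an inline object or a path to a JSON file."""
    if isinstance(entry, dict):
        return entry
    return json.loads((Path(base_dir) / entry).read_text(encoding="utf-8"))


def union_partial_schemas(partials, base_dir="."):
    """Concatenate the fields of every partial schema, in the order given.

    Assumption: two partial schemas defining the same field name is an error.
    """
    fields, seen = [], set()
    for entry in partials:
        for field in load_partial(entry, base_dir).get("fields", []):
            if field["name"] in seen:
                raise ValueError(f"duplicate field name: {field['name']}")
            seen.add(field["name"])
            fields.append(field)
    return {"fields": fields}
```

Calling `union_partial_schemas(descriptor["partialSchema"])` on the example above would yield a single schema whose fields are participant_id followed by whatever measure1.json and measure2.json define, under the names those files chose.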

The problem with this approach is that the publisher is then stuck with the field names the included schemas provide: there's no way to reuse a field definition without also reusing its name.

So here's an alternative I want to propose: external field types. Basically, similar to above, but with one more layer of indirection, and the ability to "scope" the names of the included schemas:

{
  "schema": {
    "fields": [
      {
        "name": "participant_id",
        "type": "integer"
      },
      {
        "name": "question1",
        "type": "external",
        "typeRef": "measure1::item1"
      },
      {
        "name": "question2",
        "description": "This description will replace whatever the original description was in measure1::item2",
        "type": "external",
        "typeRef": "measure1::item2"
      },
      {
        "name": "question3",
        "type": "external",
        "typeRef": "measure2::item1"
      },
      {
        "name": "question4",
        "description": "This item shows how measure1::item1 can be used twice in the same resource",
        "type": "external",
        "typeRef": "measure1::item1"
      }
    ],
    "externalFieldTypes": {
      "measure1": "measure1.json",
      "measure2": {
        "fields": [
          {
            "name": "item1",
            "description": "Description for item1 of this measure",
            "type": "integer",
            "constraints": {
              "minimum": 0
            }
          }
        ]
      }
    }
  }
}

Here, the new externalFieldTypes property maps a name (such as measure1) to a referenced schema, given either as a path or inline. A field of type "external" then imports a definition from one of those schemas through its typeRef, written as name::field.

This way, measure designers can publish definitions of their validated measures that publishers can use and link to, while publishers remain free to rename the fields (and even to use the same field definition for multiple fields within a data resource).
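To make the idea concrete, here is a minimal sketch of how a tool might expand external field types into an ordinary table schema. Nothing here exists in a Data Package implementation: `load_schema` and `resolve_external_fields` are hypothetical names, and the merge rule (local properties such as name and description override the referenced definition, while the type comes from the definition) is my reading of the example above, not a settled design.

```python
import json
from pathlib import Path


def load_schema(ref, base_dir="."):
    """Return a schema dict from an inline object or a path to a JSON file."""
    if isinstance(ref, dict):
        return ref
    return json.loads((Path(base_dir) / ref).read_text(encoding="utf-8"))


def resolve_external_fields(schema, base_dir="."):
    """Return a new schema in which each "external" field is replaced by
    the definition its typeRef points to."""
    # Index every referenced schema by alias, then by field name
    sources = {
        alias: {f["name"]: f for f in load_schema(ref, base_dir).get("fields", [])}
        for alias, ref in schema.get("externalFieldTypes", {}).items()
    }
    resolved = []
    for field in schema["fields"]:
        if field.get("type") != "external":
            resolved.append(dict(field))
            continue
        # typeRef has the form "alias::fieldName"
        alias, _, item = field["typeRef"].partition("::")
        try:
            definition = sources[alias][item]
        except KeyError:
            raise KeyError(f"unresolved typeRef: {field['typeRef']}") from None
        # Assumption: local properties (name, description, ...) win over
        # the referenced definition; "type" and "typeRef" are dropped.
        overrides = {k: v for k, v in field.items() if k not in ("type", "typeRef")}
        resolved.append({**definition, **overrides})
    return {"fields": resolved}
```

The function takes the inner schema object (the value of "schema" in the example), so question3 would come out as an integer field named question3 carrying measure2's description and constraints, and question4 would keep its own description while taking everything else from measure1::item1.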