gbif / rs.gbif.org

GBIF machine-readable resources
https://rs.gbif.org

Create Frictionless Data publishing model for Material oriented data #106

Closed: timrobertson100 closed this issue 1 year ago

timrobertson100 commented 1 year ago

Create a draft schema (i.e. a publishing model) for sharing Material records, based on what was learned in the data modeling exercise. This will be a greatly simplified version of the schema used in that exercise. The first draft will be shared with the group and tested with some data at the SPNCH workshop.

What follows will be refined, but it is based on a first discussion with @tucotuco.

The early sketch of this included four core tables:

Additional tables (similar to extensions) would be available to cover aspects of Sequences, Material Citations, Agents involved, Assertions (Measurements and facts), Identifiers and the ability to capture Relationships and Activities that are not covered by the core arrangement (i.e., beyond simply the gathering and identification events).

What is omitted in the simplified model relates primarily to occurrences and organisms, which can generally be inferred from the evidence. These were a source of confusion for some. Another improvement is the tightening of some key relationships which, being too open before, were an obstacle to clarity. This schema will be tested with real data to help understand whether this simplification captures the majority of needs and is understandable.

timrobertson100 commented 1 year ago

Note: we need to create a skeleton structure of this quickly (even if only partially complete) so that work on the IPT3 branch can progress.

timrobertson100 commented 1 year ago

I've created a draft illustration here for the skeleton of the main tables required for this.

@mike-podolskiy90 could you please help us out? If you could convert the 5 tables at the top of the diagram, along with their relationships (i.e. foreign keys), into files formatted for a material-dp in the sandbox/experimental repository, that would be a great help. We can then add the remaining tables and the rest of the fields. Those will then form the schemas for the IPT to support a Material Data Package publishing format.
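
As a minimal sketch of what one converted table could look like, a Frictionless Table Schema states its primary key and foreign keys alongside its fields. The resource name `material-entity`, the table being an "identification" table, and the `identificationID` field are hypothetical; the actual files in material-dp may differ. The `key_fields_are_declared` helper is likewise only an illustrative sanity check.

```python
import json

# Hypothetical Table Schema for an "identification" table. The structure
# (fields, primaryKey, foreignKeys) follows the Frictionless Table Schema
# spec; the resource and field names are illustrative only.
identification_schema = {
    "fields": [
        {"name": "identificationID", "type": "string"},
        {"name": "materialEntityID", "type": "string"},
    ],
    # Each row is uniquely identified by its identificationID.
    "primaryKey": "identificationID",
    # Every materialEntityID must match a materialEntityID in the
    # (hypothetical) "material-entity" resource of the same data package.
    "foreignKeys": [
        {
            "fields": "materialEntityID",
            "reference": {
                "resource": "material-entity",
                "fields": "materialEntityID",
            },
        }
    ],
}


def as_list(names):
    """Key fields may be given as a single string or as a list."""
    return [names] if isinstance(names, str) else list(names)


def key_fields_are_declared(schema):
    """Check that every primary/foreign key field is a declared field."""
    declared = {field["name"] for field in schema["fields"]}
    key_fields = as_list(schema.get("primaryKey", []))
    for fk in schema.get("foreignKeys", []):
        key_fields += as_list(fk["fields"])
    return all(name in declared for name in key_fields)


# Write the descriptor out as it would appear in a schema JSON file.
print(json.dumps(identification_schema, indent=2))
print(key_fields_are_declared(identification_schema))  # True
```

Keeping the relationship in `foreignKeys` rather than only in field descriptions lets generic tooling check it, which fits the goal of tightening key relationships.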

mike-podolskiy90 commented 1 year ago

Done. https://rs.gbif.org/sandbox/experimental/material-dp/0.1/

timrobertson100 commented 1 year ago

Thanks @mike-podolskiy90!

tucotuco commented 1 year ago

I committed an example for Agent as a talking point. Can we structure the terms to capture their definitions in the same way as in Darwin Core, with definitions separate from usage comments and separate from examples? In this commit I included examples in the definition.

timrobertson100 commented 1 year ago

I think we're limited to labels, definitions, and examples (i.e. no usage comments).

The example from the Frictionless Table Schema Descriptor spec reads as:

  "fields": [
    // a field-descriptor
    {
      "name": "name of field (e.g. column name)",
      "title": "A nicer human readable label or title for the field",
      "type": "A string specifying the type",
      "format": "A string specifying a format",
      "example": "An example value for the field",
      "description": "A description for the field"
      ...
    }
  ]

Please shout, @mike-podolskiy90, if the IPT doesn't accept that (this is @tucotuco's commit).

Edited to add: That said, I see Peter added skos:broadMatch to the camtrap-dp schemas, so I think we could add "usageComments" even though it isn't used by FD; we could get the IPT to use it.
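
A minimal sketch of an Agent field descriptor that keeps the Darwin Core-style split: `name`, `title`, `type`, `description`, and `example` are standard Table Schema field properties, while `usageComments` is the extra, non-standard property suggested above, which other Frictionless tools are not expected to use. The field name `agentID`, its text, and the `extension_properties` helper are hypothetical.

```python
# A subset of the field properties defined by the Frictionless Table
# Schema spec (type-specific properties are omitted here).
STANDARD_FIELD_PROPERTIES = {
    "name", "title", "type", "format", "example",
    "description", "constraints", "rdfType",
}

# Hypothetical Agent field: definition, usage comments, and example
# kept in separate properties, as in Darwin Core.
agent_id_field = {
    "name": "agentID",
    "title": "Agent ID",
    "type": "string",
    "description": "An identifier for an Agent.",
    "example": "https://orcid.org/0000-0002-1825-0097",
    # Non-standard extension property; not part of the Table Schema spec.
    "usageComments": "Recommended best practice is to use a persistent, "
                     "globally unique identifier.",
}


def extension_properties(field):
    """Return the keys of a field descriptor that are not standard."""
    return sorted(set(field) - STANDARD_FIELD_PROPERTIES)


print(extension_properties(agent_id_field))  # ['usageComments']
```

A check like this makes it easy to see which properties only the IPT would interpret.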

tucotuco commented 1 year ago

First draft with all properties for five core tables committed here.

peterdesmet commented 1 year ago

Correct link to commit: https://github.com/gbif/rs.gbif.org/commit/f8221541053e5f430492c93af43c939230b74c83

Yes, it is possible to extend with your own properties. For Camtrap DP I decided to include usageComments with the description, so the help text in the IPT only needs to pull from one property.
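
The Camtrap DP approach described here can be sketched as folding usage comments into `description`, so a consumer such as the IPT reads help text from one property. The `merge_usage_comments` helper and the blank-line separator are hypothetical choices for illustration, not the actual Camtrap DP tooling.

```python
def merge_usage_comments(field, sep="\n\n"):
    """Return a copy of a field descriptor whose description also carries
    its usage comments, with the separate usageComments key removed."""
    merged = dict(field)  # shallow copy; the input descriptor is unchanged
    comments = merged.pop("usageComments", None)
    if comments:
        description = merged.get("description", "")
        merged["description"] = (
            description + sep + comments if description else comments
        )
    return merged


field = {
    "name": "agentID",
    "description": "An identifier for an Agent.",
    "usageComments": "Use a persistent identifier where possible.",
}
print(merge_usage_comments(field)["description"])
```

The trade-off is that once merged, a tool can no longer show definitions and usage comments separately.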

timrobertson100 commented 1 year ago

A discussion with @MortenHofft produced two suggestions:

  1. Change parentMaterialEntityID to materialEntityID, as in most cases it's simply a photo of a specimen
  2. Add materialType to MaterialEntity as a String for now, knowing vocabularies are being discussed
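
Suggestion 1 amounts to renaming a field everywhere a Table Schema refers to it: the field list, the primary key, and the local side of any foreign key. A minimal sketch, assuming a hypothetical schema for a table of specimen photos (the table and its `mediaID` field are illustrative, not taken from material-dp):

```python
import copy


def rename_field(schema, old, new):
    """Return a copy of a Table Schema with field `old` renamed to `new`
    in fields, primaryKey, and the local side of foreignKeys."""
    schema = copy.deepcopy(schema)

    def swap(names):
        # primaryKey and foreignKeys "fields" may be a string or a list.
        if isinstance(names, str):
            return new if names == old else names
        return [new if n == old else n for n in names]

    for field in schema["fields"]:
        if field["name"] == old:
            field["name"] = new
    if "primaryKey" in schema:
        schema["primaryKey"] = swap(schema["primaryKey"])
    for fk in schema.get("foreignKeys", []):
        # The "reference" side names fields in another resource,
        # so it is deliberately left alone.
        fk["fields"] = swap(fk["fields"])
    return schema


# Hypothetical schema for a table of photos of specimens.
media_schema = {
    "fields": [
        {"name": "mediaID", "type": "string"},
        {"name": "parentMaterialEntityID", "type": "string"},
    ],
    "primaryKey": "mediaID",
    "foreignKeys": [{
        "fields": "parentMaterialEntityID",
        "reference": {"resource": "material-entity",
                      "fields": "materialEntityID"},
    }],
}

renamed = rename_field(media_schema, "parentMaterialEntityID", "materialEntityID")
print([f["name"] for f in renamed["fields"]])  # ['mediaID', 'materialEntityID']
```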

tucotuco commented 1 year ago

Addressed in b6ec0d8ac519a298e7d62d2cb45412dc3604cb18. Note that materialEntityType already exists and is meant to cover the cases that materialType would cover.

tucotuco commented 1 year ago

New version committed (28f90a4c6517d778ab9b78f8cce0ab901ed20f27) and believed ready for testing.

tucotuco commented 1 year ago

Third version committed (see pull request https://github.com/gbif/rs.gbif.org/pull/109), which includes Assertions, Identifiers, Citations, and primary key constraints.
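
Primary key and foreign key constraints can be checked against sample data before publishing. A minimal sketch using only the Python standard library, assuming CSV files with a header row; the table contents are illustrative sample values, the `mediaID` table is hypothetical, and in practice Frictionless validation tooling would more likely be used.

```python
import csv
import io


def duplicate_keys(rows, key_fields):
    """Return primary key values that occur more than once."""
    seen, dupes = set(), set()
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes


def dangling_references(rows, fk_fields, parent_rows, parent_fields):
    """Return foreign key values with no matching parent row.
    Keys with an empty value are skipped, as missing references."""
    parent_keys = {tuple(r[f] for f in parent_fields) for r in parent_rows}
    missing = set()
    for row in rows:
        key = tuple(row[f] for f in fk_fields)
        if all(key) and key not in parent_keys:
            missing.add(key)
    return missing


# Illustrative sample data only.
materials = list(csv.DictReader(io.StringIO(
    "materialEntityID,materialEntityType\n"
    "m1,specimen\n"
    "m2,tissue\n"
    "m2,tissue\n"
)))
media = list(csv.DictReader(io.StringIO(
    "mediaID,materialEntityID\n"
    "p1,m1\n"
    "p2,m9\n"
)))

print(duplicate_keys(materials, ["materialEntityID"]))  # {('m2',)}
print(dangling_references(media, ["materialEntityID"],
                          materials, ["materialEntityID"]))  # {('m9',)}
```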

timrobertson100 commented 1 year ago

This is now installed on https://ipt3.gbif-uat.org/ with preliminary tests done. Closing, so we can track any problems in dedicated issues. Thanks @tucotuco and @mike-podolskiy90!