BioSchemas / specifications

Issue tracker, technical wiki, and example markup
https://bioschemas.org

Dataset 0.5 DRAFT Comments #576

Closed susheel closed 2 years ago

susheel commented 2 years ago

Comments on the 0.5-DRAFT schema, from mapping it to our HDR UK schema and DataCite

Minimum:

Recommended:

AlasdairGray commented 2 years ago

These are some really important points, particularly when thinking about the OpenAIRE Research Graph use case.

AlasdairGray commented 2 years ago

In terms of the use cases we are trying to support, I think that your suggestions are one level too high, i.e. the properties you suggest for minimal should be recommended and those for recommended should be optional.

egonw commented 2 years ago

My 2 cents...

Generally, I would support encouraging DataCite support, but that alignment should be at profile level first, I think: "The Bioschemas Dataset profile is aligned with ..."

Other points:

Now, IANAL, but this insight follows from discussions with legal experts. Happy to learn more about GDPR :)

AlasdairGray commented 2 years ago

Based on the community discussions, here are the responses to your points. Note that we aim to keep the Dataset profile aligned with the Google Dataset Profile unless there are compelling domain-specific consuming use cases to deviate. We also strive to keep the minimal set of properties to about six.

Note that the Schema.org vocabulary is explicitly aligned with DCAT, and there are straightforward correspondences of many terms to DataCite.
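As a rough illustration of what such a crosswalk could look like for the properties raised in this issue, here is a minimal Python sketch. The mapping table, the `crosswalk` helper, and the example record are all assumptions made for illustration; this is not an official Schema.org to DataCite alignment, and values are copied as-is rather than reshaped into DataCite's own structures.

```python
# Illustrative only: a hand-written mapping from Schema.org Dataset
# properties to DataCite-style field names. Not an official alignment.
SCHEMA_TO_DATACITE = {
    "name": "titles",
    "creator": "creators",
    "publisher": "publisher",
    "datePublished": "publicationYear",
}


def crosswalk(dataset: dict) -> dict:
    """Copy the mapped properties of a Schema.org Dataset into a new dict."""
    record = {}
    for schema_key, datacite_key in SCHEMA_TO_DATACITE.items():
        if schema_key not in dataset:
            continue
        value = dataset[schema_key]
        if schema_key == "datePublished":
            # Keep only the year of an ISO 8601 date string.
            value = str(value)[:4]
        record[datacite_key] = value
    return record


# Hypothetical markup; the names and values are made up.
example = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example cohort dataset",
    "publisher": {"@type": "Organization", "name": "Example Publisher"},
    "datePublished": "2021-03-15",
}

print(crosswalk(example))
```

Properties absent from the markup (here `creator`) are simply skipped, so the result shows at a glance which DataCite-style fields the markup can and cannot fill.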

Comments on the 0.5-DRAFT schema, from mapping it to our HDR UK schema and DataCite

Minimum:

  • Should we have some notion of the publisher of the dataset - Crosswalk to DataCite and HDR UK

The publisher property is already included; its marginality has been raised to recommended.

  • Year of publication - Crosswalk to DataCite and HDR UK

In response to this issue, we have promoted datePublished to recommended. Note that the Google profile makes no recommendations about dates associated with a Dataset.

  • Creator should be minimum - Crosswalk to DataCite and HDR UK

The creator property has been left as recommended.
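To make the outcome of these three responses concrete, the following Python sketch reports which of the recommended properties discussed here (publisher, datePublished, creator) are missing from a piece of Dataset markup. The `missing_recommended` helper and the sample record are hypothetical, and the tuple covers only the properties named in this thread, not the full recommended set of the profile.

```python
# Only the three properties discussed in this issue; the profile's full
# recommended list is not reproduced here.
RECOMMENDED_IN_THIS_ISSUE = ("publisher", "datePublished", "creator")


def missing_recommended(dataset: dict) -> list:
    """Return the discussed recommended properties that are absent or empty."""
    return [prop for prop in RECOMMENDED_IN_THIS_ISSUE if not dataset.get(prop)]


# Hypothetical markup with no creator.
sample = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example cohort dataset",
    "publisher": {"@type": "Organization", "name": "Example Publisher"},
    "datePublished": "2021-03-15",
}

for prop in missing_recommended(sample):
    print(f"recommended property missing: {prop}")
```

Because the properties are recommended rather than minimal, a check like this would produce warnings rather than validation failures.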

Recommended:

  • GA4GH DUO code for accessRights - Link to ELIXIR and GA4GH
  • dataController (as per GDPR) if different from publisher / creator
  • jurisdiction/geolocation - GDPR

These are specific to the HDR publication use case, but at this time there is no clear use case for consuming this markup. For now we will not make any changes, but we will review this if a specific consuming use case arises, which could include having a specialised version of the profile for datasets involving personally identifiable data.