SEMICeu / style-guide

SEMIC style guide to create reusable vocabularies and application profiles
https://semiceu.github.io/style-guide/
Creative Commons Attribution 4.0 International

Refine the "open and closed world assumption" recommendation #53

Open tfrancart opened 1 year ago

tfrancart commented 1 year ago

Regarding the last paragraph of https://semiceu.github.io/style-guide/public-review/gc-data-shape-conventions.html#sec:dsc-r3 :

"This may have an impact, especially for larger vocabularies (such as the eProcurement ontology), on how the data shapes are organised. As data shapes may be used to suggest how the data may be fragmented and how it shall not."

I would welcome more details on this topic. Could the style guide elaborate on how the eProcurement ontology organizes its shapes? The paragraph is not explicit on this. Concretely, how can data shapes be used to suggest how the data may be fragmented?

We often see situations where two levels of shapes need to be designed:

  1. Shapes to validate/describe single datasources, where each datasource holds a part of the data
  2. Shapes encoding the complete application profile, once all datasources have been merged

Does the style guide offer any suggestion on how these two levels can or should be articulated? Can this be a single SHACL file, plus extensions of that file in which certain constraints are deactivated (using sh:deactivated)? Or should the two levels be maintained separately?
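To make the question concrete, here is a minimal sketch of the "single profile plus per-datasource deactivation" option. It is a toy model in plain Python, not SHACL: the `PropertyShape` class, the `ex:` identifiers, the datasource names and the `validate` function are all hypothetical, and the overlay only mimics what adding `sh:deactivated true` to selected shapes would do in a file that builds on the full profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyShape:
    """Toy stand-in for a SHACL property shape carrying a minimum cardinality."""
    shape_id: str      # hypothetical identifier, e.g. "ex:Person-name"
    target_class: str
    path: str
    min_count: int


# Level 2: the complete application profile (hypothetical example shapes).
FULL_PROFILE = [
    PropertyShape("ex:Person-name", "ex:Person", "ex:name", 1),
    PropertyShape("ex:Person-memberOf", "ex:Person", "ex:memberOf", 1),
]

# Level 1: one overlay per datasource, listing the shapes to switch off.
# This plays the role of `sh:deactivated true` statements kept in a separate file.
DEACTIVATED_FOR = {
    "hr-source": {"ex:Person-memberOf"},  # assumed: this source holds no memberships
    "merged": set(),                      # merged data must satisfy every shape
}


def validate(records, datasource):
    """Return (record id, shape id) pairs for every violated active shape."""
    off = DEACTIVATED_FOR[datasource]
    violations = []
    for rec in records:
        for shape in FULL_PROFILE:
            # Skip shapes deactivated for this datasource or targeting another class.
            if shape.shape_id in off or rec["type"] != shape.target_class:
                continue
            if len(rec.get(shape.path, [])) < shape.min_count:
                violations.append((rec["id"], shape.shape_id))
    return violations


partial = [{"id": "ex:p1", "type": "ex:Person", "ex:name": ["Ada"]}]
print(validate(partial, "hr-source"))  # [] : the partial source passes
print(validate(partial, "merged"))     # [('ex:p1', 'ex:Person-memberOf')]
```

In this arrangement the full profile stays the single source of truth and each datasource only records what it relaxes. Whether that is preferable to maintaining the levels as separate files is exactly the open question above.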

costezki commented 1 year ago

This is an excellent separation of concerns, Thomas. Thank you for pointing this out and we will take it onboard.

Point 1 already describes a technical concern, whereas point 2 describes a purely semantic one. We do not yet cover technical interoperability; for the moment, we only provide some hints on how to distinguish between the two. Stay tuned for more on technical interoperability in the next versions of the style guide.

See: https://semiceu.github.io/style-guide/public-review/arhitectural-clarifications.html#sec:technical-concerns-and-artefacts

tfrancart commented 1 year ago

We are actually facing a situation (at the European Parliament) where we have 3 levels of shapes:

  1. Shapes to validate/describe single datasources, where each datasource holds a part of the data (these are not public, but we use them to validate our data migration processes)
  2. Shapes encoding the complete application profile, once all datasources have been merged (the ones at https://github.com/europarl/eli-ep or https://github.com/europarl/org-ep)
  3. Shapes describing the structure of datasets being published (the ones at https://github.com/europarl/open-data-beta-testing/tree/main/data-structure - the datasets are the ones published at https://data.europarl.europa.eu/fr/datasets?language=en&order=RELEVANCE)

tfrancart commented 1 year ago

(and we are simply defining 3 separate SHACL files, no connections between them, no reuse)