SEMICeu / Core-Business-Vocabulary

This is the issue tracker for the maintenance of the Core Business Vocabulary.

"m8g" namespace extended to cover all EU Core Vocabularies? #49

Open jgmikael opened 1 month ago

jgmikael commented 1 month ago

Hi,

thank you for resolving issue #46 regarding the online accessibility of the m8g namespace. As confirmed by Ms Riitta Alkula, we can now access the vocabulary directly from our Interoperability Platform "Data Vocabularies" tool (tietomallit.suomi.fi) and create models that are directly linked to the m8g namespace classes and attributes.

However, I was personally under the assumption that the m8g namespace would cover ALL of the core vocabularies' classes and attributes, but this assumption seems to be wrong. The question now is: how could the Business and Person (and preferably Location) vocabularies be accessed in the same way as the m8g artefacts? There doesn't seem to be any data.europa.eu-based namespace for these vocabularies; instead, the classes and attributes point at namespaces that are not controlled by the SEMIC team. In fact, the namespaces for the Business Vocabulary are a mess: there are references to outdated and obsolete vocabularies such as the W3C Registered Organization Vocabulary and other W3C vocabularies that seem to be outdated.

Would it be at all possible to consolidate the whole EU Core Vocabularies family under the "m8g" namespace or some other SEMIC-controlled namespace? That would enable us to use the whole set of vocabularies in the Data Vocabularies tool in a similar way as is now possible for the m8g namespace.

bertvannuffelen commented 1 month ago

@jgmikael if I understand correctly, you are looking for an RDF version of, e.g., the Core Location Vocabulary (or, more generally, of the Core Vocs, as you call them).

In your question you touch on the topic of reuse. See the SEMIC Style Guide for some cases and how to address the intended reuse in specifications.

If, for one's use case, one decides to use a term from another vocabulary (i.e. to reuse it), e.g. Address, then the most direct and obvious way to indicate that reuse in Linked Data style is to use its URI. The SEMIC Style Guide adds that in that case one also reuses the constraints associated with the URI: the human-readable semantics (definitions, labels and usage notes) and the formal semantics (domain, range, cardinalities, etc.). Only when all of this is reused is reuse achieved in full. Any other kind of reuse leads to what the SEMIC Style Guide calls reuse with terminological or semantic adaptations. In the latter case the advice is not to reuse the term via its URI but to create a subproperty or subclass of it.

This is important because RDF as a syntax cannot distinguish between reuses of a term in two different contexts. Let's consider two contexts, a Car Vocabulary and a House Vocabulary, each of which defines a status.

Car Vocabulary:

adms:status a owl:ObjectProperty;
     rdfs:label "Status of the maintenance of a Car";
     rdfs:comment "The activity in which the maintenance of a car is at";
     rdfs:domain car:Car;
     rdfs:range car:Maintenance.

House Vocabulary:

adms:status a owl:ObjectProperty;
     rdfs:label "Selling Status";
     rdfs:comment "The state of selling the house is at";
     rdfs:domain house:House.

This looks fine at first sight in RDF: two distinct files and unambiguous definitions. However, both already violate the first basic principle: by redefining the labels and definitions of adms:status they may cause problems. The real problem arises when merging the two vocabularies. An RDF merge is the concatenation of the two files:

adms:status a owl:ObjectProperty;
     rdfs:label "Status of the maintenance of a Car";
     rdfs:comment "The activity in which the maintenance of a car is at";
     rdfs:domain car:Car;
     rdfs:range car:Maintenance.

adms:status a owl:ObjectProperty;
     rdfs:label "Selling Status";
     rdfs:comment "The state of selling the house is at";
     rdfs:domain house:House.

Rewritten, the issue becomes more visible:

adms:status a owl:ObjectProperty;
     rdfs:label "Status of the maintenance of a Car";
     rdfs:label "Selling Status";
     rdfs:comment "The activity in which the maintenance of a car is at";
     rdfs:comment "The state of selling the house is at";
     rdfs:domain car:Car;
     rdfs:domain house:House;
     rdfs:range car:Maintenance.

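Two labels and two definitions, but the domain has also been restricted to resources that are both a Car and a House. This follows from the semantics of rdfs:domain: every subject of adms:status is inferred to be an instance of each declared domain class. Likewise, because of rdfs:range, every value of adms:status, including a house's selling status, is now inferred to be a car:Maintenance.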
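A minimal Python sketch of the merge above, assuming triples are modelled as plain (subject, predicate, object) tuples of prefixed names and that neither file contains blank nodes, so the RDF merge reduces to a set union (comments are omitted for brevity). The hypothetical data triple ex:myHouse adms:status ex:ForSale shows what the domain and range rules (rdfs2 and rdfs3 in RDF Semantics) then infer. This is a toy reasoner, not an RDF library.

```python
# Triples as (subject, predicate, object) tuples; car:/house: terms are the
# hypothetical ones from the example, ex:myHouse and ex:ForSale are made up.
CAR_VOCAB = {
    ("adms:status", "rdf:type", "owl:ObjectProperty"),
    ("adms:status", "rdfs:label", "Status of the maintenance of a Car"),
    ("adms:status", "rdfs:domain", "car:Car"),
    ("adms:status", "rdfs:range", "car:Maintenance"),
}

HOUSE_VOCAB = {
    ("adms:status", "rdf:type", "owl:ObjectProperty"),
    ("adms:status", "rdfs:label", "Selling Status"),
    ("adms:status", "rdfs:domain", "house:House"),
}


def merge(*graphs):
    """RDF merge of graphs without blank nodes: the set union of their triples."""
    result = set()
    for graph in graphs:
        result |= graph
    return result


def infer_types(vocab, data):
    """Apply the rdfs:domain (rdfs2) and rdfs:range (rdfs3) rules once."""
    inferred = set()
    for s, p, o in data:
        for prop, kind, cls in vocab:
            if prop != p:
                continue
            if kind == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))
            elif kind == "rdfs:range":
                inferred.add((o, "rdf:type", cls))
    return inferred


merged = merge(CAR_VOCAB, HOUSE_VOCAB)
house_data = {("ex:myHouse", "adms:status", "ex:ForSale")}
types = infer_types(merged, house_data)
for triple in sorted(types):
    print(triple)
# ('ex:ForSale', 'rdf:type', 'car:Maintenance')
# ('ex:myHouse', 'rdf:type', 'car:Car')
# ('ex:myHouse', 'rdf:type', 'house:House')
```

The house is typed as a car and its selling status as a car maintenance, purely because both vocabularies attached constraints to the same URI.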

The SEMIC Style Guide aims to ensure that simple operations, such as combining two data specifications, require a minimum of extra disambiguation. This means that, de facto, RDF as a specification language is somewhat problematic, certainly when a specification wants to add restrictions on URIs it does not own. (Note that this is not an issue for the data sharing itself (*).) Therefore, only the owner of a namespace can define an RDF representation for reuse. So, to avoid any misinterpretation, the m8g namespace will contain only an RDF representation of the terms defined in that namespace.
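The "only describe the terms you own" rule can be checked mechanically. The sketch below uses a hypothetical helper, foreign_subjects, and a hypothetical term m8g ExampleClass; it lists the subjects in a vocabulary file that fall outside the namespace the publisher owns. Treating a namespace as a plain string prefix is a simplification.

```python
M8G = "http://data.europa.eu/m8g/"
ADMS = "http://www.w3.org/ns/adms#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def foreign_subjects(graph, owned_namespace):
    """Return the subjects described in `graph` that lie outside `owned_namespace`."""
    return sorted({s for s, _, _ in graph if not s.startswith(owned_namespace)})


vocabulary_file = {
    (M8G + "ExampleClass", RDFS_LABEL, "Example class"),  # hypothetical m8g term
    (ADMS + "status", RDFS_LABEL, "Selling Status"),      # restates a term m8g does not own
}

print(foreign_subjects(vocabulary_file, M8G))
# ['http://www.w3.org/ns/adms#status']
```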

(*) To show that an RDF representation of the data is not a problem, here is an example:

ex:myCar adms:status ex:OilRefreshment.

The challenge is how to associate this use of adms:status with the correct, intended label and definition.
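A small sketch of that challenge, reusing the hypothetical car and house labels: the data triple is identical in both cases, but the label a consumer finds for adms:status depends on which vocabulary graph it consults. The helper labels_for is illustrative only.

```python
data = {("ex:myCar", "adms:status", "ex:OilRefreshment")}

# Hypothetical vocabulary graphs from the Car/House example.
car_vocab = {("adms:status", "rdfs:label", "Status of the maintenance of a Car")}
house_vocab = {("adms:status", "rdfs:label", "Selling Status")}


def labels_for(term, vocab):
    """All rdfs:label values that `vocab` gives for `term`, sorted."""
    return sorted(o for s, p, o in vocab if s == term and p == "rdfs:label")


for _, predicate, _ in data:
    print(labels_for(predicate, car_vocab))
    # ['Status of the maintenance of a Car']
    print(labels_for(predicate, car_vocab | house_vocab))
    # ['Selling Status', 'Status of the maintenance of a Car']
```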

bertvannuffelen commented 1 month ago

@jgmikael on the complexity of the namespaces used: that is a historical situation. It is explained in the webinar of 21 February 2023, where it is the first topic.

All our actions have aimed at a stable outcome that is as backwards compatible as possible. The price is a larger number of different URIs.

jgmikael commented 1 month ago

Hi Bert, and many thanks for your quick response! I'm trying to get a grip on things that are actually well beyond my skill level :-) but if I've understood your answer correctly, I'm assuming the following:

My question is: couldn't the SEMIC team perhaps assemble a version of Business, Person and Location that would declare all classes and attributes locally as subclasses of the superclasses referred to above? This would enable us to use one single namespace reference (like "m8g" or, preferably, something like "ecv" (European Core Vocabularies)).

The models we would create locally would then be like:

Class: Local Legal Entity subclass of: ecv:LegalEntity which is a subclass of legal:LegalEntity (in case this is still viable, considering that the namespace is outdated as mentioned)
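The chain described above can be sketched as a transitive closure over rdfs:subClassOf (rule rdfs11 in RDF Semantics), which is what lets a user or tool "climb" from a local class to the upper classes. The names local:LocalLegalEntity and ecv:LegalEntity are hypothetical (the "ecv" namespace is only a proposal in this thread); legal:LegalEntity is the term mentioned above.

```python
from collections import deque

# Direct rdfs:subClassOf links; the first two names are hypothetical.
SUBCLASS_OF = {
    "local:LocalLegalEntity": ["ecv:LegalEntity"],
    "ecv:LegalEntity": ["legal:LegalEntity"],
}


def superclasses(cls, subclass_of):
    """All direct and indirect superclasses of `cls`, nearest first."""
    seen = []
    queue = deque(subclass_of.get(cls, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # guards against cycles in the hierarchy
            seen.append(parent)
            queue.extend(subclass_of.get(parent, []))
    return seen


print(superclasses("local:LocalLegalEntity", SUBCLASS_OF))
# ['ecv:LegalEntity', 'legal:LegalEntity']
```

Climbing only works if every link in the chain resolves, which is why an obsolete upper namespace breaks it.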

I'm really treading on thin ice here due to a lack of deeper knowledge of the subject; I hope you can get a grip on our ideas and needs!

Mikael

P.S. A data.europa.eu vocabulary that works well in our Tietomallit tool is http://data.europa.eu/snb/model/elm/ - and they have in fact imported external vocabularies (namespaces) like "rov", "adms" and "locn" D.S:

bertvannuffelen commented 1 month ago

Hi @jgmikael, answers are inline

* There is for instance no SEMIC team defined single namespace for the EU Core Business Vocabulary (or the Person or Location)

Indeed, that is correct.

* The classes and attributes that EU Core Business is reusing (and I assume it is actually reusing basically everything) are originally defined in the namespaces declared in Chapter 5 "Terminology"

* Substance-wise the most important namespaces are: "cv", "legal", "locn" and "org"

* The class "Legal Entity", for instance is a reuse of https://www.w3.org/ns/legal-entity#LegalEntity

* It would be totally OK for us to declare these namespaces as the "source of truth" for the EU Core Business data model in our Tietomallit tool... but, for instance, https://www.w3.org/ns/legal-entity#LegalEntity has been obsolete for years, so you can no longer link to the original class and its attributes

It seems that W3C has changed the forwarding to Core Business as requested. It was requested and agreed that http://www.w3.org/ns/legal#LegalEntity would lead to https://semiceu.github.io/Core-Business-Vocabulary/releases/2.1.0/#LegalEntity. We will check this.

* Same goes for locn; the http://www.w3.org/ns/locn# link leads to an outdated version of the Location vocabulary

Indeed, that is correct: SEMIC took over its maintenance. It was agreed with W3C that W3C would keep these models as an archive, with forwards to the SEMIC Core Vocabularies.

* The "org" reference (https://www.w3.org/ns/org#) is just as tricky; it doesn't even lead to an HTML page (no problem, since we're not interested in the HTML versions), but the whole ontology seems outdated and left without maintenance

https://www.w3.org/TR/vocab-org/ has not been handed over to SEMIC. It is a W3C Recommendation. SEMIC only uses it to create Core Business and Core Public Organisation. If you need updates to it, you should contact W3C. Any change would imply the creation of a new Working Group according to W3C procedures.

My question is: couldn't the SEMIC team perhaps assemble a version of Business, Person and Location that would declare all classes and attributes locally as subclasses of the superclasses referred to above? This would enable us to use one single namespace reference (like "m8g" or, preferably, something like "ecv" (European Core Vocabularies)).

The models we would create locally would then be like:

Class: Local Legal Entity subclass of: ecv:LegalEntity which is a subclass of legal:LegalEntity (in case this is still viable, considering that the namespace is outdated as mentioned)

These rules are exactly what the Style Guide defines. If we reuse a term as-is (the preferred option), then we use its URI and the definitions from the reused vocabulary (note: the vocabulary, not the application profile, although the two are often combined; see the definitions in the Style Guide).

This has been the guiding principle from the start of SEMIC and most Linked Data modeling guidelines: reuse first.

The consequence is that one implicitly also assumes that the specifications in the whole ecosystem are designed and published with this in mind. Unfortunately this is not always the case, mainly because Linked Data specification designers think that their way of publishing makes reuse easy for others (humans and/or machines). But if everyone makes different assumptions, automated reuse remains a dream. The SEMIC Style Guide is a first step towards bringing some order to this. At least it lets one state more clearly "I agree with this approach" or "I prefer that approach". It is a first step towards a common language for modeling semantic data specifications and for understanding how their design will affect reuse.

We know that the world (and in particular those models at W3C which correspond to our Core Vocabularies) is not perfect, but since these models are published by another organisation (W3C), over which we have no mandate, and since we want to maintain backwards compatibility, the situation is as it stands.

I'm really treading on thin ice here due to a lack of deeper knowledge of the subject; I hope you can get a grip on our ideas and needs!

Mikael

P.S. A data.europa.eu vocabulary that works well in our Tietomallit tool is http://data.europa.eu/snb/model/elm/ - and they have in fact imported external vocabularies (namespaces) like "rov", "adms" and "locn" D.S:

If one assesses the ELM vocabulary according to the SEMIC Style Guide, focusing on the reuse-as-is-first principle, then, e.g., the following two cases raise questions:

As said in my first response, and as illustrated by the example there, RDF as a specification language does not support reuse well. Implementing owl:imports as an RDF concatenation often leads to semantic chaos. One way to avoid this chaos in RDF is to force the creation of subclasses and subproperties (minting your own URIs). But that leads to the following data-sharing challenge.
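A sketch of the subproperty alternative, assuming the Car and House vocabularies mint hypothetical subproperties car:maintenanceStatus and house:sellingStatus of adms:status instead of redefining it. After the same set-union merge, adms:status itself gains no domain, so house data is no longer typed as a car. The price is that the data now carries the minted URIs, and only a consumer that applies rdfs:subPropertyOf (rule rdfs7) also sees adms:status.

```python
# Hypothetical minted subproperties; adms:status itself is left untouched.
CAR_SUBPROP_VOCAB = {
    ("car:maintenanceStatus", "rdfs:subPropertyOf", "adms:status"),
    ("car:maintenanceStatus", "rdfs:domain", "car:Car"),
}
HOUSE_SUBPROP_VOCAB = {
    ("house:sellingStatus", "rdfs:subPropertyOf", "adms:status"),
    ("house:sellingStatus", "rdfs:domain", "house:House"),
}


def entail(vocab, data):
    """One pass of rdfs7 (subPropertyOf) and rdfs2 (domain) over `data`."""
    out = set(data)
    for s, p, o in data:
        for prop, rel, target in vocab:
            if prop == p and rel == "rdfs:subPropertyOf":
                out.add((s, target, o))
            elif prop == p and rel == "rdfs:domain":
                out.add((s, "rdf:type", target))
    return out


merged = CAR_SUBPROP_VOCAB | HOUSE_SUBPROP_VOCAB
data = {("ex:myHouse", "house:sellingStatus", "ex:ForSale")}
result = entail(merged, data)
for triple in sorted(result):
    print(triple)
# ('ex:myHouse', 'adms:status', 'ex:ForSale')
# ('ex:myHouse', 'house:sellingStatus', 'ex:ForSale')
# ('ex:myHouse', 'rdf:type', 'house:House')
```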

If one shares ELM data, is the data graph this

_:org1 <http://data.europa.eu/snb/model/elm/taxIdentifier>[ 
    a <http://data.europa.eu/snb/model/elm/Identifier>;
    skos:notation "VAT:1234134"
] .

or this

_:org1 <http://www.w3.org/ns/legal#legalIdentifier> [ 
    a adms:Identifier;
    skos:notation "VAT:1234134"
] .

From an ELM-only data-sharing perspective that question looks irrelevant, but from a more global perspective it is very important. Suppose one has an API that shares organisation information according to the Core Business Vocabulary: the first option then means an additional mapping/API to implement, while the second does not. The question each modeler has to ask is thus: is the RDF representation I encode the one I want to be used as the basis for exchange, or do implementers have to map?
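The additional mapping that the first option implies can be sketched as a term-rewriting step. The TERM_MAP table is hypothetical: it assumes that ELM's taxIdentifier and Identifier are intended to mean legal:legalIdentifier and adms:Identifier, which is exactly the kind of judgement an implementer would have to make. The blank node label _:id1 is likewise illustrative.

```python
ELM = "http://data.europa.eu/snb/model/elm/"
LEGAL = "http://www.w3.org/ns/legal#"
ADMS = "http://www.w3.org/ns/adms#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"

# Hypothetical mapping table from ELM terms to Core Business terms.
TERM_MAP = {
    ELM + "taxIdentifier": LEGAL + "legalIdentifier",
    ELM + "Identifier": ADMS + "Identifier",
}


def to_core_business(graph):
    """Rewrite mapped predicates, and mapped classes in rdf:type objects; keep the rest."""
    rewritten = set()
    for s, p, o in graph:
        new_o = TERM_MAP.get(o, o) if p == RDF_TYPE else o
        rewritten.add((s, TERM_MAP.get(p, p), new_o))
    return rewritten


elm_data = {
    ("_:org1", ELM + "taxIdentifier", "_:id1"),
    ("_:id1", RDF_TYPE, ELM + "Identifier"),
    ("_:id1", SKOS_NOTATION, "VAT:1234134"),
}
core_data = to_core_business(elm_data)
```

If ELM data were published with the Core Business URIs in the first place, this step would not exist.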

It is an underlying principle of SEMIC specifications that the URIs in the specifications are also the URIs to be used in implementations. One could deviate and create subclasses or subproperties, but our advice in the Style Guide is that this must be done for a reason motivated by a use case, not as a general modeling practice. For that decision it is also important to know whether ELM is a Vocabulary or an Application Profile. In a Vocabulary, terms are listed in a representation suited to broad reuse (as few restrictions as possible), while in an Application Profile, terms from many vocabularies are used to express usage constraints for the application context. So in many cases both exist, and each has its own purpose and life cycle. In the case of the Core Vocabularies, the m8g namespace is the vocabulary and has a different life cycle than any of the individual Core Vocs: it is connected to them, but not the same. For Vocabularies an RDF representation is required, as it contains the machine-readable representation of the terms in that namespace, while for application profiles this does not exist; there, the RDF representation is nowadays replaced by a SHACL representation.

It is clear that concatenating two SHACL shape graphs is different from concatenating the RDF representations as in my first response. Here it leads to a sound result in which one specification extends the other; it corresponds more closely to the statement "the data must conform to specification A and to specification B". (Note that a SHACL shape can also be provided for a vocabulary, but it will be very permissive and therefore its contribution is mostly very limited: e.g. stating that a property has rdfs:Resource as its domain is a constraint that can never be violated.)
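A toy model, not a SHACL engine, of why combining application profiles behaves like a conjunction: each hypothetical shape returns a list of violations, and data conforms to profiles A and B together only if no shape in either reports one. The any_resource check mirrors the permissive vocabulary-level constraint mentioned above and can never fail. The legal:legalName requirement and the sample organisation are illustrative assumptions.

```python
def min_count(focus_type, path, n):
    """Hypothetical shape: every instance of focus_type needs at least n values for path."""
    def check(graph):
        violations = []
        focus_nodes = sorted({s for s, p, o in graph if p == "rdf:type" and o == focus_type})
        for node in focus_nodes:
            count = sum(1 for s, p, _ in graph if s == node and p == path)
            if count < n:
                violations.append(f"{node}: expected at least {n} {path}, found {count}")
        return violations
    return check


def any_resource(graph):
    """Mirrors 'domain rdfs:Resource': every subject is a resource, so nothing is reported."""
    return []


def validate(graph, *shapes):
    """Data conforms to the combined profiles only if no shape reports a violation."""
    return [v for shape in shapes for v in shape(graph)]


data = {
    ("ex:org1", "rdf:type", "legal:LegalEntity"),
    ("ex:org1", "legal:legalName", "Example Org"),
}
profile_a = [min_count("legal:LegalEntity", "legal:legalName", 1)]
profile_b = [min_count("legal:LegalEntity", "legal:legalIdentifier", 1)]

print(validate(data, *profile_a))              # []
print(validate(data, *profile_a, *profile_b))  # ['ex:org1: expected at least 1 legal:legalIdentifier, found 0']
print(validate(data, any_resource))            # []
```

Adding profile B can only add violations, never remove constraints from A, which is the "extend" behaviour described above.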

I hope this gives some inspiration on how to compare ELM and Core Business and how they are/could be related.

RiittaA commented 1 month ago

Thank you for your reply, @bertvannuffelen. I assume (correct me if I’m wrong) that @jgmikael's original question was mostly about the practical implementation of resolvable Core Vocabularies rather than about the principles: if you can publish the resolvable m8g namespace, why not publish the others in the same way?

RiittaA commented 1 month ago

Our data model tool (FI-Platform) supports both core vocabularies and application profiles. They follow the principles of the SEMIC Style Guide pretty well, since core vocabularies are OWL-based ontologies and application profiles introduce constraints on ontology classes and properties in SHACL.

The problem is that if a Core Vocabulary is not resolvable, we can only refer to it indirectly by adding an annotation or a remark that “this class X is a subclass of class Y in Core Location Vocabulary”. We would like to declare a live reference to the original upper class so that a user (person or AI solution) can climb to the upper class from the data model that reuses the resources of the core vocabulary. PDF documents etc. cannot be used to make true inferences.

RiittaA commented 1 month ago

I attach a screenshot of our data model tool to illustrate how we can (re)use resources from truly resolvable namespaces.

[Screenshot: m8g Example Add class]

EmidioStani commented 1 month ago

Hello @RiittaA ,

SEMIC has authority only over data.europa.eu/m8g (core vocabularies), data.europa.eu/r5r (DCAT-AP), etc.

Core Vocabularies, like many models out there, reuse concepts and therefore their URIs with the related definitions.

Reusing existing URIs from outside SEMIC means that governance has to take external entities into consideration, such as W3C but also Dublin Core, FOAF, etc.

Concepts under m8g are resolvable (content negotiation is in place); the others depend on the entity responsible.

If you go to data.europa.eu/m8g you can also get all the core vocabularies in one file. Is this something you need?
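A sketch of dereferencing the m8g namespace with content negotiation, using only Python's standard library. The Accept value text/turtle is an assumption: which RDF media types data.europa.eu actually serves should be checked against its documentation. Building the request needs no network access; calling fetch does.

```python
import urllib.request

M8G = "http://data.europa.eu/m8g/"


def rdf_request(iri, media_type="text/turtle"):
    """Build a GET request that asks for an RDF serialisation via the Accept header."""
    return urllib.request.Request(iri, headers={"Accept": media_type})


def fetch(iri, media_type="text/turtle", timeout=10):
    """Dereference `iri` (urllib follows redirects); assumes a UTF-8 response body."""
    with urllib.request.urlopen(rdf_request(iri, media_type), timeout=timeout) as resp:
        return resp.headers.get_content_type(), resp.read().decode("utf-8")
```

For example, content_type, body = fetch(M8G) would return the content type the server reports and the file it sends back, if it honours the requested media type.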

RiittaA commented 1 month ago

Hello, @EmidioStani, thank you for your message.

With our data vocabulary (data modelling) tool, users can either 1) create independent, local data models or 2) create data models that link to and (re)use resolvable namespaces.

Case 1: We have already created a Finnish version of the SEMIC core vocabularies that consists of all core vocabularies that were published on GitHub back in the day. Please see "ISA2core - EU Core Vocabularies implementation" (https://tietomallit.suomi.fi/model/isa2core?ver=0.1.0&lang=en). So, if needed, the users of our platform can use that local implementation of the SEMIC core vocabularies and create data models that align with them (but the relation is only implicit).

Case 2: Instead of duplicating SEMIC specifications, we would like to refer and link to live resolvable namespaces. In that case duplicates would not be needed, but linking would be explicit. We could refer to SEMIC namespaces and e.g. claim that our "person" class is a subclass of "person" class in SEMIC Core Person Vocabulary.

It is great that SEMIC now has published the resolvable m8g namespace. We just would like to have more stuff like that. :)