SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

for dcat:byteSize the range is `xsd:decimal`: will this change to `xsd:positiveInteger`? #214

Closed: sabinem closed this issue 4 weeks ago

sabinem commented 2 years ago

We are currently improving the DCAT-AP conformance of DCAT-AP CH. In this regard I have two questions about dcat:byteSize.

Our specification currently defines it as rdfs:Literal typed as xsd:decimal, but on our open data portal we implemented it as rdfs:Literal typed as xsd:integer.

My questions are:

  1. Does the implementation conform to the specification, given that xsd:integer is a type derived from xsd:decimal, or do we need to change our implementation so that it conforms to our own specification and also to DCAT-AP?
  2. I noticed that DCAT version 3 recommends rdfs:Literal typed as xsd:nonNegativeInteger (https://www.w3.org/TR/vocab-dcat-3/#Property:distribution_size) instead of the rdfs:Literal typed as xsd:decimal that it had in version 2 (https://www.w3.org/TR/vocab-dcat-2/#Property:distribution_size). Will DCAT-AP go along with this? Does it even make sense, then, to move away from our integer implementation?

I would appreciate learning where DCAT-AP intends to go with this, and also how it views the conformance of derived types such as xsd:integer to base types such as xsd:decimal (see the hierarchy of XSD types here: https://www.w3.org/TR/xpath-datamodel-3/#types-hierarchy).

bertvannuffelen commented 2 years ago

@sabinem this is a non-trivial issue, and the problem is more on the client side than on the "semantics side". On the semantics side, the derivation relationship between the XSD types can be, and is, considered part of the specification. So using xsd:integer instead of xsd:decimal is acceptable for an implementation.

However, the problem is on the client side: if the client does not include this reasoning, it might consider the value incorrect. Take for instance our SHACL templates: unless the SHACL shapes list every XSD type that is compatible with xsd:decimal, a SHACL validator will conclude that the value is incorrect. As nobody can control the amount of inference a client will do, this is an unsolvable issue.
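
To make the difference between the two kinds of client concrete, here is a minimal Python sketch. It is an illustration only, not the DCAT-AP SHACL templates or any validator's actual logic: the `XSD_PARENT` table covers just the types mentioned in this thread, and the function names are hypothetical.

```python
# Partial XSD derivation chain (child -> parent), limited to the types
# discussed in this thread. xsd:decimal is the top of this partial chain.
XSD_PARENT = {
    "xsd:positiveInteger": "xsd:nonNegativeInteger",
    "xsd:nonNegativeInteger": "xsd:integer",
    "xsd:integer": "xsd:decimal",
    "xsd:decimal": None,
}


def strict_match(actual, expected):
    """A client without inference: the datatype names must be identical."""
    return actual == expected


def derived_match(actual, expected):
    """A client that walks up the derivation chain starting at `actual`."""
    current = actual
    while current is not None:
        if current == expected:
            return True
        current = XSD_PARENT.get(current)
    return False


def compatible_types(expected):
    """Every listed type that is acceptable where `expected` is the range."""
    return sorted(t for t in XSD_PARENT if derived_match(t, expected))


if __name__ == "__main__":
    print(strict_match("xsd:integer", "xsd:decimal"))
    print(derived_match("xsd:integer", "xsd:decimal"))
    print(compatible_types("xsd:decimal"))
```

A shape that wants to accept derived types without relying on client inference would have to enumerate something like the result of `compatible_types("xsd:decimal")` explicitly.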

So if your profile uses a more restrictive range definition, that is fine in my view. Note that XSD allows you to create your own datatypes (with min and max constraints); using these is also fine.

Today no expectation has been formulated that a client application must implement dereferencing (whether of types, definitions, profiles, or data entities).

On your second question, whether there is an intent to align with W3C DCAT: the answer is yes.

But this case shows a funny reversal: the specification makes more use of XSD than developers actually do. I seldom see values being typed as xsd:nonNegativeInteger. Most implementations internally use an integer datatype and block negative numbers in program logic. (So far I have not encountered a programming language that has an initialisation like var x = NonNegativeInteger().)
To comply with this XSD typing requirement, implementers have to code an additional rule: at the time of producing the RDF representation, the value must be a valid xsd:nonNegativeInteger. Otherwise there is not only a business error but also a serialization error. This might lead to complaints from the RDF parser that were not present before, and that becomes a major issue because the whole input will be disregarded (a parser is binary: parseable or non-parseable). To resolve a parser issue, a developer has to investigate the problem, which means that as long as nobody looks into it, the source cannot be harvested, because parsing comes before SHACL validation (the business level) can happen. This is a tedious and complex process, since the source of the parsing problem is not a technical problem but a business problem that has been turned into a technical one.
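
The "additional rule" could look roughly like the following Python sketch. It assumes the exporter writes literals in N-Triples style by hand. The function names are hypothetical, and whether a given RDF parser actually rejects an ill-typed literal depends on that parser.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def byte_size_literal(value):
    """Return an N-Triples style literal for dcat:byteSize.

    Raises ValueError when the value cannot be a valid
    xsd:nonNegativeInteger, so the problem surfaces in the exporter
    instead of in the harvester's parser.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"byteSize must be an integer, got {value!r}")
    if value < 0:
        raise ValueError(f"byteSize must not be negative, got {value}")
    return f'"{value}"^^<{XSD}nonNegativeInteger>'


def byte_size_literal_or_none(value):
    """Variant that drops the statement instead of failing the export."""
    try:
        return byte_size_literal(value)
    except ValueError:
        return None


if __name__ == "__main__":
    print(byte_size_literal(1024))
    print(byte_size_literal_or_none(-1))
```

Whether to fail the export or silently omit the triple is a design choice for the implementer; the second function only shows that the choice exists.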

So from a "semantic perspective" xsd:nonNegativeInteger is fine, but from the implementation perspective I would recommend against it, and would leave the enforcement of non-negativeness to the implementers.

From the book of having fun with harvesting: "spaces in URIs" fall into the same category of problems. Many non-native RDF systems create the URIs for their DCAT export by concatenating values, but they do not check whether the produced URI is valid. The harvesting then fails with a parse error.
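
As a sketch of how an exporter might guard against this, the Python below percent-encodes each path segment before concatenating and offers a rough check for characters that Turtle and N-Triples do not allow unescaped inside `<...>`. The function names and the `https://example.org/` base are made up for the example, and the check is not a full IRI validator.

```python
from urllib.parse import quote

# Characters excluded from IRIs written between angle brackets in
# Turtle and N-Triples, in addition to control characters and the space.
_FORBIDDEN = set('<>"{}|^`\\')


def build_uri(base, *parts):
    """Append path segments to `base`, percent-encoding each segment."""
    encoded = [quote(str(part), safe="") for part in parts]
    return base.rstrip("/") + "/" + "/".join(encoded)


def has_illegal_chars(uri):
    """Rough check: True if the string cannot be written as-is inside <...>."""
    return any(ord(ch) <= 0x20 or ch in _FORBIDDEN for ch in uri)


if __name__ == "__main__":
    naive = "https://example.org/dataset/" + "my data set"
    print(has_illegal_chars(naive))
    print(build_uri("https://example.org/dataset", "my data set"))
```

Running `has_illegal_chars` over every generated URI before serialization turns a harvest-time parse error into an export-time error that the data provider sees directly.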

sabinem commented 2 years ago

@bertvannuffelen Thank you so much for your detailed and thorough answer. It helped me a lot to better understand the issue at hand. Regarding the harvesting, I agree: I have also encountered that issue with URIs before. I have no experience with implementing a NonNegativeInteger, but thanks for your opinion that it would not be advisable. This probably means that DCAT-AP might not move in this direction any time soon either, even though it would be an option for semantic reasons.

bertvannuffelen commented 1 year ago

On this topic I would like to refer to an issue raised in W3C DCAT: https://github.com/w3c/dxwg/issues/1536. It is something to be aware of when using native JSON numbers in REST API payloads and converting them to DCAT.
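
As one illustration of the kind of care this needs (it is not a summary of the linked issue), the Python sketch below reads a byte size from a JSON payload. The payload key `byteSize` and the function name are assumptions made for the example. Fractional literals are parsed as `Decimal` so that a value written as `1024.0` or `1e3` is neither silently accepted as a float nor rounded.

```python
import json
from decimal import Decimal


def byte_size_from_json(payload, key="byteSize"):
    """Extract a non-negative integer byte size from a JSON document.

    JSON has a single number type, so 1024, 1024.0 and 1e3 are all legal.
    Parsing fractional literals as Decimal keeps them exact until we
    decide whether they denote a whole number.
    """
    value = json.loads(payload, parse_float=Decimal)[key]
    # Reject strings, booleans, null, and NaN/Infinity (parsed as float).
    if isinstance(value, bool) or not isinstance(value, (int, Decimal)):
        raise ValueError(f"{key} is not a finite number: {value!r}")
    if isinstance(value, Decimal):
        if value != value.to_integral_value():
            raise ValueError(f"{key} has a fractional part: {value}")
        value = int(value)
    if value < 0:
        raise ValueError(f"{key} must not be negative: {value}")
    return value


if __name__ == "__main__":
    print(byte_size_from_json('{"byteSize": 1024}'))
    print(byte_size_from_json('{"byteSize": 1e3}'))
```

Python integers have arbitrary precision, so large sizes survive this round trip. JSON parsers that map every number to a double can lose precision on very large integers, which is worth checking in the language used for the actual conversion.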