SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

To reach DCAT-AP conformance, is it really necessary to use all the controlled vocabularies listed in section 5.2 of the DCAT-AP 2.0.0 PDF version? #196

Closed sabinem closed 2 years ago

sabinem commented 3 years ago

As DCAT-AP-CH we are currently striving to reach conformance with DCAT-AP, but one requirement for this seems to be the use of controlled vocabulary:

We struggle especially with the following requirements:

  1. dct:format: Section 5.2 asks us to use http://publications.europa.eu/resource/authority/file-type as the controlled vocabulary for this property, but we have Swiss formats that are not on this list, such as INTERLIS. What can we do to reach DCAT-AP conformance with this property?
  2. dct:publisher: Do we really need a controlled vocabulary for this? The specification says: "In case of other types of organisations, national, regional or local vocabularies should be used." Does that mean there MUST be a vocabulary and it MUST be used?
  3. dcat:theme and dcat:themeTaxonomy: we have our own theme vocabulary. Does that mean we are not conformant, or can we just map our categories to the European ones and still use our own vocabulary?

Would you have any advice for us regarding the vocabulary?

jakubklimek commented 3 years ago

@sabinem Regarding 1. - you can add missing formats to http://publications.europa.eu/resource/authority/file-type using the CONTRIBUTE button on https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/file-type - It is important for interoperability that especially file types are from a common vocabulary

Regarding 3. You are free to use your own themes in addition to the EU data theme taxonomy. You can e.g. map your vocabulary to the 13 themes in the EU vocabulary. Again, important for basic interoperability to use the same controlled vocabulary for themes, at least at this rough level. E.g. in Czechia, we also use EuroVoc in addition to the data theme.
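
The mapping idea can be illustrated with a minimal SKOS sketch. The `chtheme:` namespace and the concept `chtheme:economy` are hypothetical placeholders for a local vocabulary, and `skos:closeMatch` is only one possible choice of mapping property; the EU concept URI is the "Economy and finance" entry of the data-theme authority table.

```turtle
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix eutheme: <http://publications.europa.eu/resource/authority/data-theme/> .
# Hypothetical namespace for a local theme vocabulary (assumption).
@prefix chtheme: <https://www.dcat-ap.ch/vocabulary/theme/> .

# A local theme concept, mapped to the closest EU data theme.
chtheme:economy
  a skos:Concept ;
  skos:prefLabel "Economy"@en ;
  skos:closeMatch eutheme:ECON .
```

With such a mapping in place, a publisher can keep tagging datasets with local themes and derive the EU theme from the mapping when exporting.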

MPaunescu commented 3 years ago

@sabinem Regarding 1. Indeed, you can either use the CONTRIBUTE option or send an email directly to the team in charge of EU Vocabularies at: OP-EU-VOCABULARIES@publications.europa.eu. The email is a more approachable and faster way, and it puts you in direct contact with the team there.

Regarding 2. It depends on the target. The Publications Office has https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/corporate-body for this.

Regarding 3. Eurovoc will indeed give a lot more options in the theme area. On the other hand, having a lot more options is not always the best idea, because it can make it harder to decide. Eurovoc is a much richer taxonomy, but https://op.europa.eu/en/web/eu-vocabularies/concept-scheme/-/resource?uri=http://publications.europa.eu/resource/authority/data-theme is easier to deal with. Eurovoc also has the advantage that it is already linked with Wikidata.

sabinem commented 3 years ago

@jakubklimek and @MPaunescu thank you both very much for your answers.

  1. I understand now how to deal with dct:format.

  2. With dcat:theme @jakubklimek do you suggest adding multiple dcat:theme statements, such as this:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix chtheme: <https://www.dcat-ap.ch/vocabulary/theme> .
@prefix eutheme: <http://publications.europa.eu/resource/dataset/data-theme> .

<https://swisstopo/catalog-endpoint.rdf>
  a dcat:Catalog ;

  dcat:themeTaxonomy chtheme:economy ;
  dcat:themeTaxonomy eutheme:economy .

Would that be okay if https://www.dcat-ap.ch/vocabulary/theme would be the URI for a custom Swiss data-theme vocabulary? Or would that be done in a different way? Or do we need to abandon the Swiss data-theme vocabulary in order to be conformant with DCAT-AP?

  1. We have a lot of Swiss data publishers. So your recommendation would be to just contribute them to the EU vocabulary, the same way as we should do it with the formats? Do other countries do it that way? I know the Germans have this contributor vocabulary: https://www.dcat-ap.de/def/contributors/ Could we do it like that? Does it really need a controlled vocabulary for the publishers to be conformant with DCAT-AP? What would you recommend to us in that regard? I have the feeling that the Swiss publisher information is more sensitive data to them than the file formats.

jakubklimek commented 3 years ago

@sabinem Regarding dcat:themes - yes, that is what I suggest, see the Czech catalog.
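
For illustration, one way the two vocabularies could sit side by side is sketched below. This is a sketch under assumptions, not text from the specification: the `example.org` IRIs and the `chtheme:` namespace are hypothetical, `dcat:themeTaxonomy` on the catalog points at the concept schemes, and `dcat:theme` on each dataset points at individual concepts.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix eutheme: <http://publications.europa.eu/resource/authority/data-theme/> .
# Hypothetical namespace for a local theme vocabulary (assumption).
@prefix chtheme: <https://www.dcat-ap.ch/vocabulary/theme/> .

# The catalog declares both theme taxonomies (concept schemes).
<https://example.org/catalog>
  a dcat:Catalog ;
  dcat:themeTaxonomy <http://publications.europa.eu/resource/authority/data-theme> ,
                     <https://www.dcat-ap.ch/vocabulary/theme> ;
  dcat:dataset <https://example.org/dataset/1> .

# The dataset carries one EU theme and one additional local theme.
<https://example.org/dataset/1>
  a dcat:Dataset ;
  dcat:theme eutheme:ECON , chtheme:economy .
```

Note that the namespace IRIs end in a slash so that prefixed names such as `eutheme:ECON` expand to the intended concept IRIs.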

Regarding publishers, DCAT-AP says:

The Corporate bodies NAL must be used for European institutions and a small set of international organisations. In case of other types of organisations, national, regional or local vocabularies should be used.

Therefore, you should have your national vocabularies for organisations. But that just means that they have IRIs and are described using RDF, e.g., see this Czech dataset.

There is no need to contribute the local publishers to the list of European ones.

sabinem commented 3 years ago

@jakubklimek Thanks for your explanation and your help with this.

So would that be enough:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<https://swisstopo/catalog-endpoint.rdf>
  a dcat:Catalog ;
  dct:publisher <https://swisstopo> .

<https://swisstopo>
  a foaf:Organization ;
  foaf:name "Landesamt für Topographie Swisstopo"@de .

Or is there a need to gather all known publishers in a vocabulary?

jakubklimek commented 3 years ago

@sabinem Yes, this seems OK.

On a side note, your catalog IRI https://swisstopo/catalog-endpoint.rdf follows the bad practice of explicitly indicating a representation format (.rdf). It would be better to identify your catalog with an IRI that is independent of the representation format, such as https://swisstopo.ch/catalog

sabinem commented 3 years ago

@jakubklimek Thanks for pointing this out. This was just my bad example, our real catalogs don't have that. But I will also correct my example before publishing it, so that nobody will get bad ideas from it.

init-dcat-ap-de commented 3 years ago

@2: I think, the use of a CV for dct:publisher is mostly ignored.

Fun fact: dct:publisher must be used with a foaf:Agent but the entities in https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/corporate-body are not a foaf:Agent. So a SHACL rule would fail, unless you use the vocabulary AND make the used URI explicitly a foaf:Agent.
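
A minimal sketch of that workaround, assuming a hypothetical dataset IRI and using the Publications Office entry of the corporate-body list as the publisher: the data graph itself adds the type and name that a shape on `dct:publisher` would look for.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Hypothetical dataset that reuses a URI from the corporate-body list.
<https://example.org/dataset/1>
  a dcat:Dataset ;
  dct:publisher <http://publications.europa.eu/resource/authority/corporate-body/PUBL> .

# Explicit typing and naming added locally, since the vocabulary
# entry is not itself declared as a foaf:Agent.
<http://publications.europa.eu/resource/authority/corporate-body/PUBL>
  a foaf:Agent ;
  foaf:name "Publications Office of the European Union"@en .
```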

bertvannuffelen commented 3 years ago

On the publisher list: that one (the Corporate Bodies NAL) is solely applicable for European institutes. Therefore it is not applicable for national portals. It is a conditional requirement: if the publisher is known in the NAL it should be used. And this kind of constraint is rather hard to check in this way upfront.

MPaunescu commented 2 years ago

Speaking of vocabularies, please do not consider this spam, but OP will appreciate any feedback it gets on their quality. There is a campaign right now to collect opinions on that matter.

If you could give a few minutes of your time to fill in the following survey about the quality of authority tables: https://ec.europa.eu/eusurvey/runner/AuthorityTablesSurvey202109 it will be extremely valuable for the further development of those resources.

If you have any questions related to the survey or any remarks to add to it, please let me know here or write to: OP-EU-VOCABULARIES@publications.europa.eu

Thank you

init-dcat-ap-de commented 2 years ago

I would like to reiterate the need to be more precise with the binding character of the controlled vocabularies. We say the listed CVs MUST be used. But then add many examples when they don't have to be used. This will be a problem, when we try to convert our specification to SHACL shapes.

dcat:theme (https://op.europa.eu/en/web/eu-vocabularies/concept-scheme/-/resource?uri=http://publications.europa.eu/resource/authority/data-theme): If we say the CV MUST be used, then using some other, local URI for our additional local categories would be in violation of that rule. What we want to say is: if dcat:theme is used, it has to be used at least once with a URI from the CV. Additional URIs are allowed.
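
The "at least once, additional URIs allowed" reading could be sketched as a SHACL shape like the one below. This is only one possible encoding under stated assumptions: the `ex:` namespace and shape name are hypothetical, and membership in the CV is approximated by matching the concept IRI against the data-theme namespace rather than by checking the concept scheme itself.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
# Hypothetical namespace for the shapes (assumption).
@prefix ex:   <https://example.org/shapes#> .

ex:ThemeShape
  a sh:NodeShape ;
  # Only nodes that actually use dcat:theme are validated.
  sh:targetSubjectsOf dcat:theme ;
  sh:property [
    sh:path dcat:theme ;
    # At least one value must be an IRI from the EU data-theme namespace;
    # other values are not restricted by this shape.
    sh:qualifiedValueShape [
      sh:nodeKind sh:IRI ;
      sh:pattern "^http://publications\\.europa\\.eu/resource/authority/data-theme/" ;
    ] ;
    sh:qualifiedMinCount 1 ;
  ] .
```

Because the target is `sh:targetSubjectsOf dcat:theme`, a dataset without any theme is not reported, which matches the conditional wording of the rule.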

dct:publisher (https://op.europa.eu/en/web/eu-vocabularies/concept-scheme/-/resource?uri=http://publications.europa.eu/resource/authority/corporate-body): dct:publisher has foaf:Agent as its range. We say a foaf:Agent has to have a foaf:name. For SHACL, this will bring us nothing but problems. The rule will fail either because we didn't use a URI from the list, or because the URI from the list has no foaf:name attached and is not a foaf:Agent. Or we will have to use the URIs and attach everything ourselves. In addition, we are saying that you only have to use the CV if the publisher is in the list. Not my definition of "MUST". Possible solution: make the CV only Recommended.

dct:spatial: The EU Vocabularies Name Authority Lists must be used for continents, countries and places that are in those lists; if a particular location is not in one of the mentioned Named Authority Lists, Geonames URIs must be used.

This rule is almost impossible to enforce with SHACL. Also, does this mean that using dct:spatial with a hard-coded instance of dct:Location, where I give the actual spatial coverage as a dcat:bbox, is not allowed? I don't think so... (Also, see: #175 ) Possible solution: make the CV only Recommended.
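
To make the case in question concrete, here is a sketch of a dataset that uses both forms. The dataset IRI is hypothetical and the bounding-box coordinates are rough, illustrative values; whether the second `dct:spatial` value is acceptable under the current wording is exactly the open question.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .

<https://example.org/dataset/1>
  a dcat:Dataset ;
  # A value from the EU Vocabularies country list.
  dct:spatial <http://publications.europa.eu/resource/authority/country/CHE> ;
  # An inline location that states the coverage as a bounding box
  # (approximate coordinates, for illustration only).
  dct:spatial [
    a dct:Location ;
    dcat:bbox "POLYGON((5.96 45.82, 10.49 45.82, 10.49 47.81, 5.96 47.81, 5.96 45.82))"^^geo:wktLiteral
  ] .
```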

init-dcat-ap-de commented 2 years ago

@bertvannuffelen you added the wont-fix tag before my last response. Should I create separate issues for each paragraph?

bertvannuffelen commented 2 years ago

@init-dcat-ap-de

it would be better indeed to create a separate issue. So that your remark gets the proper attention and focus. I think that your 3 specific observations touch a broader topic: what are we expecting from "mandatory to use" for codelists and what kind of MS specific information should/could be harvested or not.

I think we observe that here there is a mixture of interpretations between:

And that is indeed confusing, even independent of whether it can be validated (in a performant way).

The base question is thus what an MS profile should do when deciding to use another codelist. Can that value be shared? Why must it be? Why shouldn't it be? What are the additional constraints/expectations for that codelist? E.g. must there be a mapping to the suggested codelist?

Note that from a data sharing perspective, sharing more information is usually not prohibited. Linked Data is actually quite resilient to it, but other formats might have issues with that. Can you create an issue "improving usage guidelines for codelists"? Then we will take this up as part of our future work.