ckan / ckanext-dcat

CKAN ♥ DCAT
https://docs.ckan.org/projects/ckanext-dcat

Multilingual support in DCAT profiles #318

Closed amercader closed 3 weeks ago

amercader commented 4 weeks ago

This builds on the excellent code started by @stefina and @JVickery-TBS in #124 and #240 respectively, adapting it to the current profiles and generalizing it for maximum compatibility.

Multilingual support is provided via integration with ckanext-fluent, the supported way of implementing translations for CKAN fields.

At the serialization level, a new triple will be added for each of the defined languages (if the translation is present):

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.org/dataset/0112cf32-bce0-4071-9504-923375f9f2ad> a dcat:Dataset ;
    dct:title "Conjunt de dades de prova DCAT"@ca,
        "Test DCAT dataset"@en,
        "Conjunto de datos de prueba DCAT"@es ;
    dct:description "Una descripció qualsevol"@ca,
        "Some description"@en,
        "Una descripción cualquiera"@es ;
    dct:language "ca",
        "en",
        "es" ;
    dct:provenance [ a dct:ProvenanceStatement ;
        rdfs:label "Una declaració sobre la procedència"@ca,
            "Statement about provenance"@en,
            "Una declaración sobre la procedencia"@es ] .

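The "one triple per language" rule can be sketched in a few lines of plain Python. This is only an illustration, not the code in this PR: the function name `multilingual_triples` is hypothetical, tuples stand in for RDF triples, and emitting an untagged value (language `None`) for non-fluent string fields is an assumption.

```python
def multilingual_triples(subject, predicate, value):
    """Return (subject, predicate, text, lang) tuples for a CKAN field value.

    Hypothetical helper: `value` is either a fluent-style dict keyed by
    language code or a plain string.
    """
    if isinstance(value, dict):
        # Fluent field: one triple per language, skipping empty translations
        return [
            (subject, predicate, text, lang)
            for lang, text in sorted(value.items())
            if text
        ]
    # Plain string field (assumption): a single triple with no language tag
    return [(subject, predicate, value, None)] if value else []


title = {
    "en": "Test DCAT dataset",
    "ca": "Conjunt de dades de prova DCAT",
    "es": "",  # no Spanish translation, so no triple is produced for "es"
}
for triple in multilingual_triples("dataset", "dct:title", title):
    print(triple)
```

With the values above, only the `ca` and `en` triples are produced, which matches the "if the translation is present" condition.
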
When parsing, the parsers will import multilingual properties from DCAT serializations in the format ckanext-fluent expects, as long as the field is defined as fluent in the schema:

{
    "name": "test-dataset",
    "provenance": {
        "en": "Statement about provenance",
        "ca": "Una declaració sobre la procedència",
        "es": "Una declaración sobre la procedencia"
    }
}

As implemented in #124, if one of the languages is missing in the DCAT serialization, an empty string is returned for that language. Also, if a literal in the DCAT serialization carries no language tag, the default CKAN language (ckan.locale_default) is used.
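
The two fallback rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual parser code: `to_fluent_value` is a hypothetical name, literals are modelled as `(text, lang)` pairs, `languages` stands for the languages defined for the fluent field, and `default_lang` stands for the value of ckan.locale_default. Ignoring literals tagged with a language that the field does not define is also an assumption.

```python
def to_fluent_value(literals, languages, default_lang="en"):
    """Build a fluent-style dict from (text, lang) pairs.

    Hypothetical helper: `lang` is None when the literal has no language tag.
    """
    # Every defined language starts as an empty string, so languages that
    # are missing from the serialization still appear in the result
    result = {lang: "" for lang in languages}
    for text, lang in literals:
        # Untagged literals are assigned to the default language
        lang = lang or default_lang
        if lang in result:
            result[lang] = text
    return result


literals = [
    ("Statement about provenance", None),  # no language tag
    ("Una declaració sobre la procedència", "ca"),
]
print(to_fluent_value(literals, ["en", "ca", "es"]))
# {'en': 'Statement about provenance', 'ca': 'Una declaració sobre la procedència', 'es': ''}
```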

@JVickery-TBS this covers most of your changes in #240, except for the handling of translated fields in publishers / organizations. Since it's difficult to come up with logic that works across the many different scenarios, that is best handled in a small custom profile. Let me know if I missed anything else besides this issue.

cc @seitenbau-govdata