ckan / ckanext-dcat

CKAN ♥ DCAT

Fluent Compatibility #240

Open JVickery-TBS opened 1 year ago

JVickery-TBS commented 1 year ago

Feature/language support

Added translation support to the extensions. Translates:

smotornyuk commented 1 year ago

@amercader, can you take a look at this PR? It's part of a series of PRs from @JVickery-TBS with translation fixes.

JVickery-TBS commented 1 year ago

@amercader okay I understand this extension a bit better from your comments thanks!

So basically, we do not want to really mess with the Profiles at all because they are just the mappings of all of the different meta formats.

So I found that the logic method dcat_dataset_show here is what is used. And to kind of go off more of your comments, I leaned into making these really just "Fluent Compatibility" because that is what it is really.

So I now check plugin_loaded('fluent').

I have reverted everything from profiles.py and the code in the dcat_dataset_show seems to handle everything well as that is where we can modify the data dict values returned from package_show before the dict is passed through the profiles/serialization.
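
As a rough illustration of that step, here is a minimal sketch of reducing fluent-style values to a single language before the dict reaches the profiles. The helper name `flatten_fluent_fields`, the `default_lang` parameter, and the rule "any non-empty dict of strings is a translated field" are assumptions made for this example, not code from the PR:

```python
def flatten_fluent_fields(data_dict, lang, default_lang="en"):
    """Return a copy of data_dict with translated values reduced to one language.

    Hypothetical helper: any non-empty dict whose values are all strings is
    treated as a fluent-style field ({"en": "...", "fr": "..."}).
    """
    flattened = {}
    for key, value in data_dict.items():
        is_translated = (
            isinstance(value, dict)
            and value
            and all(isinstance(v, str) for v in value.values())
        )
        if is_translated:
            # Prefer the requested language, then the default one,
            # then whatever translation happens to exist.
            flattened[key] = (
                value.get(lang)
                or value.get(default_lang)
                or next(iter(value.values()))
            )
        else:
            flattened[key] = value
    return flattened
```

In the real extension this kind of call would presumably be guarded by the `plugin_loaded('fluent')` check mentioned above.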

I realize that it is not the greatest level/place to put the code, but it is clean and easy.

The only thing that I would really like to do still is figure out the landingPage for the correct language URLs. Because this is an example of the current code for a url such as http://127.0.0.1:5009/fr/dataset/d9865e61-227c-4298-a1d1-620e6669b097.xml:

<rdf:RDF>
  <dcat:Dataset rdf:about="http://127.0.0.1:5009/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title>Feeds Testing FR</dct:title>
    <dct:description>Feeds Testing Description FR</dct:description>

So it would be nice to have the landingPage be http://127.0.0.1:5009/fr/dataset/d9865e61-227c-4298-a1d1-620e6669b097

Let me know if you think there is any way to do that?

JVickery-TBS commented 1 year ago

@amercader Okay I figured it out now! Firstly, I added the fluent compatibility to the catalog view/blueprint as well now.

As for the landingPage/URL stuff, I figured that out. So it will not override any set uri field values or extra field values. As an example:

 The value will be the first found of:
        1. The value of the `uri` field
        2. The value of an extra with key `uri`
        3. `catalog_uri()` + '/dataset/' + `id` field

It is now in catalog_uri that I check whether fluent is loaded and whether we are within a request context. If both are true, the {{LANG}} tag in the config options ckan.site_url or ckanext.dcat.base_uri is replaced with the current language.
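
A minimal sketch of that lookup order and of the `{{LANG}}` substitution, for illustration only. The function names `resolve_catalog_uri` and `resolve_dataset_uri` are hypothetical, and dropping the placeholder when no language is available is an assumption of this sketch rather than confirmed behaviour of the PR:

```python
def resolve_catalog_uri(base_uri_template, lang=None):
    """Fill the {{LANG}} placeholder of a configured base URI.

    Assumption: with no language available (for example outside a web
    request) the placeholder and its path segment are simply dropped.
    """
    if lang:
        uri = base_uri_template.replace("{{LANG}}", lang)
    else:
        uri = base_uri_template.replace("/{{LANG}}", "").replace("{{LANG}}", "")
    return uri.rstrip("/")


def resolve_dataset_uri(dataset_dict, catalog_uri):
    """Return the first value found, in the order listed above."""
    # 1. The value of the `uri` field
    if dataset_dict.get("uri"):
        return dataset_dict["uri"]
    # 2. The value of an extra with key `uri`
    for extra in dataset_dict.get("extras", []):
        if extra.get("key") == "uri" and extra.get("value"):
            return extra["value"]
    # 3. catalog URI + '/dataset/' + `id` field
    return catalog_uri + "/dataset/" + dataset_dict["id"]
```

With a base URI of `http://127.0.0.1:5009/{{LANG}}` and the language `fr`, a dataset without an explicit `uri` would resolve to `http://127.0.0.1:5009/fr/dataset/<id>`, while an explicitly set `uri` is left untouched.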

seitenbau-govdata commented 1 year ago

Maybe similar to https://github.com/ckan/ckanext-dcat/pull/124

amercader commented 1 year ago

Thanks @JVickery-TBS. Let's step back for a second and see what it means for ckanext-dcat to have multi-language support. We will assume that for a field to be multilingual it needs to use the ckanext-fluent convention:

dataset_dict = {
    "title": {
        "en": "Some title in English",
        "ca": "Un títol en català",
        "es": "Un título en castellano",
    },
    "notes": {
        "en": "A description in English",
        "ca": "Una descripció en català",
        "es": "Una descripción en castellano",
    },
    "resources": [],
    "maintainer": "xx",

}

Multilingual support means:

  1. Supporting importing fields from the RDF graph to the fluent format above (which is the goal of #124 by @stefina)
  2. Using the fluent fields to serialize multilingual RDF files (which is the goal of this PR)

Both don't need to be done at the same time, so it's fine to focus on serialization for now. Your current approach is to modify the values of multilingual fields in the serialized RDF file (the RDF/XML, jsonld, ttl...) to match the language that the current web user is using. So if they are visiting https://someckan.org/en they get:

<rdf:RDF>
  <dcat:Dataset rdf:about="https://someckan.org/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title>Some title in English</dct:title>
    <dct:description>A description in English</dct:description>

And if they are visiting https://someckan.org/ca (or ckan.locale_default = ca) they will get:

<rdf:RDF>
  <dcat:Dataset rdf:about="https://someckan.org/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title>Un títol en català</dct:title>
    <dct:description>Una descripció en català</dct:description>

This is very limited in that we are not representing all languages, and the serialization does not provide information on which language the values are actually in. Besides, this will only work in the context of a web request, not when used in the CLI, or as a module elsewhere.

The correct and more interoperable approach is to always provide all available languages, and use language codes:

<rdf:RDF>
  <dcat:Dataset rdf:about="https://someckan.org/dataset/d9865e61-227c-4298-a1d1-620e6669b097">
    <dct:title xml:lang="en">Some title in English</dct:title>
    <dct:title xml:lang="ca">Un títol en català</dct:title>
    <dct:title xml:lang="es">Un título en castellano</dct:title>
    <dct:description xml:lang="en">A description in English</dct:description>
    <dct:description xml:lang="ca">Una descripció en català</dct:description>
    <dct:description xml:lang="es">Una descripción en castellano</dct:description>
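
To make the idea concrete without depending on the profiles code, here is a small standalone sketch that expands a fluent-style value into one (text, language) pair per translation. The function name `translated_objects` is hypothetical and the pairs are plain tuples standing in for RDF literals; with rdflib, each pair would typically become a `Literal(text, lang=lang)`:

```python
def translated_objects(value, multilingual=False):
    """Return (text, lang) pairs for a field value.

    A fluent-style dict yields one pair per language. Any other non-empty
    value yields a single pair with lang=None (an untagged literal).
    """
    if multilingual and isinstance(value, dict):
        # Sorted by language code so the output order is stable.
        # Returning here keeps the dict itself from also being
        # serialized as a single plain literal.
        return [(text, lang) for lang, text in sorted(value.items()) if text]
    if value:
        return [(str(value), None)]
    return []


title = {
    "en": "Some title in English",
    "ca": "Un títol en català",
    "es": "Un título en castellano",
}
for text, lang in translated_objects(title, multilingual=True):
    print(lang, text)
```

A plain string such as `"xx"` for `maintainer` would still produce a single untagged value, so fields that are not translated keep their current serialization.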

For this I'm afraid we need to go low level, at the profiles level. But the good news is that by changing it there we will automatically get multilingual serializations regardless of how these are created (API, RDF endpoint, CLI, etc).

Below is a quick patch I tried to add support for multilingual title and notes fields. Hopefully it's easy to expand to other fields. @JVickery-TBS if you could give it a go I believe that could be a good path forward.

@seitenbau-govdata I'd love to get another pair of eyes on the modified _add_triple_from_dict() logic. I think the assumptions I made are fair but maybe we need to consider other combinations of parameters

diff --git a/ckanext/dcat/profiles.py b/ckanext/dcat/profiles.py
index 9b066ef..f91c79e 100644
--- a/ckanext/dcat/profiles.py
+++ b/ckanext/dcat/profiles.py
@@ -727,14 +727,16 @@ class RDFProfile(object):

     def _add_triples_from_dict(self, _dict, subject, items,
                                list_value=False,
-                               date_value=False):
+                               date_value=False,
+                               multilingual=False):
         for item in items:
             key, predicate, fallbacks, _type = item
             self._add_triple_from_dict(_dict, subject, predicate, key,
                                        fallbacks=fallbacks,
                                        list_value=list_value,
                                        date_value=date_value,
-                                       _type=_type)
+                                       _type=_type,
+                                       multilingual=multilingual)

     def _add_triple_from_dict(self, _dict, subject, predicate, key,
                               fallbacks=None,
@@ -742,7 +744,8 @@ class RDFProfile(object):
                               date_value=False,
                               _type=Literal,
                               _datatype=None,
-                              value_modifier=None):
+                              value_modifier=None,
+                              multilingual=False):
         '''
         Adds a new triple to the graph with the provided parameters

@@ -776,6 +779,11 @@ class RDFProfile(object):
             self._add_date_triple(subject, predicate, value, _type)
         elif value:
             # Normal text value
+            if multilingual and isinstance(value, dict):
+                # We assume that all multilingual field values are Literals
+                for lang, translated_value in value.items():
+                    object = Literal(translated_value, lang=lang)
+                    self.g.add((subject, predicate, object))
             # ensure URIRef items are preprocessed (space removal/url encoding)
             if _type == URIRef:
                 _type = CleanedURIRef
@@ -1207,10 +1215,16 @@ class EuropeanDCATAPProfile(RDFProfile):

         g.add((dataset_ref, RDF.type, DCAT.Dataset))

-        # Basic fields
+        # Multilingual fields
         items = [
             ('title', DCT.title, None, Literal),
             ('notes', DCT.description, None, Literal),
+        ]
+
+        self._add_triples_from_dict(dataset_dict, dataset_ref, items, multilingual=True)
+
+        # Basic fields
+        items = [
             ('url', DCAT.landingPage, None, URIRef),
             ('identifier', DCT.identifier, ['guid', 'id'], URIRefOrLiteral),
             ('version', OWL.versionInfo, ['dcat_version'], Literal),

JVickery-TBS commented 1 year ago

@amercader Okay here we go again hahaha.

I have done your above implementation for multilingual in the _add_triples_from_dict method. And it seems to be working nicely.

There were a couple places in which I had to do some strange-ish things:

I also removed the language from the url for the rdf:about= as that seemed incorrect?

JVickery-TBS commented 1 year ago

@amercader Just added in the fallback for the field keys. Realized that if we put in the _translated keys, we would be assuming that a user is translating all of these fields. E.g. a user could be translating the Resource Title, but not translating the Resource Description.

So there is now just a simple check for whether the _translated key is in the object dicts, kind of like how the core get_translated works.
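
A minimal sketch of that kind of per-field fallback, assuming translated values live under `<field>_translated` as a dict keyed by language. The helper name `translated_key` is made up for this example and is not taken from the PR:

```python
def translated_key(data_dict, key, suffix="_translated"):
    """Pick the translated variant of a field only when it is present.

    Returns key + suffix if that entry exists and is a non-empty dict,
    otherwise falls back to the plain key.
    """
    candidate = key + suffix
    value = data_dict.get(candidate)
    if isinstance(value, dict) and value:
        return candidate
    return key


resource = {
    "name": "Data",
    "name_translated": {"en": "Data", "fr": "Données"},
    "description": "Only available in one language",
}
print(translated_key(resource, "name"))         # name_translated
print(translated_key(resource, "description"))  # description
```

This way a resource whose title is translated but whose description is not still serializes both fields.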

JVickery-TBS commented 1 year ago

@amercader Ian mentioned that I should rename the multilingual parameter because that is the name of the core Multilingual extension.

So I just renamed that param to all_translated.

inderps commented 8 months ago

Will this ever get merged or not?

JVickery-TBS commented 7 months ago

@inderps Hey! Sorry! Have been busy with things over here. But have just done the feedback now, so we shall see!

amercader commented 1 month ago

Quick update here just to say that I've pulled the fluent compatibility work into the wider Scheming / DCAT 3 support effort, so at some point during the next few weeks this will be looked at. I just need to think about how it will integrate with the more general scheming support, but the majority of the work here should get incorporated as is. Thanks for bearing with me @JVickery-TBS