Open amercader opened 8 years ago
@wardi does that sound right? Also see the TODO above, does it matter what we put in there?
@amercader We are starting to implement this for DCAT-AP Switzerland; I'll keep you posted. We currently use the ckanext-fluent approach.
Fantastic @metaodi! Let me know if you want me to help with some spec or discussion
Btw: here is the implementation of our multilingual DCAT-AP Switzerland profile: https://github.com/ogdch/ckanext-switzerland/blob/01652937c8f31f46d8560ab9527826a3c1523c06/ckanext/switzerland/dcat/profiles.py
Behind the scenes we use ckanext-scheming for validation/schema.
The main change to the "original" is the new parameter `multilang` in the `_object_value` method. We simply use this for all values where we expect multilingual values.
Note there are two ongoing PRs with initial implementations:
Right now, neither the parsers nor the serializers take multilingual metadata into account.
For instance, given the following document, an arbitrary one of the three titles will be picked up at parsing time:
Parsing
The standard way of dealing with this seems to be to create metadata during parsing that ckanext-fluent can handle when creating or updating the datasets. This essentially means storing a dict instead of a string, with the keys being the language codes:
For core fields like `title` or `notes`, we need to add an extra field suffixed with `_translated`:
TODO: what to put in `title`?
To support it we can probably have a variant of `_object_value` that handles the lang tags and returns a dict accordingly (RDFLib will return a different triple for each language).
Serializing
Similarly, the serializing code could check the fields marked as multilingual to see if they are a string or a dict and create triples accordingly, probably via a helper function.
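To make the two helpers discussed above more concrete, here is a minimal sketch of what they might look like. It is not the code from either PR or from the Swiss profile. The names `multilang_value`, `multilang_literals` and `Lit` are hypothetical, and `Lit` is only a stand-in for an RDFLib `Literal` (which exposes its language tag as `.language`) so that the snippet runs without any dependency.

```python
from collections import namedtuple

# Hypothetical stand-in for rdflib.Literal: a text value plus a language
# tag, where the tag is None for untagged literals.
Lit = namedtuple("Lit", ["value", "language"])


def multilang_value(literals, default_lang="en"):
    """Parsing side: collapse all literals found for one predicate
    into a {lang: text} dict, the shape ckanext-fluent expects."""
    result = {}
    for lit in literals:
        # Untagged literals are filed under the default language (assumption).
        lang = lit.language or default_lang
        # Keep the first value seen for each language.
        result.setdefault(lang, lit.value)
    return result


def multilang_literals(value):
    """Serializing side: accept either a plain string or a {lang: text}
    dict and return (text, lang) pairs, one per triple to create."""
    if isinstance(value, dict):
        # Skip empty translations; sort for a stable output order.
        return [(text, lang) for lang, text in sorted(value.items()) if text]
    if value:
        # A plain string becomes a single untagged literal.
        return [(value, None)]
    return []


titles = [
    Lit("Dataset", "en"),
    Lit("Datensatz", "de"),
    Lit("Jeu de données", "fr"),
]
parsed = multilang_value(titles)
pairs = multilang_literals(parsed)
```

In a real profile the `literals` argument would presumably come from something like `graph.objects(subject, predicate)`, and each `(text, lang)` pair would be turned into `Literal(text, lang=lang)` before being added to the graph. How untagged literals and the default language should be handled is an open choice here, not something this thread has settled.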
Things to think about: