ckan / ckanext-dcat

CKAN ♥ DCAT
164 stars 142 forks source link

Prefer default language values for some Literal nodes #143

Closed seitenbau-govdata closed 5 years ago

seitenbau-govdata commented 5 years ago

If a graph contains multiple languages for a node, a random node is currently used when parsing.

This introduces the option to prefer the configured default language for some nodes:

The changes are related to #51 and #124.

amercader commented 5 years ago

Thanks @seitenbau-govdata. I wonder if this behavior (picking the default language) should be the default for all fields, not just the 4 you included here (so not having to pass prefer_default_language=True as this would be the default) . What do you think? cc @metaodi

seitenbau-govdata commented 5 years ago

@amercader We thought about that as well. It would make parsing more reproducible as no more random values would be picked in case of multiple languages. But as we weren't sure about the effects of picking the default language for all Literal elemens, we only chose the attributes we needed.

In case we use it for all elements, we would drop the whole use_default_lang parameter.

metaodi commented 5 years ago

@amercader I think it would make sense to have this as the default for all fields, since the language attribute is used to determine this behavior. So I would drop the prefer_default_language parameter.

We currently use a similar mechanism in #124 to get some values as multilingual dict. But I think it's a bit different there, as we actually change the returned value (dict instead of string).

seitenbau-govdata commented 5 years ago

The language handling is used for all Literal values now.

Regarding the fallback behavior, assume two nodes exist in the graph:

  1. In case no default language is configured, always the lang=en one is picked now.
  2. If a language differing from "en" is configured as default, a random node (more precisely, the first one found) is returned like with the old logic.

What do you think about that behavior @amercader @metaodi ?

metaodi commented 5 years ago

@seitenbau-govdata I think this is fine.

Why does this only apply to Literal values? I understand that most values will be Literal's anyway, but if another type has a language attribute, the same "rule" should apply. Or am I missing something? If not, I would drop the isinstance condition.

seitenbau-govdata commented 5 years ago

@metaodi The reason is that the class Literal is the only class in rdflib at the moment which contains a property language. There are some additional checks (e.g. language value validation) and the mapping from the XML attribute langto the property language in the class Literal. So we just added the check for the type Literal.

seitenbau-govdata commented 5 years ago

@amercader and @metaodi Happy New Year! Do you have any comments about our latest comment? We would like to update ckanext-dcat in our installation to the latest version. And these are the last changes that should be included in the new version. So we are able to remove the patched files and close the issue in our repo https://github.com/GovDataOfficial/ckanext-dcatde/issues/6. 😉

metaodi commented 5 years ago

@seitenbau-govdata I still think it would be more pythonic to not use isinstance, but rather catch a possible AttributeError. But it's really just a minor thing, so I will just merge this now and maybe refactor it later.

seitenbau-govdata commented 5 years ago

@metaodi Thanks for merging it. Now it would be perfect if we could have a new version from the master. Maybe you could release a new version?

metaodi commented 5 years ago

@seitenbau-govdata I just released 0.0.9, have fun! :smile:

seitenbau-govdata commented 5 years ago

@metaodi Thanks a lot!