IQSS / dataverse

Open source research data repository software
http://dataverse.org

Search API: facets issue on translated custom fields #8287

Open bappun opened 2 years ago

bappun commented 2 years ago

What steps does it take to reproduce the issue? In a TSV file I have a controlled vocabulary with the value Politics.Elections for Topic Classification Term. This value is then translated into two languages using Java properties files:

When I query the Search API with facets enabled (show_facets: true), the label for this field is taken from the English translation. I get this:

"topicClassValue_ss": {
    "friendly": "Topic Classification Term",
    "labels": [
        {"Elections": 2657}
    ]
}

This becomes an issue when I try to search Dataverse using this facet. When I search topicClassValue_ss:"Elections" I get no results, because the search needs the untranslated value: topicClassValue_ss:"Politics.Elections". However, there is no way to get that value from the API.
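To make the mismatch concrete, here is a minimal Python sketch of the two filter queries. It assumes the Search API's `q`, `show_facets`, and `fq` parameters; the host `https://dataverse.example.org` and the helper `build_search_url` are hypothetical names used only for illustration.

```python
from urllib.parse import urlencode


def build_search_url(base_url, field, value, query="*"):
    """Build a Search API URL that filters on one facet value.

    base_url is a hypothetical installation; field and value are taken
    from the facet block returned by a previous search.
    """
    params = {
        "q": query,
        "show_facets": "true",
        # The filter query wraps the value in double quotes.
        "fq": f'{field}:"{value}"',
    }
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


HOST = "https://dataverse.example.org"  # hypothetical host

# Built from the translated label the API returns: reported to give no results.
translated_url = build_search_url(HOST, "topicClassValue_ss", "Elections")

# Built from the untranslated base term: the filter that is actually needed.
base_term_url = build_search_url(HOST, "topicClassValue_ss", "Politics.Elections")

print(translated_url)
print(base_term_url)
```

The two URLs differ only in the `fq` value, and only the base term is reported to match; the facet response exposes just the translated label, so a client cannot build the second URL from the first response.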

Which version of Dataverse are you using? 5.8

Any related open or closed issues to this bug report?

#8286

pdurbin commented 2 years ago

@bappun I just saw you announce https://cdsp-scpo.github.io/dataverse-feed/build/ at https://dataversecommunity.slack.com/archives/C5V66TV6Y/p1638987754055400

I expected to see 127 results when checking the box for "Elections" under "Topic Classification Term" but I got 0 items (screenshots below). Is this because of this issue you're reporting?

Screen Shot 2021-12-08 at 1 35 29 PM

Screen Shot 2021-12-08 at 1 35 34 PM

bappun commented 2 years ago

@pdurbin Yes! I noticed the issue while working on this prototype. The online demo is using Dataverse v4.20 but I also tried with a pre-production instance using v5.8 and noticed the same problem on my implementation.

I would like to reproduce the same behavior as in the Dataverse UI, where selecting the Elections facet adds the Politics.Elections filter: [image]

qqmyers commented 2 years ago

FWIW: My guess is that it is just a bug that the filter shown above the results uses the base term. For the external vocab mechanism, both the facets and the filter are translated in the UI - I think as requested in review.

I'll also note that this issue makes a simple search on controlled vocabulary values (CVV) hard as well: if the translation of Politics.Elections were 'Voting' (or anything else without the words 'politics' or 'elections' in it), a simple search for the term visible on the page wouldn't return any results either. So it isn't just an API issue.

After some discussion, it sounds like indexing the CVV values for all configured languages could be a reasonable way to solve this. (I think this can be done so the facets aren't affected but filtering for the base term or any translation would get a hit.) Unless there are concerns or somebody can see a problem with this approach, I'll look into it on Sciences Po's behalf.
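A rough Python sketch of that idea, not the actual Dataverse indexing code: the facet field keeps only the base term, while a second field collects the base term plus its label in every configured language. The field names `facet_values` and `searchable_values`, both helper functions, and the French label are all hypothetical.

```python
def build_index_doc(base_terms, translations):
    """Return a toy index document for one dataset.

    base_terms: controlled vocabulary values as stored, e.g. "Politics.Elections".
    translations: {language: {base_term: translated_label}} (hypothetical shape).
    """
    searchable = []
    for term in base_terms:
        searchable.append(term)  # the base term always stays searchable
        for lang in sorted(translations):
            label = translations[lang].get(term)
            if label and label not in searchable:
                searchable.append(label)
    return {
        "facet_values": list(base_terms),  # facets are unaffected
        "searchable_values": searchable,   # base term + all translations
    }


def matches(doc, value):
    """True if filtering on value would hit this document."""
    wanted = value.casefold()
    return any(wanted == v.casefold() for v in doc["searchable_values"])


# Illustrative translations only; the real labels live in the properties files.
translations = {
    "en": {"Politics.Elections": "Elections"},
    "fr": {"Politics.Elections": "Élections"},
}
doc = build_index_doc(["Politics.Elections"], translations)
print(doc)
```

With this layout, filtering on `Politics.Elections`, `Elections`, or `Élections` would all hit the same document, while the facet list still shows a single entry per base term.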

pdurbin commented 2 years ago

"it sounds like indexing the CVV values for all configured languages could be a reasonable way to solve this"

Sure, I think that approach is worth exploring, at least.