Open teckart opened 3 years ago
The Solr collapsing mechanism provides min/max/sort parameters to select a group's head. We could create an (optional) index field that indicates the preference for a specific resource based on its origin and use it in the query, but it is still unclear what information we would base that on. We could, for example, maintain a list of endpoints that are mostly "aggregators" (of external resources) and downvote them, but this would mean additional configuration and maintenance, and the result would be somewhat arbitrary in some cases (like LINDAT's "LRT inventory"). The same might apply when preferring one dataProvider over others.
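To make the options a bit more concrete, here is a minimal sketch of how the collapse filter could be built for either approach. The field names `duplicateGroup` and `originPreference` are hypothetical (the preference field does not exist yet), and the helper `collapse_filter` is only for illustration; the `{!collapse ...}` local parameters `field`, `max` and `sort` are the ones Solr's collapsing query parser provides, and at most one of min/max/sort may be given.

```python
from urllib.parse import urlencode


def collapse_filter(group_field, head_max=None, head_sort=None):
    """Build a Solr collapse filter query (fq) string.

    Solr accepts at most one of min/max/sort for selecting the group head,
    so this helper rejects a combination of head_max and head_sort.
    """
    if head_max and head_sort:
        raise ValueError("use either head_max or head_sort, not both")
    parts = [f"field={group_field}"]
    if head_max:
        parts.append(f"max={head_max}")
    if head_sort:
        parts.append(f"sort='{head_sort}'")
    return "{!collapse " + " ".join(parts) + "}"


# Separate policy: the head is the record with the highest origin preference,
# regardless of its relevance score (hypothetical field names).
separate_policy = collapse_filter("duplicateGroup", head_max="originPreference")

# Layered policy: keep the existing score (including boosts) as the primary
# criterion and use the origin preference only to break ties.
layered_policy = collapse_filter(
    "duplicateGroup", head_sort="score desc, originPreference desc"
)

# URL-encoded request parameters for a query using the layered variant.
params = urlencode({"q": "arabic speech corpus", "fq": layered_policy})
```

Note that in the layered variant the preference only matters when two records in a group have the same score, so it would rarely change the head unless the preference is instead folded into the score as another boost.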
Something to keep in mind: we already have boosts in place for things like availability, presence of a description, and position in the hierarchy (see solrconfig.xml), and these currently help determine the group's head. By default, the selection also takes into account relevance with respect to the query.
We will have to carefully decide whether we want to add logic 'on top' of this, or have a completely separate policy for the selection of the head. I don't have a clear preference right now but we have to make sure that we don't inadvertently discard a useful ranking mechanism.
In cases where the VLO importer identifies record duplicates (currently based on name and language), the record presented on the search page might not be the one from the resource owner, but another record provided by an external catalogue. Ways to reduce this behaviour have to be evaluated and implemented.
Example: "Arabic Speech Corpus" OTA vs. ELRA
Helpful links: