OregonDigital / OD2

Next generation of the Oregon Digital (https://oregondigital.org) digital collections platform, built on Samvera Hyrax (https://github.com/samvera/hyrax/).

Collections ranking in search results #1478

Closed: sseymore closed this issue 1 week ago

sseymore commented 3 years ago

Descriptive summary

Creating this ticket for a future discussion about how collection records are ranked in the search results.

Expected behavior

Collection records in search results will (or will not) get some form of ranking logic, pending that discussion.

Related work

N/A

Accessibility Concerns

N/A

jsimic commented 3 years ago

In addition, we could consider calling out whole collections, as Primo does, when a user's search meets a threshold for strong matches against collection metadata.
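One way to read "meets a threshold" is a cutoff relative to the best hit in the same result set, since Solr scores are not normalized across queries. This is a minimal sketch of that reading only: the `Hit` shape, the `collections_to_call_out` helper, and the 0.8 ratio are all hypothetical, and it is not how Primo or OD2 actually decide.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    id: str
    model: str    # e.g. "Collection" or a work type (hypothetical shape)
    score: float  # relevance score as returned by Solr


def collections_to_call_out(hits, min_ratio=0.8):
    """Return Collection hits scoring at least min_ratio of the top score.

    min_ratio is a hypothetical threshold. Comparing against the best hit in
    the same result set sidesteps the fact that raw Solr scores are not
    comparable between different queries.
    """
    if not hits:
        return []
    top = max(h.score for h in hits)
    if top <= 0:
        return []
    return [h for h in hits if h.model == "Collection" and h.score >= min_ratio * top]


hits = [Hit("w1", "Image", 10.0), Hit("c1", "Collection", 9.0), Hit("c2", "Collection", 5.0)]
print([h.id for h in collections_to_call_out(hits)])
```

Here only `c1` is called out: it scores within 80% of the top hit, while `c2` does not. The ratio would need tuning against real searches.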

sseymore commented 2 years ago

Similar to https://github.com/OregonDigital/OD2/issues/236

CGillen commented 1 year ago

https://solr.apache.org/guide/7_6/the-dismax-query-parser.html#bq-boost-query-parameter
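For reference, a `bq` boost with the DisMax parser is just an extra request parameter. It adds an optional clause, so documents that match it get that clause's score added, and nothing is filtered out. This is a sketch assuming a direct Solr request. The `has_model_sim` field comes from the query later in this thread, while `title_tesim`, `all_text_timv`, and the weight of 5 are placeholder assumptions.

```python
from urllib.parse import urlencode


def dismax_params(user_query, collection_boost=5):
    """Build Solr params that add a boost query (bq) for Collection records.

    Field names in qf are placeholders; has_model_sim is taken from the
    query tried in this issue.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": "title_tesim all_text_timv",  # hypothetical query fields
        # Optional clause: matching Collections get this clause's score added.
        "bq": f"has_model_sim:Collection^{collection_boost}",
    }


print(urlencode(dismax_params("building oregon")))
```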

wickr commented 1 year ago

POSM decided to start by lightly boosting collection titles and, if that works well, to extend boosting to other collection metadata fields such as Description, where additional names and keywords may appear.

I was originally thinking of index-time boosting, since the app knows which titles come from Collections as we index them. But it looks like index-time boosts were deprecated in Lucene. There is, however, an option to store a score in a document field: we could add a value to a new field when the record is a Collection and boost on it indirectly, though that may apply to the whole Solr document rather than just the title.
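The "store a value in a new field" idea could be sketched as follows: tag Collection documents with a numeric field at index time, then multiply by it at query time with edismax's `boost` parameter and Solr's `def()` function, which falls back to 1 for documents that lack the field. OD2 actually indexes through Hyrax's Ruby indexers, so this Python version only shows the shape of the idea. `to_solr_doc`, `collection_rank_f`, and the 2.0 value are hypothetical, and the field would need a matching single-valued float field in the schema. As noted above, this boosts the whole document, not just the title.

```python
def to_solr_doc(record):
    """Map a record to a Solr document, tagging Collections with a rank value.

    collection_rank_f is a hypothetical field name; the Solr schema would
    need a single-valued float field (or dynamic field) to hold it.
    """
    doc = {
        "id": record["id"],
        "title_tesim": record["title"],
        "has_model_sim": record["model"],
    }
    if record["model"] == "Collection":
        doc["collection_rank_f"] = 2.0  # score multiplier for Collections
    return doc


# Query time (edismax): multiply each score by the stored value, or by 1
# when the document has no collection_rank_f (i.e. it is not a Collection).
BOOST_PARAMS = {"defType": "edismax", "boost": "def(collection_rank_f,1)"}

print(to_solr_doc({"id": "c1", "title": ["Building Oregon"], "model": "Collection"}))
```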

Alternatively, a boost applied as part of the query, to the title and to anything whose model field has the Collection type, might work: https://cwiki.apache.org/confluence/display/solr/SolrRelevancyFAQ#SolrRelevancyFAQ-FieldBasedBoosting
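To boost collection titles specifically, rather than every field of every Collection, the boost query can require both conditions: the record is a Collection *and* its title matches the user's terms. This is a minimal sketch under assumptions. `title_tesim` is a placeholder field name, the weight of 3 is arbitrary, and the escaping covers Lucene's special characters but not words like AND/OR. With the default OR operator, a title matching any of the terms satisfies the inner clause.

```python
import re

# Characters with special meaning in the standard Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_solr(term):
    """Backslash-escape Lucene special characters in a single term."""
    return _SPECIAL.sub(r"\\\1", term)


def collection_title_boost(user_query, weight=3):
    """Boost query matching Collections whose title contains the user's terms."""
    terms = " ".join(escape_solr(t) for t in user_query.split())
    return f"(+has_model_sim:Collection +title_tesim:({terms}))^{weight}"


def edismax_params(user_query):
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title_tesim all_text_timv",  # hypothetical query fields
        "bq": collection_title_boost(user_query),
    }


print(edismax_params("building oregon")["bq"])
```

Keeping the boost in `bq` leaves the main query untouched, so works that match still appear; only collections with matching titles move up.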

Here are two examples for testing (we probably need more):

The collection should probably be the first result instead of the second.

wickr commented 1 year ago

I tried adding a boost to an existing search results query, but didn't have any luck: https://oregondigital.org/catalog?utf8=%E2%9C%93&search_field=all_fields&q%5B%5D=building+oregon&boost=if(termfreq(has_model_sim,%27Collection%27),5,1))

It seems like it should work, but perhaps the parameter isn't being passed through Blacklight.
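One way to test that hypothesis is to send the same boost straight to Solr, bypassing Blacklight. Blacklight builds Solr parameters through its search builder rather than forwarding arbitrary URL parameters, so a working direct query would point at Blacklight as the problem. `boost` is an edismax parameter, so `defType` is set explicitly, and `debugQuery=true` makes Solr return its score explanations. Separately, the `boost` value in the URL above appears to end with one more closing parenthesis than it opens, which could cause Solr to reject it if it got that far. The Solr URL, core name, and `qf` field are hypothetical placeholders.

```python
from urllib.parse import urlencode


def balanced(expr):
    """Return True if the parentheses in expr are balanced."""
    depth = 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


# Same function query as in the comment above, with balanced parentheses.
BOOST = "if(termfreq(has_model_sim,'Collection'),5,1)"


def direct_solr_url(solr_select_url, user_query):
    """Build a request that sends the boost directly to Solr."""
    params = {
        "defType": "edismax",      # 'boost' is an edismax parameter
        "q": user_query,
        "qf": "all_text_timv",     # hypothetical query field
        "boost": BOOST,            # multiplies scores of Collections by 5
        "debugQuery": "true",      # include score explanations in the response
    }
    return solr_select_url + "?" + urlencode(params)


print(direct_solr_url("http://localhost:8983/solr/od2/select", "building oregon"))
```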

I also found this to be a good explainer: https://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/
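The main distinction between the approaches is additive boosting (`bq`/`bf` add to a document's score) versus multiplicative boosting (edismax `boost` multiplies it). The toy model below shows why that matters for collections. It is a deliberately simplified sketch: the scores are made up, and the fixed bonus stands in for a `bq` clause whose real contribution depends on Solr's own scoring of that clause. With an additive bonus, a weakly matching collection can jump over a more relevant work. With a multiplier, the lift scales with how well the collection actually matched.

```python
def additive(score, is_collection, bonus=1.0):
    """bq/bf style (simplified): a fixed amount is added for Collections."""
    return score + (bonus if is_collection else 0.0)


def multiplicative(score, is_collection, factor=1.5):
    """edismax boost style: Collection scores are multiplied."""
    return score * (factor if is_collection else 1.0)


def rank(docs, score_fn):
    """Return document names ordered by boosted score, highest first."""
    ordered = sorted(docs, key=lambda d: score_fn(d[1], d[2]), reverse=True)
    return [name for name, _, _ in ordered]


# Toy (name, base relevance score, is_collection) tuples; not real Solr output.
docs = [
    ("item", 4.0, False),
    ("strong_collection", 3.5, True),
    ("other_item", 1.0, False),
    ("weak_collection", 0.2, True),
]

print("additive:      ", rank(docs, additive))
print("multiplicative:", rank(docs, multiplicative))
```

Both put `strong_collection` first, but only the additive version lifts `weak_collection` (0.2 + 1.0 = 1.2) above `other_item` (1.0). The multiplicative version leaves it last (0.2 × 1.5 = 0.3).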