buda-base / public-digital-library

http://library.bdrc.io

exact match feature #718

Closed eroux closed 2 years ago

eroux commented 2 years ago

some users are requesting a way to filter the results that match the query exactly. I have two ideas about this:

solution 1, client side

This could be done through a new facet on the left, with two options:

they wouldn't have a preview of the number of matches

if one of these facets is checked, the results should be filtered as follows: the lucene matches (everything with a highlight marker) have their highlight markers removed; then, if the resulting string contains the query (or is equal to the query, for an exact full match), the result is kept, otherwise it is filtered out.
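The filtering described above could be sketched as follows. This is only an illustration, not the code that ended up in the repository: the marker characters, the `matches` field and the function names are all assumptions made for the example.

```javascript
// Hypothetical highlight markers: the thread does not say which characters
// the Lucene highlighter emits, so they are plain parameters here.
const DEFAULT_MARKERS = { open: "↦", close: "↤" };

// Remove every occurrence of the highlight markers from a matched string.
function stripHighlight(text, markers = DEFAULT_MARKERS) {
  return text.split(markers.open).join("").split(markers.close).join("");
}

// Decide whether one result is kept.
// mode "contains": some highlighted match contains the query.
// mode "full": some highlighted match is equal to the query.
function keepResult(matches, query, mode = "contains", markers = DEFAULT_MARKERS) {
  return matches
    .filter((m) => m.includes(markers.open)) // only strings carrying a highlight
    .some((m) => {
      const plain = stripHighlight(m, markers);
      return mode === "full" ? plain === query : plain.includes(query);
    });
}

// Filter a list of results shaped like { id, matches: [string, ...] } (assumed shape).
function filterExact(results, query, mode = "contains") {
  return results.filter((r) => keepResult(r.matches, query, mode));
}

const sampleResults = [
  { id: "A", matches: ["↦spyod 'jug↤ gi 'grel pa"] },
  { id: "B", matches: ["↦spyod↤ pa la ↦'jug↤ pa"] },
  { id: "C", matches: ["↦spyod 'jug↤"] },
];
const containing = filterExact(sampleResults, "spyod 'jug", "contains").map((r) => r.id);
const fullOnly = filterExact(sampleResults, "spyod 'jug", "full").map((r) => r.id);
```

With this sample data, `containing` keeps A and C, and `fullOnly` keeps only C, since B matches both words but not the phrase.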

solution 2, server side

Two options:

This last solution is the one with the least performance impact since we don't make any additional calculation in the general queries, so I would prefer it... In that scenario the facet on the left would still be quite special since it will change the number of results for all the other facets...

I'm not totally sure what's best... I'll toy with returning the tmp:hasExactMatch for works and version search in the general query, this shouldn't have too much of an impact on performance (even on person and place, although I suspect nobody will ask for it). @berger-n can you handle:

as facets so that when they are returned they are displayed correctly? For the etexts I'm quite sure this will have a big performance impact so... what do you think of the second option? is that feasible easily or is it a lot more work?

JannTibetan commented 2 years ago

Both of these seem to be good solutions. The only problem is people don’t notice the facets menu. Most users are completely blind to it. As they say, we can bring the horse to water but we can’t make it drink…

eroux commented 2 years ago

indeed... we'll do the best we can!

berger-n commented 2 years ago

> For the etexts I'm quite sure this will have a big performance impact so... what do you think of the second option? is that feasible easily or is it a lot more work?

I think this should not be a problem, let's do that!

berger-n commented 2 years ago

> The only problem is people don’t notice the facets menu. Most users are completely blind to it. As they say, we can bring the horse to water but we can’t make it drink…

then maybe something at the top of results list would catch more attention?

image

berger-n commented 2 years ago

with an icon like this maybe?

eroux commented 2 years ago

Good idea yes! Although I suspect it won't be that dramatic a change... but fortunately I know Orna wants this feature and enjoys exploring the filters so there will be at least one user!

I started the implementation on the server side and I think I have most of it except something I didn't think about: simple normalization and transliteration. For instance, if in our db we have some string in Unicode, it won't be an exact match for the Wylie and vice versa. Another issue will be smart quotes and some upper casing. Although for upper case there's only so much we can do: if we lower case everything then it won't be an exact match because of the retroflex, anusvara, etc. So I need to write a new function that does that for Fuseki, and it will take a little bit more time than I initially anticipated (as usual one might say).

berger-n commented 2 years ago

it seems ok with adding facet client-side: http://library-dev.bdrc.io/search?q=%22spyod%20%27jug%22~1&lg=bo-x-ewts&t=Instance&pg=1&f=asset,inc,tmp:catalogOnly&f=asset,inc,tmp:possibleAccess&f=hasMatch,inc,tmp:isExactMatch

image

thought it makes sense for it to be case insensitive: http://library-dev.bdrc.io/search?q=%22longchenpa%22&lg=en&t=Person&s=closest%20matches%20forced

image

also [almost] made it handle Tibetan unicode and wylie interchangeably: http://library-dev.bdrc.io/search?q=%22%E0%BD%A6%E0%BE%A4%E0%BE%B1%E0%BD%BC%E0%BD%91%E0%BC%8B%E0%BD%A0%E0%BD%87%E0%BD%B4%E0%BD%82%22~1&lg=bo&t=Instance&f=asset,inc,tmp:possibleAccess&f=asset,inc,tmp:catalogOnly

image

but a fix seems needed here where it does not work at all

now I'm also gonna give etexts a try (using a dedicated query if facet is checked)

eroux commented 2 years ago

Ah wonderful, quite impressive!

Let's keep it there then, if we see some performance penalties we can switch back to server side (although I think I will still need to implement the etext server side...)

Case insensitivity is not really good for Wylie (although it would be for Sanskrit and English), and since it's the main use case let's not do it. Can you just normalize the quotes from the user query (transforming everything into ascii quote)?
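A minimal sketch of the quote normalization requested above, assuming it means mapping typographic quotes in the user query to their ASCII counterparts. The function name and the exact set of code points are assumptions for the example, not what the repository does.

```javascript
// Map common typographic quotes to ASCII quotes.
// The list of code points is an assumption; extend it as needed.
function normalizeQuotes(query) {
  return query
    // left/right single quotes, low-9 and reversed-9 single quotes,
    // and the modifier letter apostrophe
    .replace(/[\u2018\u2019\u201A\u201B\u02BC]/g, "'")
    // left/right double quotes, low-9 and reversed-9 double quotes
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"');
}

const normalizedQuery = normalizeQuotes("\u201Cspyod \u2019jug\u201D");
```

Here `normalizedQuery` is the ASCII string `"spyod 'jug"` (including the double quotes), so a query pasted from a word processor compares equal to the same query typed on a plain keyboard.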

berger-n commented 2 years ago

normalized quotes (can you check if it's what's needed?) and removed case insensitivity in case of Tibetan

regarding the issue with this example: it seems to come from the transliteration itself, which turns གསུང་འབུམ། སྒམ་པོ་པ into gsung 'bum/_sgam po pa, where I would expect gsung 'bum/ sgam po pa (no underscore), which is the form visible everywhere in wylie in the search results. So I'm not sure what to do here, wdyt?
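The language-dependent case rule mentioned in this comment (case-sensitive for Tibetan, case-insensitive otherwise) might look like the sketch below. The function names are hypothetical; the language tags are the ones seen in the `lg=` parameter of the search URLs in this thread.

```javascript
// Treat "bo" and any "bo-..." tag (e.g. "bo-x-ewts") as Tibetan.
function isTibetan(lang) {
  return lang === "bo" || lang.startsWith("bo-");
}

// Case matters in Wylie (capitals encode retroflexes, anusvara, etc.),
// so only fold case for non-Tibetan queries.
function containsQuery(text, query, lang) {
  if (isTibetan(lang)) return text.includes(query);
  return text.toLowerCase().includes(query.toLowerCase());
}

const englishHit = containsQuery("Longchenpa Drime Ozer", "longchenpa", "en");
const wylieHit = containsQuery("paN+Di ta", "pan+di ta", "bo-x-ewts");
```

In this example `englishHit` is `true` because case is folded for English, while `wylieHit` is `false` because the capitals in the Wylie string are significant.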

berger-n commented 2 years ago

> normalized quotes (can you check if it's what's needed?)

here: https://github.com/buda-base/public-digital-library/blob/07af78c2213935ecb186578a455234456e3504ee/src/state/sagas/index.js#L1992

eroux commented 2 years ago

the quote normalization looks good, thanks!

The correct transliteration is with an underscore, but we normalize underscores to spaces in the UI. Let's also do that for the search if possible
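One way to apply the same underscore-to-space normalization to the search comparison is sketched below. The helper names are hypothetical, and applying the normalization to both the query and the candidate string is an assumption about how the comparison would be done.

```javascript
// Replace every EWTS underscore with a plain space, as the UI does for display.
function underscoresToSpaces(ewts) {
  return ewts.replace(/_/g, " ");
}

// Compare two Wylie strings after normalizing both sides the same way.
function sameAfterNormalization(candidate, query) {
  return underscoresToSpaces(candidate) === underscoresToSpaces(query);
}

const displayed = underscoresToSpaces("gsung 'bum/_sgam po pa");
const matchesDisplayedForm = sameAfterNormalization(
  "gsung 'bum/_sgam po pa",
  "gsung 'bum/ sgam po pa"
);
```

With this, `displayed` is `gsung 'bum/ sgam po pa`, and a query copied from the search results matches the transliterated title even though the strict transliteration contains an underscore.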

berger-n commented 2 years ago

fixed issue with transliteration and added widget with icon and popup: link

image

note that widget title changes according to current selection (not sure about the wording):

https://github.com/buda-base/public-digital-library/blob/91c1745d894b0a7bf276fea7df0fac436004a21b/src/translations/en.json#L158-L161

eroux commented 2 years ago

thanks, I think it looks good! I think this should be a select instead of 2 checkboxes though (in the menu above)

berger-n commented 2 years ago

done: link

image

case of an etext: http://library-dev.bdrc.io/search?q=%22rdzogs%20pa%20chen%20po%22~1&lg=bo-x-ewts&t=Etext

image

eroux commented 2 years ago

looks perfect, thanks!

JannTibetan commented 2 years ago

Was this implemented on the public site? I don't see the exact match icon in my search results.

On a related note, the AND feature in searches is really helpful.

Screen Shot 2022-08-05 at 10 16 25 AM

In the past people have complained about finding an author's Sungbum through the facets (that category is buried within "collections") so this is a good workaround for that.

eroux commented 2 years ago

yes, I don't think exact match makes a lot of sense when there's an AND so we disabled it in that case. What would be your expectation in that case?

berger-n commented 2 years ago

> Was this implemented on the public site? I don't see the exact match icon in my search results.

fixed: https://library.bdrc.io/search?q=%22%E0%BD%82%E0%BD%A6%E0%BD%B4%E0%BD%84%E0%BC%8B%E0%BD%A0%E0%BD%96%E0%BD%B4%E0%BD%98%E0%BC%8D%20%E0%BD%A6%E0%BE%92%E0%BD%98%E0%BC%8B%E0%BD%94%E0%BD%BC%E0%BC%8B%E0%BD%94%E0%BC%8D%22~1&lg=bo&t=Instance&pg=1&f=hasMatch,inc,tmp:isExactMatch

image

JannTibetan commented 2 years ago

> yes, I don't think exact match makes a lot of sense when there's an AND so we disabled it in that case. What would be your expectation in that case?

No expectation. I was trying out two different things at once and didn't realize that they cancel each other out.

berger-n commented 1 year ago

image

image