Closed eroux closed 2 years ago
Both of these seem to be good solutions. The only problem is people don’t notice the facets menu. Most users are completely blind to it. As they say, we can bring the horse to water but we can’t make it drink…
indeed... we'll do the best we can!
For the etexts I'm quite sure this will have a big performance impact so... what do you think of the second option? is that feasible easily or is it a lot more work?
I think this should not be a problem, let's do that!
then maybe something at the top of results list would catch more attention?
with an icon like this maybe?
Good idea yes! Although I suspect it won't be that dramatic a change... but fortunately I know Orna wants this feature and enjoys exploring the filters so there will be at least one user!
I started the implementation on the server side and I think I have most of it except something I didn't think about: simple normalization and transliteration. For instance if in our db we have some string in Unicode, it won't be an exact match for the Wylie and vice versa. Another issue will be smart quotes and some upper casing. Although for upper case there's only so much we can do: if we lower case everything then it won't be exact match because of the retroflex, anusvara, etc. So I need to write a new function that does that for Fuseki, it will take a little bit more time than I initially anticipated (as usual one might say)
it seems ok with adding facet client-side: http://library-dev.bdrc.io/search?q=%22spyod%20%27jug%22~1&lg=bo-x-ewts&t=Instance&pg=1&f=asset,inc,tmp:catalogOnly&f=asset,inc,tmp:possibleAccess&f=hasMatch,inc,tmp:isExactMatch
thought it makes sense for it to be case insensitive: http://library-dev.bdrc.io/search?q=%22longchenpa%22&lg=en&t=Person&s=closest%20matches%20forced
also [almost] made it handle Tibetan unicode and wylie indistinctly: http://library-dev.bdrc.io/search?q=%22%E0%BD%A6%E0%BE%A4%E0%BE%B1%E0%BD%BC%E0%BD%91%E0%BC%8B%E0%BD%A0%E0%BD%87%E0%BD%B4%E0%BD%82%22~1&lg=bo&t=Instance&f=asset,inc,tmp:possibleAccess&f=asset,inc,tmp:catalogOnly
but a fix seems needed here where it does not work at all
now I'm also gonna give etexts a try (using a dedicated query if facet is checked)
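For reference, the facet selection in the search URLs above travels in repeated `f=` parameters. A rough sketch of how such a URL could be taken apart follows; reading each `f` value as `facet,mode,value` is an assumption made from the URLs in this thread, not a documented format, and `parse_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the comment above.
EXAMPLE_URL = (
    "http://library-dev.bdrc.io/search?q=%22spyod%20%27jug%22~1&lg=bo-x-ewts"
    "&t=Instance&pg=1&f=asset,inc,tmp:catalogOnly&f=asset,inc,tmp:possibleAccess"
    "&f=hasMatch,inc,tmp:isExactMatch"
)


def parse_search_url(url):
    """Split a search URL into query, language, type and facet triples.

    Assumption: every `f` value has the shape "facet,mode,value".
    """
    params = parse_qs(urlsplit(url).query)
    facets = []
    for raw in params.get("f", []):
        # maxsplit=2 keeps any further commas inside the value part
        name, mode, value = raw.split(",", 2)
        facets.append((name, mode, value))
    return {
        "query": params.get("q", [""])[0],
        "lang": params.get("lg", [""])[0],
        "type": params.get("t", [""])[0],
        "facets": facets,
    }


parsed = parse_search_url(EXAMPLE_URL)
# True when the "exact full match" facet is selected in the URL
exact_full_selected = ("hasMatch", "inc", "tmp:isExactMatch") in parsed["facets"]
```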
Ah wonderful, quite impressive!
Let's keep it there then, if we see some performance penalties we can switch back to server side (although I think I will still need to implement the etext server side...)
Case insensitivity is not really good for Wylie (although it would be for Sanskrit and English), and since it's the main use case let's not do it. Can you just normalize the quotes from the user query (transforming everything into ascii quote)?
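To make the request concrete, here is a minimal sketch of the normalization being asked for, assuming the query arrives as a plain string together with its `lg` language tag. The function names and the exact list of quote characters are illustrative, not what the client actually does.

```python
# Map common "smart" quotes to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201A": "'", "\u201B": "'",
    "\u201C": '"', "\u201D": '"', "\u201E": '"', "\u201F": '"',
})


def normalize_quotes(text):
    """Replace typographic single and double quotes with ASCII quotes."""
    return text.translate(QUOTE_MAP)


def normalize_for_match(text, lang):
    """Normalize a string before an exact-match comparison.

    Assumption: language tags starting with "bo" (e.g. "bo", "bo-x-ewts")
    are Tibetan and must stay case sensitive, since upper case carries
    meaning in Wylie; other languages are lower cased.
    """
    text = normalize_quotes(text)
    if not lang.startswith("bo"):
        text = text.lower()
    return text
```

With this, `normalize_for_match("“spyod ’jug”", "bo-x-ewts")` gives `"spyod 'jug"` wrapped in ASCII double quotes, and an English query like `Longchenpa` is lower cased while Wylie keeps its capitals.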
normalized quotes (can you check if it's what's needed?) and removed case insensitivity in case of Tibetan
regarding issue with this example, it seems it comes from transliteration itself
which makes gsung 'bum/_sgam po pa
of གསུང་འབུམ། སྒམ་པོ་པ
where I would expect gsung 'bum/ sgam po pa
(no underscore) that is visible everywhere in wylie on the search results
so not sure what to do here? wdyt?
the quote normalization looks good, thanks!
The correct transliteration is with an underscore, but we normalize underscores to spaces in the UI. Let's also do that for the search if possible
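A small sketch of what that could look like for the search, assuming the comparison is done on plain Wylie strings; `normalize_wylie_spaces` and `contains_exact` are hypothetical helper names.

```python
def normalize_wylie_spaces(text):
    """Turn Wylie underscores (explicit spaces) into regular spaces, as the UI does."""
    return text.replace("_", " ")


def contains_exact(label, query):
    """Check whether the query appears in the label once both are normalized."""
    return normalize_wylie_spaces(query) in normalize_wylie_spaces(label)


# The transliteration from the example above
label = "gsung 'bum/_sgam po pa"
normalized = normalize_wylie_spaces(label)  # "gsung 'bum/ sgam po pa"
```

Normalizing both sides means a query typed with or without the underscore would match the same label.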
fixed issue with transliteration and added widget with icon and popup: link
note that widget title changes according to current selection (not sure about the wording):
thanks, I think it looks good! I think this should be a select instead of 2 checkboxes though (in the menu above)
done: link
case of an etext: http://library-dev.bdrc.io/search?q=%22rdzogs%20pa%20chen%20po%22~1&lg=bo-x-ewts&t=Etext
looks perfect, thanks!
Was this implemented on the public site? I don't see the exact match icon in my search results.
On a related note, the AND feature in searches is really helpful.
In the past people have complained about how hard it is to find an author's Sungbum through the facets (that category is buried within "collections"), so this is a good workaround for that.
yes, I don't think exact match makes a lot of sense when there's an AND so we disabled it in that case. What would be your expectation in that case?
No expectation. I was trying out two different things at once and didn't realize that they cancel each other out.
some users are requesting a way to filter the results that match the query exactly. I have two ideas about this:
solution 1, client side
This could be done through a new facet on the left, with two options:
they wouldn't have a preview of the number of matches
if one of these facets is checked, the results should be filtered in the following way: the highlight markers are removed from the Lucene matches (everything with a highlight marker); then, if the resulting string contains the query (or is equal to the query, for exact full match), the result is kept; otherwise it is filtered out.
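The filtering described above could be sketched roughly like this. The `↦`/`↤` highlight delimiters and the `labels` field on each result are assumptions made for illustration; the real markers and result shape may differ.

```python
# Hypothetical highlight delimiters; the actual ones may be different.
HL_START = "↦"
HL_END = "↤"


def strip_highlight(text):
    """Remove the highlight markers from a matched string."""
    return text.replace(HL_START, "").replace(HL_END, "")


def keep_result(matched_strings, query, full=False):
    """Return True if at least one Lucene match satisfies the exact criterion.

    full=False: the cleaned string must contain the query ("exact match").
    full=True:  the cleaned string must equal the query ("exact full match").
    """
    for s in matched_strings:
        if HL_START not in s:
            continue  # no highlight marker, so not a Lucene match
        plain = strip_highlight(s)
        if (plain == query) if full else (query in plain):
            return True
    return False


def filter_results(results, query, full=False):
    """Keep only the results with an exact (or exact full) match."""
    return [r for r in results if keep_result(r["labels"], query, full)]


results = [
    {"id": "a", "labels": ["↦spyod 'jug↤"]},
    {"id": "b", "labels": ["↦spyod↤ pa la ↦'jug↤ pa"]},
    {"id": "c", "labels": ["byang chub sems dpa'i ↦spyod 'jug↤"]},
]
exact = [r["id"] for r in filter_results(results, "spyod 'jug")]
exact_full = [r["id"] for r in filter_results(results, "spyod 'jug", full=True)]
```

Here result `b` matches both words in Lucene but not the phrase, so it is dropped by both facets; `c` contains the phrase but is not equal to it, so it only survives the plain "exact match" facet.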
solution 2, server side
Two options:
This last solution is the one with the least performance impact since we don't make any additional calculation in the general queries, so I would prefer it... In that scenario the facet on the left would still be quite special since it will change the number of results for all the other facets...
I'm not totally sure what's best... I'll toy with returning the tmp:hasExactMatch for works and version search in the general query, this shouldn't have too much of an impact on performance (even on person and place, although I suspect nobody will ask for it). @berger-n can you handle:
- tmp:hasExactMatch ("exact match")
- tmp:isExactMatch ("exact full match")

as facets so that when they are returned they are displayed correctly? For the etexts I'm quite sure this will have a big performance impact so... what do you think of the second option? is that feasible easily or is it a lot more work?