buda-base / public-digital-library

http://library.bdrc.io

Request to explore the feasibility of a new feature on Monlam Dictionary interface #826

Closed JannTibetan closed 1 year ago

JannTibetan commented 1 year ago

Might it be possible to add a new feature to the Monlam interface that allows the user to select between "begins with" and "contains"? The Monlam Dictionary mobile app allows for this: IMG_0319

Google Docs also allows for this:

Screen Shot 2023-05-10 at 1 13 06 PM

The idea is that after highlighting a word and launching the dictionary, the user will have the option of selecting between "contains" and "begins with." This will not solve all the issues with variant forms not appearing in the dictionary, but it will help.

Screen Shot 2023-05-10 at 1 24 26 PM

I'm sorry I didn't think of this during the testing phase but it's not too late to add it, if it would be possible.

eroux commented 1 year ago

the current reasoning for the results is that we display results that make sense in the context of the etext: if the dictionary contains "rdor sems dkar" and the etext contains "rdor sems dmar" (this is totally random, sorry if it doesn't make sense in Tibetan), then when the user highlights "rdor sems" we don't display the result for "rdor sems dkar", because it doesn't match the context in the etext. It has nothing to do with whether the keyword in the dictionary begins with the queried word or not.
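One way to read the rule described above is sketched below, assuming the etext and the dictionary headwords are already split into syllables. The function name `matches_in_context` and the requirement that a headword cover the whole highlighted span are illustrative assumptions, not the actual ldspdi logic.

```python
from typing import List


def matches_in_context(etext: List[str], start: int, end: int,
                       headword: List[str]) -> bool:
    """Hypothetical check: does `headword` occur in `etext` at a position
    that covers the highlighted syllables etext[start:end]?"""
    n = len(headword)
    if n == 0 or n < end - start:
        return False
    # Only try offsets where the headword would span the whole highlight.
    first = max(0, end - n)
    last = min(start, len(etext) - n)
    return any(etext[o:o + n] == headword for o in range(first, last + 1))


etext = "rdor sems dmar".split()
# The user highlights "rdor sems", i.e. syllables 0 and 1.
print(matches_in_context(etext, 0, 2, "rdor sems dkar".split()))  # False
print(matches_in_context(etext, 0, 2, "rdor sems dmar".split()))  # True
```

Under these assumptions "rdor sems dkar" is rejected even though it begins with the highlighted text, because the following syllable in the etext is different.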

I'm a bit surprised because it's also quite different from what we discussed the other day... what you seem to want in this issue is more a general search in the keywords in the dictionary, not taking the etext into account. This is fine, we can have a second batch of results that are just searches of the highlighted keyword in the dictionary.

What we discussed the other day is actually a third set of results: entries whose definition contains the queried word. We can also add that.

Since these two new sets of results are totally disconnected from the etext, perhaps in addition to what we already have we want a general search bar in the dictionary on the right panel? We can rethink the UI, no problem, we just have to be clear that it takes time

JannTibetan commented 1 year ago

I'm still thinking within the context of searching only on words highlighted in an etext. I'm just saying that after highlighting a word one would have the option of displaying only headwords that have the search string at the beginning, or headwords that have it anywhere within them. This is not a request for a standalone search on the dictionary. Sorry, the examples from the app and Google Docs probably gave the wrong impression here. The last screen grab and explanation are the main point: highlight a word, launch dictionary, and then select whether you see results with a match at the beginning of the headword or within it.

eroux commented 1 year ago

ok yes, let's talk about the distinction "starts with" / "contains", which I think is not the right concept here. Do you have a specific example where you feel the results in "contains" would be different from the results we have now? (with a link to the etext)

JannTibetan commented 1 year ago

Here is an example. The dictionary doesn't return a result for བྱང་ཆུབ་ལམ་རིམ་ (basically, short for LamRim Chenmo).

Screen Shot 2023-05-10 at 2 15 48 PM

But the full dictionary on the mobile app has a record with the fuller version of the title བྱང་ཆུབ་ལམ་རིམ་ཆེན་མོ་

IMG_0353

We want users to be able to get to the relevant headword, even if it is not exactly the same as the word appearing in the etext
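The "begins with" / "contains" distinction, applied to the example above, can be sketched as follows. This is a minimal illustration that compares whole syllables split on the tsheg; `syllables` and `headword_match` are hypothetical names, and the real lookup goes through the server-side index rather than code like this.

```python
from typing import List, Optional

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (་)


def syllables(text: str) -> List[str]:
    """Split a Tibetan string into syllables, dropping empty pieces."""
    return [s for s in text.strip().split(TSHEG) if s]


def headword_match(query: str, headword: str) -> Optional[str]:
    """Return "begins_with", "contains", or None (hypothetical labels)."""
    q, h = syllables(query), syllables(headword)
    if not q or len(q) > len(h):
        return None
    if h[:len(q)] == q:
        return "begins_with"
    # Look for the query syllables anywhere after the first position.
    for i in range(1, len(h) - len(q) + 1):
        if h[i:i + len(q)] == q:
            return "contains"
    return None


print(headword_match("བྱང་ཆུབ་ལམ་རིམ་", "བྱང་ཆུབ་ལམ་རིམ་ཆེན་མོ་"))
```

Comparing syllables rather than raw characters avoids matching a query against the middle of an unrelated syllable.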

eroux commented 1 year ago

sure, we can:

that's totally fine. We can even have a third category "definitions containing the highlighted word". What do you think of these titles?

JannTibetan commented 1 year ago

Let's have all three categories! Searching within the definitions will be really helpful because I suspect the writers of the definitions used a lot of the same abbreviations that are found in our etexts.

eroux commented 1 year ago

ok! I'll have to change the API a bit but it's not too big

JannTibetan commented 1 year ago

Wonderful, thanks. Anytime this month will do.

eroux commented 1 year ago

@berger-n I've implemented (but not put online yet) a modification of the json results for entriesForChunk, the main difference is that there is now a type key for all results, with the following possible values:

for entries of the last 2 types, the usual fields (chunk_offset_end, nb_tokens, etc.) are always 0 since they don't make a lot of sense. On the other hand the word or definition now has the usual highlight indications (↦xxx↤). There's an example of the format in:

example.json.zip
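A client consuming the highlight indications mentioned above might convert them to markup along these lines. This is only a sketch: `highlight_to_html` is a hypothetical helper, and the choice of a `<mark>` element is an assumption about how the UI could render the match.

```python
import html
import re

# Matches text wrapped in the highlight markers: ↦ (U+21A6) ... ↤ (U+21A4).
HIGHLIGHT = re.compile("\u21a6(.*?)\u21a4", re.DOTALL)


def highlight_to_html(text: str) -> str:
    """Escape `text` for HTML and wrap each highlighted run in <mark>."""
    parts = []
    pos = 0
    for m in HIGHLIGHT.finditer(text):
        parts.append(html.escape(text[pos:m.start()]))
        parts.append("<mark>" + html.escape(m.group(1)) + "</mark>")
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


print(highlight_to_html("a \u21a6b\u21a4 c"))  # a <mark>b</mark> c
```

Escaping each piece before adding the tags keeps any `<` or `&` in a definition from being interpreted as markup.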

I think the UI could be to have 3 sections of results, with the first non-empty section open by default.
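The "first non-empty section open by default" idea can be sketched as below. The section keys are placeholders invented for this example; they are not the actual values of the `type` key.

```python
from typing import Optional, Sequence, Tuple


def default_open_section(
        sections: Sequence[Tuple[str, list]]) -> Optional[str]:
    """Return the name of the first section that has results, or None
    if every section is empty. `sections` is in display order."""
    for name, entries in sections:
        if entries:
            return name
    return None


# Placeholder keys and entries, purely for illustration.
sections = [
    ("context", []),
    ("keyword", ["entry 1", "entry 2"]),
    ("definition", ["entry 3"]),
]
print(default_open_section(sections))  # keyword
```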

Note that the endpoint currently limits the number of results to 200, I'm not sure it makes a lot of sense to provide thousands of results in the UI, that would require some rework.

Once you feel the code is ready I can deploy ldspdi

berger-n commented 1 year ago

ok thanks! I'll let you know when code is ready (probably tomorrow)

berger-n commented 1 year ago

@eroux deployed a test with your data here: link

(you need to be logged in as an admin user; each query to Monlam then loads the same example.json. See img.9 l.4 for an occurrence of རིགས, for the sake of realism :smile: For further testing next week we could use ldspdi-dev, couldn't we?)

NB: not sure this is the right place for Feedbucket icon when dictionary is open... see #804

simplescreenrecorder-2023-05-12_18 43 32 mkv

JannTibetan commented 1 year ago

Thanks all, this looks very good. My apologies for taking so long to try it out and acknowledge your work. It's going to give the users much deeper access to the dictionary. Can we have a meeting about this on Wednesday? The three of us can share a screen and test out its performance on various multi-syllabic words and phrases to see if any additional refinements are needed.

berger-n commented 1 year ago

hi @JannTibetan, new version deployed and working here: https://library-dev.bdrc.io/show/bdr:UTIE0OPIFAC9F61B_I3CN4692?backToEtext=bdr:IE0OPIFAC9F61B#open-viewer

simplescreenrecorder-2023-05-16_17 39 22 mkv

JannTibetan commented 1 year ago

Very cool and powerful new features! Thank you. Wednesday morning I'll schedule a time for us to share a screen and take a closer look.

eroux commented 1 year ago

good idea! don't hesitate to test on a few queries to see if some results are missing

JannTibetan commented 1 year ago

As I try it out this morning I'll post a couple of ideas as they occur to me (sorry if they are unsystematic).

"Exact matches of the context" works really well and is super helpful. Through it people will discover that they are encountering longer phrases while fixated upon a single word.

It would be good to have a way to collapse whichever of the three lists is open:

Screen Shot 2023-05-17 at 8 23 16 AM

JannTibetan commented 1 year ago

The Tibetan analyzer at work! Super helpful.

Screen Shot 2023-05-17 at 8 31 41 AM

(I like that it gives you the definitions of both སྐྱོན་ on its own and the in-context result)

JannTibetan commented 1 year ago

Speaking of the Tibetan analyzer, can it be programmed to overlook the ནི་ topicalized particle?

Screen Shot 2023-05-17 at 8 38 16 AM

Versus

Screen Shot 2023-05-17 at 8 38 30 AM

JannTibetan commented 1 year ago

This seems like a peculiar behavior.

འཁོར་བའི་ཉེས་དམིགས།, the defects of Samsara, is a common phrase. When this phrase is searched on, the dictionary displays no results:

Screen Shot 2023-05-17 at 8 43 03 AM

BUT then when you search simply on ཉེས་དམིགས།, defect, then one of the entries is the full phrase འཁོར་བའི་ཉེས་དམིགས།

Screen Shot 2023-05-17 at 8 42 53 AM

JannTibetan commented 1 year ago

Re: Definitions containing the highlighted word, can we highlight the search string in a way that is more noticeable?

Screen Shot 2023-05-17 at 8 48 28 AM

JannTibetan commented 1 year ago

The dictionary returns two identical definitions of this word.

Screen Shot 2023-05-17 at 8 52 06 AM

FWIW the "Entries containing the highlighted text" do not contain any duplicate headwords

eroux commented 1 year ago

@JannTibetan thanks for the input, really helpful! Overlooking ནི is possible, although I anticipate there will be complaints like the one we had that རམ was impossible to find. If you're confident nobody will search ནི I can add it to the list of stop words
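The trade-off described above can be illustrated with a small sketch. The stop-word set and the `strip_stop_words` helper are hypothetical; the real filtering happens in the server-side analyzer, not in code like this.

```python
TSHEG = "\u0f0b"  # Tibetan syllable delimiter (་)

# Hypothetical stop-word list containing only the particle discussed here.
STOP_WORDS = {"ནི"}


def strip_stop_words(query: str, stop_words=STOP_WORDS) -> str:
    """Drop stop-word syllables from a tsheg-delimited query.
    Returns an empty string if nothing is left to search for."""
    kept = [s for s in query.split(TSHEG) if s and s not in stop_words]
    return TSHEG.join(kept) + TSHEG if kept else ""


# A query made only of a stop word becomes empty, which is why the
# particle itself can no longer be looked up once it is on the list.
print(repr(strip_stop_words("ནི" + TSHEG)))
```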

Just as a note, the query that should contain འཁོར་བའི་ཉེས་དམིགས is this one (I have no idea why this behavior appears yet)

and the query with two duplicate results is this one (this is easy to fix)

edit: query for the missing definition is this one
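For the duplicate results, one possible fix is an order-preserving de-duplication like the sketch below. The field names `word` and `def` are assumptions made for this example and may not match the actual JSON keys.

```python
from typing import Dict, Iterable, List


def dedupe_entries(entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first entry for each (headword, definition) pair,
    preserving the original order of the results."""
    seen = set()
    unique = []
    for entry in entries:
        key = (entry.get("word"), entry.get("def"))  # hypothetical keys
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


results = [
    {"word": "w1", "def": "d1"},
    {"word": "w1", "def": "d1"},  # exact duplicate, dropped
    {"word": "w1", "def": "d2"},  # same headword, different definition, kept
]
print(len(dedupe_entries(results)))  # 2
```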

JannTibetan commented 1 year ago

I think we might have a conflict between our analyzers and the Dictionary regarding the term ཞེས་ (used to end quotations; more of a function word). The Dictionary has a definition for it.

Screen Shot 2023-05-17 at 8 55 55 AM

JannTibetan commented 1 year ago

I think the issues with ནི་ and ཞེས་ are likely related. We can discuss over Skype.

JannTibetan commented 1 year ago

This list of results might not be in alphabetical order. How are they ranked?

Screen Shot 2023-05-17 at 9 02 51 AM

eroux commented 1 year ago

hmmm that's a very good question... I think they might be in a random order or in the order returned by Lucene... I don't think I have code to do alphabetical order in Java (and our version is probably too old for the new data we provided to Unicode) but I'll investigate. A plan B is to do it in the browser directly

JannTibetan commented 1 year ago

Sometimes, but not all the time, the "Definitions containing the highlighted word" results only give you snippets of the definition and not the full definition

Screen Shot 2023-05-17 at 9 07 43 AM

VERSUS

Screen Shot 2023-05-17 at 9 10 02 AM

berger-n commented 1 year ago

@JannTibetan thanks a lot! looking at some of your screenshots like these it seems your browser is not using the latest version

image

image

is it on ios ? if so it seems there's this tip to force refreshing the page: https://apple.stackexchange.com/questions/74797/can-i-force-a-cache-refresh-in-safari-running-on-ios#answer-392786 (reloading with wifi off then with wifi on); if not you can check https://fabricdigital.co.nz/blog/how-to-hard-refresh-your-browser-and-clear-cache

JannTibetan commented 1 year ago

> @JannTibetan thanks a lot! looking at some of your screenshots like these it seems your browser is not using the latest version

Oh ok. Let me switch from Safari to Chrome and retry. Thanks

JannTibetan commented 1 year ago

You are right. Now I am using Chrome and I have the ability to expand and collapse lists easily. Thanks!

eroux commented 1 year ago

it's not true indeed, we just limit the number of results to 200 in the first step of the query (a few are removed after that). Perhaps above 170 results we shouldn't display the last category or something like that?
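The policy floated above could look like the sketch below. The threshold of 170 comes from the comment; the function name, the section keys, and the choice to count results across all sections are assumptions.

```python
from typing import List, Tuple

HIDE_LAST_ABOVE = 170  # threshold suggested in the comment above


def visible_sections(sections: List[Tuple[str, list]],
                     threshold: int = HIDE_LAST_ABOVE) -> List[Tuple[str, list]]:
    """Drop the last section when the total number of results exceeds
    `threshold`; otherwise return the sections unchanged."""
    total = sum(len(entries) for _, entries in sections)
    if total > threshold and len(sections) > 1:
        return sections[:-1]
    return sections


# Placeholder section keys and counts, purely for illustration.
sections = [("context", [0] * 5), ("keyword", [0] * 40), ("definition", [0] * 150)]
print([name for name, _ in visible_sections(sections)])  # ['context', 'keyword']
```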

JannTibetan commented 1 year ago

Sure, that is a good policy. If a term is so common that it appears 250 (or even 1000) times then its meaning should be discernible from the preceding two lists of entries.

JannTibetan commented 1 year ago

My computer is from early 2015 (old man emoji)

Screen Shot 2023-05-17 at 9 26 38 AM

JannTibetan commented 1 year ago

I want to compliment you on this feature. I just searched on a phrase that only appears within the text of a definition and so the dictionary automatically opened up to that single, solitary relevant definition. Nice.

Screen Shot 2023-05-17 at 9 28 43 AM

berger-n commented 1 year ago

> Sometimes, but not all the time, the "Definitions containing the highlighted word" results only give you snippets of the definition and not the full definition

@eroux it's in the data, see type:"d" entries here (I guess it's on purpose, to avoid returning like the full dictionary itself? :smile: )

JannTibetan commented 1 year ago

Here is a case that may be similar to the one I posted about འཁོར་བའི་ཉེས་དམིགས་

Screen Shot 2023-05-17 at 9 31 30 AM

VERSUS

Screen Shot 2023-05-17 at 9 31 40 AM

eroux commented 1 year ago

not really, it looks like a bug in the dictionary itself, looking on https://monlamdic.com/ there's not even an entry at all...

JannTibetan commented 1 year ago

> (I guess it's on purpose, to avoid returning like the full dictionary itself? 😄 )

OK. Can we add an "expand" button to the snippets?

Screen Shot 2023-05-17 at 9 34 48 AM

eroux commented 1 year ago

it's the only thing that's in the dictionary it seems, these are maybe unfinished entries

eroux commented 1 year ago

I believe that everything is in order now (including the alphabetical order, thanks to our contribution to Unicode actually!), I've just updated the server

JannTibetan commented 1 year ago

This is a good example of the usefulness of the "Entries containing highlighted text" feature

Screen Shot 2023-05-17 at 12 23 51 PM

berger-n commented 1 year ago

now working from mirador as well: https://library-dev.bdrc.io/show/bdr:W3CN4690?s=%2Fshow%2Fbdr%3AMW3CN4690#open-viewer

simplescreenrecorder-2023-05-22_18 22 49 mkv

JannTibetan commented 1 year ago

Yes, it is. Works very nicely. Thanks

eroux commented 1 year ago

great! I think we can deploy in prod?

berger-n commented 1 year ago

deployed: https://library.bdrc.io/show/bdr:UTIE0OPIFAC9F61B_I3CN4692?backToEtext=bdr:MW3CN4690#open-viewer

JannTibetan commented 1 year ago

It works really well. Thank you!