JannTibetan closed this issue 1 year ago.
The current reasoning for the results is that we display results that make sense in the context of the etext. So if the dictionary contains "rdor sems dkar" and the etext contains "rdor sems dmar" (this is a totally random example, sorry if it doesn't make sense in Tibetan), then if the user highlights "rdor sems" we don't display the result for "rdor sems dkar", because it isn't in the context of the etext. It has nothing to do with whether or not the dictionary keyword begins with the queried word.
I'm a bit surprised, because it's also quite different from what we discussed the other day. What you seem to want in this issue is more of a general search of the dictionary keywords, without taking the etext into account. That's fine: we can have a second batch of results that simply search the dictionary for the highlighted keyword.
What we discussed the other day was actually a third set of results: entries whose definition contains the queried word. We can add that too.
Since these two new sets of results are completely disconnected from the etext, perhaps we also want a general dictionary search bar in the right panel, in addition to what we already have? We can rethink the UI, no problem; we just have to be clear that it takes time.
I'm still thinking within the context of searching only on words highlighted in an etext. I'm just saying that after highlighting a word, one would have the option of displaying only headwords that have the search string at the beginning, or somewhere within the headword. This is not a request for a standalone dictionary search. Sorry, the examples from the mobile app and Google Docs probably gave the wrong impression here. The last screen grab and explanation are the main point: highlight a word, launch the dictionary, and then choose whether you see results with a match at the beginning of the headword or within it.
OK, yes, let's talk about the "starts with" / "contains" distinction, which I think is not the right concept here. Do you have a specific example where you feel the results in "contains" would differ from the results we have now? (With a link to the etext.)
Here is an example. The dictionary doesn't return a result for བྱང་ཆུབ་ལམ་རིམ་ (basically, short for LamRim Chenmo).
But the full dictionary in the mobile app has a record with the fuller version of the title: བྱང་ཆུབ་ལམ་རིམ་ཆེན་མོ་
We want users to be able to get to the relevant headword, even if it is not exactly the same as the word appearing in the etext.
Sure, we can:
That's totally fine. We can even have a third category, "definitions containing the highlighted word". What do you think of these titles?
Let's have all three categories! Searching within the definitions will be really helpful because I suspect the writers of the definitions used a lot of the same abbreviations that are found in our etexts.
OK! I'll have to change the API a bit, but it's not a big change.
Wonderful, thanks. Anytime this month will do.
@berger-n I've implemented (but not yet put online) a modification of the JSON results for entriesForChunk. The main difference is that there is now a type key for all results, with the following possible values:

- e is an exact match
- c is a match in the etext context
- k is a match of the highlighted text in a keyword, which may or may not be in context
- d is a match of the highlighted text in a definition

For entries of the last two types, the usual fields (chunk_offset_end, nb_tokens, etc.) are always 0, since they don't make a lot of sense. On the other hand, the word or definition now has the usual highlight indications (↦xxx↤). There's an example of the format in:
I think the UI could have three sections of results, with the first non-empty section open by default.
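A minimal Python sketch of how a client might bucket the entriesForChunk results by their type key and pick the section to open by default. The function names and section labels are hypothetical, and grouping e and c together into one "context" section is an assumption based on the three categories discussed in this thread, not something the API defines.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from the "type" key to three UI sections.
# Assumption: exact ("e") and in-context ("c") matches share the first section.
SECTION_OF_TYPE = {"e": "context", "c": "context", "k": "keywords", "d": "definitions"}
SECTION_ORDER = ["context", "keywords", "definitions"]


def group_results(results: List[dict]) -> Dict[str, List[dict]]:
    """Bucket results by section, preserving the order returned by the server."""
    sections: Dict[str, List[dict]] = {name: [] for name in SECTION_ORDER}
    for entry in results:
        section = SECTION_OF_TYPE.get(entry.get("type"))
        if section is not None:  # skip unknown types instead of failing
            sections[section].append(entry)
    return sections


def default_open_section(sections: Dict[str, List[dict]]) -> Optional[str]:
    """Return the first non-empty section, which the UI would expand by default."""
    for name in SECTION_ORDER:
        if sections[name]:
            return name
    return None


if __name__ == "__main__":
    sample = [{"type": "k", "word": "..."}, {"type": "d", "definition": "..."}]
    print(default_open_section(group_results(sample)))  # keywords
```

Keeping the server order inside each section leaves ranking decisions to the backend.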
Note that the endpoint currently limits the number of results to 200. I'm not sure it makes much sense to provide thousands of results in the UI; that would require some rework.
Once you feel the code is ready, I can deploy ldspdi.
OK, thanks! I'll let you know when the code is ready (probably tomorrow).
@eroux deployed a test with your data here: link
(You need to be logged in as an admin user; each query to Monlam then loads the same example.json. See img. 9, l. 4 for an occurrence of རིགས, for the sake of realism :smile:. For further testing next week, we could use ldspdi-dev, couldn't we?)
NB: I'm not sure this is the right place for the Feedbucket icon when the dictionary is open... see #804
Thanks all, this looks very good. My apologies for taking so long to try it out and acknowledge your work. It's going to give the users much deeper access to the dictionary. Can we have a meeting about this on Wednesday? The three of us can share a screen and test out its performance on various multi-syllabic words and phrases to see if any additional refinements are needed.
Hi @JannTibetan, the new version is deployed and working here: https://library-dev.bdrc.io/show/bdr:UTIE0OPIFAC9F61B_I3CN4692?backToEtext=bdr:IE0OPIFAC9F61B#open-viewer
Very cool and powerful new features! Thank you. Wednesday morning I'll schedule a time for us to share a screen and take a closer look.
Good idea! Don't hesitate to test a few queries to see whether any results are missing.
As I try it out this morning I'll post a couple of ideas as they occur to me (sorry if they are unsystematic).
"Exact matches of the context" works really well and is super helpful. Through it people will discover that they are encountering longer phrases while fixated upon a single word.
It would be good to have a way to collapse whichever of the three lists is open:
The Tibetan analyzer at work! Super helpful.
(I like that it gives you the definitions of both སྐྱོན་ on its own and the in-context result)
Speaking of the Tibetan analyzer, can it be programmed to overlook the ནི་ topicalized particle?
Versus
This seems like a peculiar behavior.
འཁོར་བའི་ཉེས་དམིགས།, the defects of Samsara, is a common phrase. When this phrase is searched on, the dictionary displays no results:
But when you search simply on ཉེས་དམིགས། (defect), one of the entries is the full phrase འཁོར་བའི་ཉེས་དམིགས།
Re: Definitions containing the highlighted word, can we highlight the search string in a way that is more noticeable?
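Since the word and definition fields mark matches as ↦xxx↤, one way a client could make them more noticeable is to split the text on those markers and style the matched segments (bold, background color, etc.). A minimal sketch, assuming the markers are never nested; split_highlights is a hypothetical helper, not part of the existing code.

```python
import re
from typing import List, Tuple

# Assumption: highlighted spans are delimited by "↦" and "↤" and never nested.
HIGHLIGHT = re.compile(r"↦(.*?)↤", re.DOTALL)


def split_highlights(text: str) -> List[Tuple[str, bool]]:
    """Return (segment, is_highlighted) pairs so the UI can style matches distinctly."""
    segments: List[Tuple[str, bool]] = []
    pos = 0
    for match in HIGHLIGHT.finditer(text):
        if match.start() > pos:  # plain text before this highlight
            segments.append((text[pos:match.start()], False))
        segments.append((match.group(1), True))
        pos = match.end()
    if pos < len(text):  # trailing plain text
        segments.append((text[pos:], False))
    return segments
```

The renderer can then wrap each highlighted segment in whatever emphasis the design calls for, without touching the definition text itself.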
The dictionary returns two identical definitions of this word.
FWIW the "Entries containing the highlighted text" do not contain any duplicate headwords
@JannTibetan thanks for the input, really helpful! Overlooking ནི is possible, although I anticipate complaints like the one we had when རམ was impossible to find. If you're confident nobody will search for ནི, I can add it to the list of stop words.
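For illustration only, a sketch of what dropping a stop syllable from a highlighted query could look like, splitting on the tsheg (་) and shad (།). The real behavior lives in the Lucene analyzer, whose stop list and tokenization are not shown in this thread; STOP_SYLLABLES and query_syllables are hypothetical. The fallback keeps a query made only of stop syllables, so that highlighting ནི on its own still returns something, avoiding the kind of problem reported with རམ.

```python
from typing import List, Set

TSHEG = "\u0f0b"  # ་ syllable delimiter
SHAD = "\u0f0d"   # ། clause delimiter

# Hypothetical stop list; the analyzer's actual list may differ.
STOP_SYLLABLES: Set[str] = {"ནི"}


def query_syllables(highlighted: str, stop: Set[str] = STOP_SYLLABLES) -> List[str]:
    """Split a highlighted Tibetan string into syllables, dropping stop syllables."""
    syllables = [s.strip() for s in highlighted.replace(SHAD, TSHEG).split(TSHEG)]
    syllables = [s for s in syllables if s]
    kept = [s for s in syllables if s not in stop]
    # If only stop syllables were highlighted, keep them so the query is not empty.
    return kept or syllables
```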
Just as a note, the query that should contain འཁོར་བའི་ཉེས་དམིགས is this one (I have no idea yet why this happens)
and the query with two duplicate results is this one (this is easy to fix)
Edit: the query for the missing definition is this one
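The thread only says the duplicate is easy to fix. A minimal sketch of one possible safeguard, removing entries that repeat the same type, word and definition while preserving order; dedupe_results and the choice of key fields are assumptions.

```python
from typing import Iterable, List, Tuple


def dedupe_results(results: Iterable[dict]) -> List[dict]:
    """Drop entries whose (type, word, definition) repeats an earlier one, keeping order."""
    seen: set = set()
    unique: List[dict] = []
    for entry in results:
        key: Tuple = (entry.get("type"), entry.get("word"), entry.get("definition"))
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```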
I think we might have a conflict between our analyzers and the Dictionary regarding the term ཞེས་ (used to end quotations; more of a function word). The Dictionary has a definition for it.
I think the issues with ནི་ and ཞེས་ are likely related. We can discuss over Skype.
This list of results might not be in alphabetical order. How are they ranked?
Hmm, that's a very good question... I think they might be in random order, or in the order returned by Lucene. I don't think I have code for alphabetical ordering in Java (and our version is probably too old to include the new data we provided to Unicode), but I'll investigate. Plan B is to sort them directly in the browser.
Sometimes, but not all the time, the "Definitions containing the highlighted word" results only give you snippets of the definition and not the full definition
VERSUS
@JannTibetan thanks a lot! looking at some of your screenshots like these it seems your browser is not using the latest version
is it on ios ? if so it seems there's this tip to force refreshing the page: https://apple.stackexchange.com/questions/74797/can-i-force-a-cache-refresh-in-safari-running-on-ios#answer-392786 (reloading with wifi off then with wifi on); if not you can check https://fabricdigital.co.nz/blog/how-to-hard-refresh-your-browser-and-clear-cache
Oh ok. Let me switch from Safari to Chrome and retry. Thanks
You are right. Now I am using Chrome and I have the ability to expand and collapse lists easily. Thanks!
Indeed, it's not true: we only limit the number of results to 200 in the first step of the query (a few are removed after that). Perhaps above 170 results we shouldn't display the last category, or something like that?
Sure, that is a good policy. If a term is so common that it appears 250 (or even 1,000) times, then its meaning should be discernible from the preceding two lists of entries.
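A sketch of the policy agreed here, using hypothetical section names (context, keywords, definitions): when the total passes the threshold, the definitions category is hidden, since the 200-result cap in the first step of the query means that category is likely to be incomplete anyway. The 170 threshold comes from the thread; the constant and function names are hypothetical.

```python
from typing import Dict, List

# Threshold floated in the thread; the constant name is hypothetical.
HIDE_DEFINITIONS_ABOVE = 170


def visible_sections(sections: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Hide the definitions section when the overall result count is very high."""
    total = sum(len(entries) for entries in sections.values())
    if total > HIDE_DEFINITIONS_ABOVE:
        return {name: entries for name, entries in sections.items() if name != "definitions"}
    return dict(sections)
```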
My computer is from early 2015 (old man emoji)
I want to compliment you on this feature. I just searched on a phrase that only appears within the text of a definition and so the dictionary automatically opened up to that single, solitary relevant definition. Nice.
Sometimes, but not all the time, the "Definitions containing the highlighted word" results only give you snippets of the definition and not the full definition
@eroux it's in the data, see the type:"d" entries here
(I guess it's on purpose, to avoid returning like the full dictionary itself? :smile: )
Here is a case that may be similar to the one I posted about འཁོར་བའི་ཉེས་དམིགས་
VERSUS
Not really, it looks like a bug in the dictionary itself: looking at https://monlamdic.com/, there's no entry at all...
(I guess it's on purpose, to avoid returning like the full dictionary itself? 😄 )
OK. Can we add an "expand" button to the snippets?
It seems that's the only thing in the dictionary; these may be unfinished entries.
I believe that everything is in order now (including the alphabetical order, thanks to our contribution to Unicode, actually!). I've just updated the server.
This is a good example of the usefulness of the "Entries containing highlighted text" feature
Now working from Mirador as well: https://library-dev.bdrc.io/show/bdr:W3CN4690?s=%2Fshow%2Fbdr%3AMW3CN4690#open-viewer
Yes, it is. Works very nicely. Thanks
Great! I think we can deploy to prod?
It works really well. Thank you!
Would it be possible to add a new feature to the Monlam interface that allows the user to select between "begins with" and "contains"? The Monlam Dictionary mobile app allows for this:
Google Docs also allows for this:
The idea is that after highlighting a word and launching the dictionary, the user would have the option of selecting between "contains" and "begins with." This will not solve all the issues with variant forms not appearing in the dictionary, but it will help.
I'm sorry I didn't think of this during the testing phase but it's not too late to add it, if it would be possible.
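A minimal sketch of the requested "begins with" / "contains" toggle, applied to candidate headwords for a highlighted string. It compares whole syllables (split on tsheg and shad) rather than raw characters, which is an assumption meant to avoid matching partial syllables; the mode names and functions are hypothetical. With the thread's examples, བྱང་ཆུབ་ལམ་རིམ་ matches བྱང་ཆུབ་ལམ་རིམ་ཆེན་མོ་ under "begins with", and ཉེས་དམིགས། matches འཁོར་བའི་ཉེས་དམིགས། only under "contains".

```python
from typing import List

TSHEG = "\u0f0b"  # ་
SHAD = "\u0f0d"   # །


def syllables(text: str) -> List[str]:
    """Split Tibetan text into syllables, treating shad like a tsheg."""
    parts = text.replace(SHAD, TSHEG).split(TSHEG)
    return [p.strip() for p in parts if p.strip()]


def headword_matches(headword: str, query: str, mode: str) -> bool:
    """Whole-syllable match in 'begins_with' or 'contains' mode (hypothetical names)."""
    q, h = syllables(query), syllables(headword)
    if not q:
        return False
    if mode == "begins_with":
        return h[:len(q)] == q
    if mode == "contains":
        # Slide a window of len(q) syllables across the headword.
        return any(h[i:i + len(q)] == q for i in range(len(h) - len(q) + 1))
    raise ValueError(f"unknown mode: {mode!r}")


def filter_headwords(headwords: List[str], query: str, mode: str) -> List[str]:
    """Keep only the headwords that match the highlighted query in the chosen mode."""
    return [h for h in headwords if headword_matches(h, query, mode)]
```

Matching on syllables rather than characters means "contains" never fires on a fragment of a longer syllable; whether that is desirable for every query is a judgment call for the team.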