buda-base / public-digital-library

http://library.bdrc.io

Redesign etext search in Figma #909

Closed: eroux closed this issue 1 week ago

eroux commented 2 months ago

@eroux and @berger-n, the best etext search might be similar to Ctrl-F. Users would either use the browser's native Ctrl-F or our similar JS version.

Lazy load chunk size is a problem.

Could we detect Ctrl-F and start loading as much text as possible? With the JS version, do the same: when the widget is active, load a lot of text.

Another problem is very long texts, which crash low-end devices. GPT thinks any low-end laptop should handle about 500 pages, and low-end phones maybe 200 pages. Can we guess-detect old devices with some simple, rough method like viewport size, and load something like 200, 500 or 1000 pages max?
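A rough sketch of the guess-detect idea, assuming the viewer can read the viewport width and, where the browser exposes them, the `navigator.deviceMemory` (non-standard, Chromium-only) and `navigator.hardwareConcurrency` hints. The thresholds (768 px, 1280 px, 2 GB, 4 GB) and the function names are hypothetical; only the 200/500/1000 page caps come from this comment.

```typescript
// Rough device signals; deviceMemoryGb and hardwareConcurrency may be missing on many browsers.
interface DeviceHints {
  viewportWidth: number;        // e.g. window.innerWidth
  deviceMemoryGb?: number;      // e.g. (navigator as any).deviceMemory
  hardwareConcurrency?: number; // e.g. navigator.hardwareConcurrency
}

// Page caps from the thread: 200 for low-end phones, 500 for low-end laptops, 1000 max.
const PAGE_CAPS = { low: 200, mid: 500, high: 1000 } as const;
type Tier = keyof typeof PAGE_CAPS;

// Hypothetical heuristic: weak hardware wins over screen size, then viewport width decides.
function guessTier(h: DeviceHints): Tier {
  const weakMemory = h.deviceMemoryGb !== undefined && h.deviceMemoryGb <= 2;
  const fewCores = h.hardwareConcurrency !== undefined && h.hardwareConcurrency <= 2;
  if (weakMemory || fewCores) return "low";
  if (h.viewportWidth < 768) return "low"; // most likely a phone
  const modestMemory = h.deviceMemoryGb !== undefined && h.deviceMemoryGb <= 4;
  if (h.viewportWidth < 1280 || modestMemory) return "mid";
  return "high";
}

function maxPagesToLoad(h: DeviceHints): number {
  return PAGE_CAPS[guessTier(h)];
}
```

Such a guess will misclassify some devices (a large phone in landscape, a weak desktop with a big monitor), so it can only set an upper bound, not guarantee that loading stays smooth.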

I don't think server load will be a big problem since search is used rarely. Crawlers may need to load any full text at once.

eroux commented 2 months ago

I'm not sure using the browser's Ctrl+F is going to work very well for two reasons:

Instead I propose we catch the Ctrl+F and do a very minimal UI like what's in Google Docs:

[Screenshot of the Google Docs search UI]

Note that there should be a button to open this in addition to the Ctrl+F shortcut (so that it works on mobile too). What do you think?
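If the shortcut were caught as proposed here, one way to limit its reach is to intercept Ctrl+F (Cmd+F on macOS) only while the etext viewer is active and leave it to the browser everywhere else. A minimal sketch; `isFindShortcut`, `handleKeyDown` and `openSearchWidget` are hypothetical names, and `KeyLike` is a subset of the DOM `KeyboardEvent` fields so the logic can be tested without a browser.

```typescript
// Subset of KeyboardEvent used here; a real KeyboardEvent satisfies this interface.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
}

// True for Ctrl+F (Windows/Linux) or Cmd+F (macOS) with no other modifiers.
function isFindShortcut(e: KeyLike): boolean {
  return (e.ctrlKey || e.metaKey) && !e.altKey && !e.shiftKey && e.key.toLowerCase() === "f";
}

// Returns true when the event was handled, so the caller knows to call preventDefault().
// Outside the viewer the shortcut is left to the browser's native find.
function handleKeyDown(e: KeyLike, viewerIsActive: boolean, openSearchWidget: () => void): boolean {
  if (!viewerIsActive || !isFindShortcut(e)) return false;
  openSearchWidget();
  return true;
}
```

In the page this could be wired as `document.addEventListener("keydown", ev => { if (handleKeyDown(ev, viewerIsActive(), openSearch)) ev.preventDefault(); })`, where `viewerIsActive` and `openSearch` are also hypothetical. A visible search button calling the same `openSearchWidget` covers mobile, where there is no Ctrl+F.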

roopeux commented 2 months ago

One important principle is to keep users in control.

I would never forcefully capture Ctrl-F because it takes away user control. With Ctrl-F, users don't want leniency, so we could keep that as a separate UX. My original question to @berger-n, about page responsiveness while text loads, is still valid. Also, both Élie and I mentioned performance on low-end devices, and the guess-detect method is another question for @berger-n.

To provide leniency, we can build a search with highlighted snippets. Contexts will inevitably be shorter than with Ctrl-F, but if we allow both, users will have a choice. Search-as-you-type might work when the string is longer than one syllable.
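The "longer than one syllable" rule could be checked on the client before firing search-as-you-type requests. A sketch assuming syllables are delimited by the Tibetan tsheg (U+0F0B), non-breaking tsheg (U+0F0C), shad (U+0F0D), or whitespace for Wylie input; the function names are hypothetical, and this does not reproduce the server-side lenient analyzer.

```typescript
// Tsheg, non-breaking tsheg, shad, and whitespace all count as syllable boundaries.
const SYLLABLE_BOUNDARY = /[\s\u0F0B\u0F0C\u0F0D]+/;

function syllableCount(query: string): number {
  // Splitting can produce empty strings at the edges (e.g. a trailing tsheg); ignore them.
  return query.split(SYLLABLE_BOUNDARY).filter(s => s.length > 0).length;
}

// Only start searching as the user types once the query has more than one syllable.
function shouldSearchAsYouType(query: string): boolean {
  return syllableCount(query.trim()) > 1;
}
```

With this rule, typing "bkra" or a single Tibetan syllable followed by a tsheg does not trigger a request; "bkra s" does, since the second syllable has started.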

@eroux, can you point me to the analyzer code that provides the leniency in Tibetan?

(GPT thinks that a low-end phone could handle around 20,000 to 50,000 characters of plain text with React without crashing, and a low-end laptop around 200,000 to 400,000 characters.)

eroux commented 2 months ago

There are different components for leniency:

https://github.com/buda-base/lucene-bo/blob/master/src/main/resources/synonyms.txt
https://github.com/buda-base/lucene-bo/blob/master/src/main/java/io/bdrc/lucene/bo/TibAffixedFilter.java
https://github.com/buda-base/lucene-bo/blob/master/src/main/java/io/bdrc/lucene/bo/TibCharFilter.java
https://github.com/buda-base/lucene-bo/blob/master/src/main/java/io/bdrc/lucene/bo/PaBaFilter.java
https://github.com/buda-base/lucene-bo/blob/master/src/main/java/io/bdrc/lucene/bo/TibPattFilter.java

The problem here is not plain text: the page we show the user is not a plain .txt file. The problem is all the rendering in ReactJS.

There is also the problem of copyrighted material: we can't allow search with Ctrl+F in that case.

I also think this will become clearer with a redesign of the etext viewer, because sometimes you want to search in the etext you see on screen, sometimes in the whole volume, sometimes in the whole multi-volume collection, etc.

@berger-n, is it difficult to render everything (without waiting for scroll) on library-dev so that we can see how it performs?

eroux commented 2 months ago

(Also, this is not implemented yet, but the search will handle a good list of shorthands. Not very important now, but it will be a very important part of search for manuscripts.)

berger-n commented 2 months ago

Here's a version where the whole etext is displayed: https://library-dev.bdrc.io/show/bdr:IE0OPIC2BFA6FE?startChar=0&openEtext=bdr%3AUTIE0OPIC2BFA6FE_I1KG81126. It's not usable at all, even on desktop... Though I guess there's room for optimization, this is clearly not just loading and displaying text: we have text transliteration, buttons to handle, text hovering for Monlam, possibly images, etc., so I'm not sure we can make this usable even on desktop.
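One standard optimization for rendering a very long document is windowing (virtualization): only the pages near the viewport are rendered, and the rest are replaced by empty spacers of the same height. Libraries such as react-window implement this for React. The sketch below only computes which pages to render, assuming every page has the same fixed pixel height, which real etext pages (with transliteration, images, etc.) will not; the names are hypothetical.

```typescript
interface WindowInput {
  scrollTop: number;      // px scrolled from the top of the etext container
  viewportHeight: number; // px height of the visible area
  pageHeight: number;     // assumed fixed px height of every page
  pageCount: number;
  overscan: number;       // extra pages rendered above and below the viewport
}

// Inclusive [first, last] page indices to render; [0, -1] means nothing to render.
function visiblePageRange(w: WindowInput): [number, number] {
  if (w.pageCount === 0) return [0, -1];
  const first = Math.floor(w.scrollTop / w.pageHeight);
  const last = Math.floor((w.scrollTop + w.viewportHeight - 1) / w.pageHeight);
  return [
    Math.max(0, first - w.overscan),
    Math.min(w.pageCount - 1, last + w.overscan),
  ];
}
```

With 2,500 pages, this keeps only a handful of pages in the React tree at any time, whatever the scroll position, at the cost of losing the browser's native Ctrl+F over the unrendered pages.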

berger-n commented 2 months ago

The longest text we have has ~2,500 pages: https://library-dev.bdrc.io/show/bdr:IE0OPI7944B80B?openEtext=bdr%3AUTIE0OPI7944B80B_I1KG12053#open-viewer

eroux commented 2 months ago

Yes, we can try to optimize, but it's a big risk in the sense that it can take a lot of time and may not be successful, so from a time-management perspective, let's go the other route. We know the drawbacks (Ctrl+F is not usable directly, but people can always download the etext and use Ctrl+F in another app), but it's manageable in a limited time and will cover all the use cases.

roopeux commented 2 months ago

No need to try to make the whole Khyentse Wangpo sungbum searchable by Ctrl-F 😃

I was talking within the limits of what GPT estimated for ReactJS (not txt), which is 200,000 to 400,000 characters on a low-end laptop. This would be more than enough for a text like 3x spyod 'jug. Users would not expect to search anything longer with Ctrl-F.

So the simple idea was to detect Ctrl-F, and load 200,000 chars, but only if the page stays responsive while loading. Any other optimisation would be too much.
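The "load 200,000 characters, but only while the page stays responsive" idea could be sketched as a loop that fetches chunks one at a time and stops at whichever comes first: the end of the text, the character budget, or a chunk after which the browser took too long to hand control back. `fetchChunk`, `render` and the `maxFrameMs` threshold are hypothetical; `now` and `yieldToBrowser` are injected so the browser can pass `performance.now()` and a promise that resolves on the next `requestAnimationFrame`.

```typescript
// Hypothetical chunk fetcher: resolves to the text of chunk `index`, or null past the end.
type FetchChunk = (index: number) => Promise<string | null>;

interface LoadOptions {
  charBudget: number;                  // e.g. 200000, the figure suggested in this thread
  maxFrameMs: number;                  // give up if the browser stays busy longer than this
  now: () => number;                   // e.g. () => performance.now()
  yieldToBrowser: () => Promise<void>; // e.g. resolve on the next requestAnimationFrame
  render: (text: string) => void;      // hypothetical: appends one chunk to the viewer
}

type StopReason = "done" | "budget" | "unresponsive";

interface LoadResult {
  loadedChars: number;
  stoppedBecause: StopReason;
}

async function loadWhileResponsive(
  fetchChunk: FetchChunk,
  startIndex: number,
  opts: LoadOptions,
): Promise<LoadResult> {
  let loaded = 0;
  for (let i = startIndex; ; i++) {
    if (loaded >= opts.charBudget) return { loadedChars: loaded, stoppedBecause: "budget" };
    const text = await fetchChunk(i);
    if (text === null) return { loadedChars: loaded, stoppedBecause: "done" };
    const t0 = opts.now();
    opts.render(text);
    loaded += text.length;
    // Let the browser paint and handle input before fetching the next chunk.
    await opts.yieldToBrowser();
    // Time from handing over the chunk until the browser gave control back:
    // a rough proxy for how long the main thread was blocked.
    if (opts.now() - t0 > opts.maxFrameMs) {
      return { loadedChars: loaded, stoppedBecause: "unresponsive" };
    }
  }
}
```

Measuring up to the next frame, rather than only the `render` call, is a deliberate choice: React may apply state updates after `render` returns, so timing the call alone could understate how long the page was blocked.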

eroux commented 2 months ago

well I don't think it makes a lot of sense... suppose you've opened a volume on BDRC and you want to search something, the encouraged way is Ctrl+F, you do it but then it can't find results at the end of the volume... to me that looks like a very deficient UX. Let's just not go the Ctrl+F route, maybe in version 3 we can have a system where it's reasonable, but it's out of reach for now.

@roopeux can you integrate a little search UI (can be pretty minimal like the Google Docs one) in the new Figmas of the etext viewer?

eroux commented 2 months ago

This is probably not the right place to report it, but there is another issue with scrolling: sometimes you open a volume knowing that what you're looking for is towards the end. Currently there's no easy way to get there; you have to spend a few minutes scrolling down.
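A jump control (for instance a slider or a page-number box) could sidestep the long scroll by reopening the viewer at a computed offset through the `startChar` URL parameter already used in the library-dev links in this thread. A sketch assuming the front end knows the character offset at which each page starts (`pageStarts`, hypothetical); the function names are also hypothetical.

```typescript
// Maps a slider position in [0, 1] to a page index in [0, pageCount - 1].
function pageForFraction(pageCount: number, fraction: number): number {
  const clamped = Math.min(Math.max(fraction, 0), 1);
  return Math.min(pageCount - 1, Math.floor(clamped * pageCount));
}

// Builds a viewer URL that opens the etext at the start of the chosen page, following
// the /show/<instance>?startChar=<n>&openEtext=<etext>#open-viewer links on library-dev.
function jumpUrl(instanceId: string, etextId: string, pageStarts: number[], fraction: number): string | null {
  if (pageStarts.length === 0) return null;
  const page = pageForFraction(pageStarts.length, fraction);
  return `/show/${instanceId}?startChar=${pageStarts[page]}&openEtext=${encodeURIComponent(etextId)}#open-viewer`;
}
```

`encodeURIComponent` turns the `:` in `bdr:UT...` into `%3A`, matching the `openEtext=bdr%3A...` form of the existing links.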

roopeux commented 2 months ago

I will recreate the whole etext page in Figma with the search as requested. The UX will be between autosuggest and search with the goal of showing a comprehensive list of matches with relative ease.

Just for the record, users will try to use Ctrl-F regardless. The idea was never to make it the "encouraged" way, or a way to find anything at the end of volumes, because nobody would expect that. The idea was to make Ctrl-F work in the way users normally expect, finding words in relative proximity.

roopeux commented 2 months ago

A lenient etext search could be something like in the video below.

Links to Figma interactive prototype and the design.

https://github.com/user-attachments/assets/8dfa0f24-92d2-43bf-854e-870ada6f7c1b

roopeux commented 2 months ago

Another alternative would be a typical Find functionality with the simple Next and Previous buttons, but a lenient version of that might be both difficult to make and a bit weird for users.

eroux commented 2 months ago

I think this is really good, thanks!

Well, actually the previous / next thing wouldn't be too difficult, I think: it's just getting some results from the API and then iterating through them. It would be a bit more intuitive to me, but I don't know about general users... @berger-n, what do you think of the relative complexity?

roopeux commented 2 months ago

@eroux, I think you are right. The UI could be something like this.

https://github.com/user-attachments/assets/8f4ffbc1-e8f7-4c9c-8c49-55eb426596d2

eroux commented 2 months ago

Yes, it makes for a much simpler UI; it's a bit more appealing to me.

berger-n commented 2 months ago

indeed! this one looks perfect

berger-n commented 2 months ago

some progress with the sticky navigation: https://library-dev.bdrc.io/show/bdr:IE0OPIC2BFA6FE?openEtext=bdr%3AUTIE0OPIC2BFA6FE_I1KG81127#open-viewer

[Screen recording, 2024-08-22]

eroux commented 2 months ago

starting to look quite different, thanks!

roopeux commented 2 months ago

After Jann's feedback, and after looking more carefully at what PDF readers do, here is version 3 of the search within etexts:

https://github.com/user-attachments/assets/0ee9ba19-6021-413d-b5a8-6df85cc10e66

roopeux commented 2 months ago


Wonderful! Thanks @berger-n

eroux commented 2 months ago

This looks really good, thanks @roopeux !

I think this exact UX would be ideal but I'm not sure how easy it will be to have results by page in the query... it will be interesting

A few remarks on your mockup, a bit outside of the search thing:

roopeux commented 2 months ago

@eroux, replying to "I'm not sure how easy it will be to have results by page in the query":

Because we need the <em> in the full chunks, the API returns the highlight section not as snippets, but as full chunks with <em>, which can be directly inserted in the text. This leads to the "by page" solution.

To get snippets for the right panel, the quickest way is probably for the FE to create them as needed. Let's have a call if you have other ideas.
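Since the API returns whole chunks with `<em>` marking the hits, the front end could cut the snippets itself. A minimal sketch, assuming the chunk contains no markup other than non-nested `<em>...</em>` pairs and no HTML entities; `snippetsFromChunk` and the default of 30 characters of context are hypothetical. Context is taken from the tag-free text, so a neighbouring hit's tags never leak into a snippet.

```typescript
interface Snippet {
  before: string;
  match: string;
  after: string;
}

function snippetsFromChunk(chunk: string, context = 30): Snippet[] {
  const re = /<em>([\s\S]*?)<\/em>/g;
  let plain = "";                           // chunk text with the <em> tags removed
  const spans: Array<[number, number]> = []; // [start, end) of each hit within `plain`
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(chunk)) !== null) {
    plain += chunk.slice(last, m.index);
    const start = plain.length;
    plain += m[1];
    spans.push([start, plain.length]);
    last = m.index + m[0].length;
  }
  plain += chunk.slice(last);

  const snippets: Snippet[] = [];
  for (const [s, e] of spans) {
    snippets.push({
      before: plain.slice(Math.max(0, s - context), s),
      match: plain.slice(s, e),
      after: plain.slice(e, e + context),
    });
  }
  return snippets;
}
```

For Tibetan, trimming the context to the nearest tsheg would probably read better than a fixed character count.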

roopeux commented 2 months ago

@eroux, replying to "can you redesign the bar on the left a little bit so that it has 3 levels (the collection as the root, the volumes, and then multiple etexts by volume, although this last level is optional). Maybe we don't want to have both the collection and the volume in bold, otherwise it makes the position a bit ambiguous... what do you think?":

Good idea. The columns are a bit tighter to the left to make space for titles, @berger-n.

[Image: revised left-bar mockup]

roopeux commented 2 months ago

IN PROGRESS

  • User types in the search box: show the Prev and Next buttons.
  • User presses Enter or clicks Prev or Next: the FE sends the id, etext_vol or etext_instance of the child etext doc, along with the query string and lang. The API returns all chunks (pages) that match the query string, with <em>. The API does not create snippets; the FE creates them and shows them as search results in the Monlam panel on the right.
  • User clicks Next or Prev again: the FE handles this without the API, jumping to the next/previous page with hits, not to the next hit on the same page.
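The Next/Prev behaviour described above, jumping between pages with hits rather than between hits on the same page, needs no further API call once the matching chunks are loaded. A sketch assuming each returned chunk carries a page index; the function names are hypothetical, and wrapping around at either end (as many PDF readers do) is an assumption the spec does not state.

```typescript
// Sorted copy of the page indices that contain at least one hit.
function sortedPages(hitPages: number[]): number[] {
  return hitPages.slice().sort((a, b) => a - b);
}

// First page with hits after `current`; wraps to the first hit page (assumption).
function nextHitPage(hitPages: number[], current: number): number | null {
  const pages = sortedPages(hitPages);
  if (pages.length === 0) return null;
  for (let i = 0; i < pages.length; i++) {
    if (pages[i] > current) return pages[i];
  }
  return pages[0];
}

// Last page with hits before `current`; wraps to the last hit page (assumption).
function prevHitPage(hitPages: number[], current: number): number | null {
  const pages = sortedPages(hitPages);
  if (pages.length === 0) return null;
  for (let i = pages.length - 1; i >= 0; i--) {
    if (pages[i] < current) return pages[i];
  }
  return pages[pages.length - 1];
}
```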

JannTibetan commented 2 months ago

Quoting @eroux:

  • "This looks really good, thanks @roopeux!"
  • "can you redesign the bar on the left a little bit so that it has 3 levels (the collection as the root, the volumes, and then multiple etexts by volume, although this last level is optional)"

I echo Élie's two comments here. This is a very solid mockup and I like it.

I also agree that the user needs to know exactly which segment of a given collection they are searching (the work overall, an individual volume, or a particular outline node within a collection). Clicking on the outline could be a good way to do this, provided that it is intuitive.

I think that after a search string is entered into the search box, the "next" button should change color so that the user knows to press it.

What message will be displayed if there are no search results?

I'd like to see how this looks with Tibetan script.

Small point: change Img. to Image

[Screenshot]

roopeux commented 2 months ago

Thanks for your comments, @JannTibetan. I am still unsure about several details in this search UX. Scoped search in particular is typically a UX risk, but testing should tell us how it works. Our general idea is to use placeholder text like "Search this volume" or "Search the website".
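Following the placeholder idea, the search-box text could be derived from whichever outline node is selected, so the scope is always stated. A sketch assuming a hypothetical `OutlineNode` shape with the three levels discussed (collection/instance, volume, etext); "Search this volume" and "Search the website" come from this comment, while "Search this text" and "Search this collection" are placeholder wording.

```typescript
// Hypothetical outline node: the collection (instance) is the root, then volumes,
// then optionally several etexts per volume.
interface OutlineNode {
  kind: "instance" | "volume" | "etext";
  title: string;
}

const PLACEHOLDERS: Record<OutlineNode["kind"], string> = {
  etext: "Search this text",
  volume: "Search this volume",
  instance: "Search this collection",
};

// The scope follows the selected node; with nothing selected, search the whole site.
function searchBoxPlaceholder(selected: OutlineNode | null): string {
  return selected === null ? "Search the website" : PLACEHOLDERS[selected.kind];
}
```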

JannTibetan commented 2 months ago

In etexts, the search string isn't highlighted: https://github.com/user-attachments/assets/61abeffc-2e76-4df3-a83a-08651ff27a47

eroux commented 2 months ago

Hi Jann, the etext viewer is currently very much a work in progress; let's wait a few days before testing it.

JannTibetan commented 2 months ago
[Screenshot, 2024-08-26]

A couple of observations/questions:

eroux commented 2 months ago

Let's test again in a few days; most of this will be addressed. For now, the download scope is either the volume or the etext. There's currently no way to download the whole collection. I've had a few requests for that, maybe next week.

berger-n commented 2 months ago

here's the new searchbox (though not functional yet): https://library-dev.bdrc.io/show/bdr:IE0OPI1C1BBFCB?startChar=488&openEtext=bdr%3AVLIE0OPI1C1BBFCB_I3CN4467#open-viewer

[Screen recording, 2024-09-02]

Now, about choosing the search scope (i.e. text/volume/instance): I must say it does not seem straightforward at all to use the outline for this, as each title is already used as a toggle button to show/hide its descendants, if any, or to open the text if there are none.

Instead, what about a Text/Volume/Instance dropdown between the input box and the Next/Prev buttons? WDYT @JannTibetan @roopeux, should I give it a quick try?

eroux commented 2 months ago

Very nice, thanks!

I'll let Roope answer for the scope

roopeux commented 2 months ago

@berger-n, choosing a text/volume/instance is not separate from selecting the search scope. Users will not think "I will search in some scope". They think "I have selected this text, and I expect the search to happen within this text and not elsewhere". Did I understand the issue?

berger-n commented 2 months ago

@roopeux, not entirely, I'm afraid: it would require changing the behaviour of the outline so that each node is selected when clicked, but then what about showing/hiding descendants?
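One way to reconcile selecting a node with folding/unfolding it is to give the two separate click targets: the title selects the node (and so the search scope), while a disclosure arrow only toggles its children. A sketch of that state logic as a reducer; the state shape, the action names, and the choice to unfold a node when it is selected are all assumptions, not the current outline's behaviour.

```typescript
interface OutlineState {
  selectedId: string | null;
  expanded: string[]; // ids of nodes whose children are shown
}

// "select": click on the title; "toggle": click on the disclosure arrow.
type OutlineAction =
  | { type: "select"; id: string }
  | { type: "toggle"; id: string };

function outlineReducer(state: OutlineState, action: OutlineAction): OutlineState {
  const isOpen = state.expanded.indexOf(action.id) >= 0;
  if (action.type === "select") {
    // Selecting sets the scope and also unfolds the node so its children are visible.
    return {
      selectedId: action.id,
      expanded: isOpen ? state.expanded : state.expanded.concat(action.id),
    };
  }
  // Toggling only folds/unfolds; the selection (and search scope) is unchanged.
  return {
    selectedId: state.selectedId,
    expanded: isOpen
      ? state.expanded.filter(id => id !== action.id)
      : state.expanded.concat(action.id),
  };
}
```

The reducer has the shape React's `useReducer(outlineReducer, { selectedId: null, expanded: [] })` expects.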

eroux commented 2 months ago

good point about the behavior combination with the folding / unfolding

roopeux commented 2 months ago

Here is what I intended for outline

https://github.com/user-attachments/assets/1821a5da-df4b-4c93-922d-12b24c4812cf

eroux commented 2 months ago

That looks good, thanks! Here's a suggestion from @JannTibetan: can we make the big title at the top the title of the currently selected node (collection, volume or text)? I guess the limitation is that this big title is not always displayed, but it might still help users understand the scope. What do you think, @roopeux?

roopeux commented 2 months ago

Oh yes, that is a great idea.

berger-n commented 2 months ago

The new scoped etext search UI is there: https://library-dev.bdrc.io/show/bdr:IE0OPI11B27745. @roopeux @JannTibetan, feel free to give it a try and report anything! I already noticed #920 and am working on it. Oh, and we should maybe think about something for when the query is very simple and returns something like 10,000 results (in that case the UI only shows the first 1,000 so as not to become too slow).
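For very broad queries, the cap could at least be made visible to the user. A sketch using the 1,000-result limit mentioned here; `capResults` and the label wording are hypothetical.

```typescript
// Cap matching the 1,000 results the UI currently shows.
const MAX_SHOWN = 1000;

function capResults<T>(results: T[], max: number = MAX_SHOWN): { shown: T[]; label: string } {
  const shown = results.slice(0, max);
  let label: string;
  if (results.length > max) {
    label = `Showing the first ${max} of ${results.length} results; refine your query to see more`;
  } else {
    label = results.length === 1 ? "1 result" : `${results.length} results`;
  }
  return { shown, label };
}
```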

[Screen recording, 2024-09-11]

eroux commented 2 months ago

This is really impressive, thanks!

roopeux commented 2 months ago

Great job, @berger-n! (Updated.) Some details:

eroux commented 2 months ago

@roopeux

roopeux commented 2 months ago

@berger-n @eroux

https://github.com/user-attachments/assets/afe863af-e4f7-4032-8efc-e5e33449c13d

eroux commented 2 months ago

ah yes let's get rid of the open / download buttons, good point

So, for the wrapping, I'm not sure it's a good idea: oftentimes (especially with OCR), people want to check the image when they have a doubt about something in the etext. That's often not an easy exercise: it requires clearly seeing the lines in the etext so that you can match them with the lines in the image, and having a visual idea of where in the line a specific passage is also really helps in finding it on the image. That would become much more difficult with text wrapping. Also, we have never once had this feedback, and I've never seen text wrapping in other Tibetan-related software. I also think that, culturally, Tibetans are used to very long lines. So I'm not opposed to trying something, but it would have to be carefully designed to accommodate these considerations. I don't think this should be a priority, but if you really think it will improve things, please make a Figma.

roopeux commented 2 months ago

Okay, I'll change the wrapping thing. My problem was that search hits were off my screen, but I guess we can just scroll horizontally to show them.
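Scrolling a hit into view horizontally comes down to computing a new `scrollLeft` for the etext container. A sketch that leaves the position alone when the hit is already fully visible and otherwise centers it; the function name is hypothetical, and positions are assumed to be in pixels relative to the container's scrollable content.

```typescript
// Returns the scrollLeft that brings the hit into view, or null if it is already fully visible.
function scrollLeftToReveal(
  hitLeft: number,
  hitWidth: number,
  containerWidth: number,
  currentScrollLeft: number,
): number | null {
  const visibleStart = currentScrollLeft;
  const visibleEnd = currentScrollLeft + containerWidth;
  if (hitLeft >= visibleStart && hitLeft + hitWidth <= visibleEnd) return null;
  // Center the hit, never scrolling before the start of the line.
  return Math.max(0, Math.round(hitLeft + hitWidth / 2 - containerWidth / 2));
}
```

The result could be applied with `container.scrollTo({ left, behavior: "auto" })`; `"auto"` jumps directly, while `"smooth"` adds an animation. Browsers clamp the scroll position to the scrollable range, so the function does not clamp the upper bound itself.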

berger-n commented 2 months ago

thanks! seems all done: https://library-dev.bdrc.io/show/bdr:IE0OPI49999A8F

[Screen recording, 2024-09-12]

eroux commented 2 months ago

Nice! I have to say the horizontal auto scroll looks a bit funny (maybe it's the animation?) but it works well!

roopeux commented 2 months ago

This animation is good UX in principle, but similar things disturb me a lot at KVP. Maybe just remove it.