Princeton-CDH / geniza

version 4.x of the Princeton Geniza Project
https://geniza.princeton.edu
Apache License 2.0

The public search bar has a bias towards LTR text that makes it challenging/unintuitive to search in RTL and especially to use wildcards with RTL. #948

Closed richmanrachel closed 2 years ago

richmanrachel commented 2 years ago

Describe the bug

When trying to search the frontend of the PGP for my research, I'm struggling to input and edit RTL text. For example, I realized that when I input דלאלה (dalāla, "broker"), I was only getting 6 results and none of them included the famous woman known as Wuhsha al-Dalāla. When I tried to add the prefix "al-" (="the"), I had to retype the entire term multiple times, because when I tried to just add the prefix, it moved to the back of the word.

Then I tried adding an asterisk (*) for a wildcard search, and I tried to put it in front of the word, but it yielded no new results. When I tried again just now to experiment, I did get 24 results when the asterisk went at the end of the word.

Is it possible for the search bar to intuit RTL text and work in the correct direction so that wildcards can be placed in the expected location?

To reproduce

Steps to reproduce the behavior:

  1. Go to PGP search
  2. Type in something in RTL script.
  3. Try to edit that search
  4. See error

Expected behavior

I expect that, like when typing in English, if my cursor is at the end of the word, the next letters I type will show up after the cursor (and not jump to the front of the word). And vice versa, if my cursor is at the beginning of the word, my next letters will show up at the front of the word.

I also think (but could be wrong) that wildcards are supposed to go at the beginning or end of the search term depending on where you expect/want to see additional content. But the LTR bias is making the wildcard only work on the "wrong" side of the search term.


richmanrachel commented 2 years ago

Finding another problem in this same vein as I attempted a proximity search:

[screenshot]

Even when I changed my keyboard to unicode to account for smart quotes, my results are bad (seemingly pulling the number 10 as opposed to the Judaeo-Arabic words I was attempting to search)

rlskoeser commented 2 years ago

@richmanrachel thanks for this. Can you add that search string or link (or one like it) in a format that I can cut and paste so I can experiment with it? I had an idea about this problem with the search input and was thinking about it again in relation to the Startwords piece on the Zooniverse Geniza keyboards https://startwords.cdh.princeton.edu/issues/2/strangers-in-the-landscape/

I'm wondering if we might want to override / explicitly set the text direction on the input instead of letting the browser decide (but we'll have to figure out when/how). Another option is to convert the LTR versions of queries like this into the format that Solr needs to interpret them properly.
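The second option could look roughly like the sketch below. It assumes a query in which the distance and tilde were typed before the quoted phrase, and rewrites it into the Lucene/Solr proximity form `"phrase"~N`. The helper name `normalize_proximity` is hypothetical and is not taken from the geniza codebase; smart quotes and mixed queries are not handled.

```python
import re

# Hypothetical helper for illustration; not part of the geniza codebase.
# Matches a whole query of the form  N~"some phrase"  (distance typed first).
_LEADING_DISTANCE = re.compile(r'^\s*(\d+)\s*~\s*("[^"]+")\s*$')


def normalize_proximity(query: str) -> str:
    """Rewrite N~"phrase" as "phrase"~N; return anything else unchanged."""
    match = _LEADING_DISTANCE.match(query)
    if match is None:
        return query
    distance, phrase = match.groups()
    return f"{phrase}~{distance}"
```

Queries already in the standard form, and plain or wildcard terms, pass through untouched, so a rewrite like this could run on every search without affecting LTR users.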

richmanrachel commented 2 years ago

@rlskoeser - yes, thank you! 10~"גזל עמל"

I'm open to options for how to solve this

rlskoeser commented 2 years ago

@richmanrachel I asked about this on DHTech Slack and got the suggestion to set the text direction to auto on the search input (currently it's inheriting the default direction from the page). Please test this change and report back how it behaves for you.
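For context, `dir="auto"` asks the browser to choose the base direction from the first strongly directional character in the input's value. A rough Python sketch of that first-strong heuristic follows. The function name is hypothetical, and this only approximates the browser's behaviour for illustration; it is not code from the geniza repository.

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Approximate dir="auto": return the direction of the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    # Digits, quotes, "*", "~" and spaces are weak or neutral, so they are skipped.
    return default
```

Under this heuristic, a query that starts with Hebrew or Arabic letters gets an RTL base direction even when it also contains digits, quotation marks, or Latin boolean operators. With an RTL base direction, a neutral character such as a trailing `*` should be laid out at the visual end of the word instead of jumping to the other side.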

richmanrachel commented 2 years ago

@rlskoeser - this is working so much better, thank you! There are still some problems - you can't use quotation marks around Arabic phrases that convert to Judaeo-Arabic (e.g. "ثوب دبيقي" should return the 11 results for "תוב דביקי"). But the proximity search is working with Hebrew script and it's even easy to add boolean operators in Latin between Hebrew/Arabic words!
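One way to picture the expected behavior for quoted phrases: if the Arabic to Judaeo-Arabic conversion works character by character, quotation marks can pass through untouched, so a quoted Arabic phrase becomes a quoted Hebrew-script phrase. The sketch below is a minimal illustration under that assumption. The six-letter mapping is hypothetical and inferred only from the example in this comment; the project's actual conversion is not shown in this thread and is presumably far more complete.

```python
# Hypothetical, deliberately partial mapping inferred from one example;
# NOT the conversion table used by the Princeton Geniza Project.
ARABIC_TO_HEBREW = str.maketrans({
    "ث": "ת",  # theh -> tav
    "و": "ו",  # waw  -> vav
    "ب": "ב",  # beh  -> bet
    "د": "ד",  # dal  -> dalet
    "ي": "י",  # yeh  -> yod
    "ق": "ק",  # qaf  -> qof
})


def to_judaeo_arabic(query: str) -> str:
    """Convert mapped Arabic letters; leave quotes, spaces, and operators alone."""
    return query.translate(ARABIC_TO_HEBREW)
```

Because only letters in the table are replaced, quotation marks, spaces, and Latin operators such as AND survive the conversion unchanged.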

kseniaryzhova commented 2 years ago

Works! Closing! @richmanrachel is writing a new issue for the Arabic quotation mark bug.