ODNZSL / nzsl-online

New Zealand Sign Language Dictionary
GNU General Public License v3.0

NZSL-51: Switch search from LIKE to GLOB matches to support multiple-gloss signs #1485

Closed joshmcarthur closed 1 year ago

joshmcarthur commented 1 year ago

Use the GLOB operator instead of LIKE for more capable character matching. This change lets us identify 'words' within a gloss as runs of characters that have a space or a comma on either side.

This means we can support searching within multiple glosses, e.g. "hello, salute".

GLOB is case sensitive, so we must transform the gloss and the term to lowercase. We also perform some simple escaping to avoid glob patterns being used within the search term itself.
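As a rough illustration of the lowercasing and escaping described above, here is a minimal Python sketch using the standard `sqlite3` module. The helper names (`escape_glob`, `normalise_term`, `glob_match`) are hypothetical and are not taken from this PR. The sketch assumes SQLite's GLOB, which has no ESCAPE clause, so each metacharacter (`*`, `?`, `[`) is made literal by wrapping it in a one-character bracket set.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


def escape_glob(term: str) -> str:
    """Wrap GLOB metacharacters in brackets so they match literally.

    Hypothetical helper: 'a*b' becomes 'a[*]b'.
    """
    return re.sub(r"([*?\[])", r"[\1]", term)


def normalise_term(term: str) -> str:
    """Lowercase first (GLOB is case sensitive), then escape."""
    return escape_glob(term.lower())


def glob_match(value: str, pattern: str) -> bool:
    """Ask SQLite whether the lowercased value matches the GLOB pattern."""
    row = conn.execute("SELECT lower(?) GLOB ?", (value, pattern)).fetchone()
    return bool(row[0])


# An unescaped '*' acts as a wildcard; the escaped form only matches a literal '*'.
print(glob_match("axb", "a*b"))                  # wildcard match
print(glob_match("axb", normalise_term("a*b")))  # literal '*' required
print(glob_match("A*B", normalise_term("a*b")))  # literal '*' present
```

Note that SQLite's `lower()` only folds ASCII letters by default, so glosses containing non-ASCII characters may need separate handling.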

In terms of capability, for this sort of matching, the ordering is pretty much LIKE -> GLOB -> MATCH. I'm trying to move incrementally before switching to regexp matching, for a couple of reasons:

The glob being applied (*[ ,]:term[ ,]*) is broken down as follows:
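Read with SQLite's GLOB rules, the pieces of this pattern can be sketched in a small, self-contained Python example. The table name `signs`, the sample rows, and the `search` helper are hypothetical. The sketch also assumes the gloss is padded with a space on each side, so that the first and last words have a delimiter for `[ ,]` to match; the PR text does not say how the real query handles this. The term is assumed to be escaped already.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signs (gloss TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO signs VALUES (?)",
    [("Hello, salute",), ("hello",), ("Othello",), ("say hello",)],
)

# Pattern pieces, left to right:
#   *      any run of characters, including none
#   [ ,]   exactly one space or one comma (the word boundary)
#   :term  the bound, lowercased search term
#   [ ,]   exactly one space or one comma
#   *      any run of characters, including none
QUERY = """
SELECT gloss FROM signs
WHERE (' ' || lower(gloss) || ' ') GLOB ('*[ ,]' || :term || '[ ,]*')
"""


def search(term: str) -> list[str]:
    """Return glosses that contain the term as a whole word."""
    rows = conn.execute(QUERY, {"term": term.lower()})
    return sorted(row[0] for row in rows)


print(search("hello"))   # whole-word matches only; "Othello" is excluded
print(search("salute"))
```

Without the padding, a gloss that starts with the search term would have no character for the leading `[ ,]` to consume, so some equivalent handling is needed for first-word matches.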

In the future, we have new matching rules planned for partial matches. This is likely to change the glob to add a wildcard either trailing the term, or surrounding the term. This will still examine each 'word' in a gloss, but will allow the word to partially match, rather than exactly matching.
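The partial-match variants mentioned above are not specified in this PR, so the following is only a guess at what such patterns might look like. The mode names (`exact`, `trailing`, `surrounding`) and helper functions are hypothetical, and the gloss is again assumed to be padded with spaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pattern templates; "{term}" is replaced by the escaped term.
PATTERNS = {
    "exact": "*[ ,]{term}[ ,]*",         # whole word only
    "trailing": "*[ ,]{term}*[ ,]*",     # word starts with the term
    "surrounding": "*[ ,]*{term}*[ ,]*", # term appears anywhere in the gloss
}


def build_pattern(term: str, mode: str = "exact") -> str:
    """Fill the chosen template with the lowercased term."""
    return PATTERNS[mode].replace("{term}", term.lower())


def matches(gloss: str, term: str, mode: str = "exact") -> bool:
    """Check a single gloss against the pattern using SQLite's GLOB."""
    row = conn.execute(
        "SELECT (' ' || lower(?) || ' ') GLOB ?",
        (gloss, build_pattern(term, mode)),
    ).fetchone()
    return bool(row[0])


print(matches("Hello, salute", "hel", "exact"))
print(matches("Hello, salute", "hel", "trailing"))
print(matches("Othello", "hel", "surrounding"))
```

Because `*` can also span spaces and commas, the `surrounding` form behaves like a plain substring match on the whole gloss; restricting the match to a single word would need a more capable matcher than GLOB.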

This change was specifically introduced to fix searching for signs with multiple gloss words. The test case for this is "hello, salute", but there are plenty of others where the gloss being searched for is the first word and other glosses follow it (typically this happens when the same sign can mean different things depending on context, AFAIK).

Before this change, "Hello, salute" was not included in a search for "hello", because the word Hello was not included in the gloss:

image

After this change, "Hello, salute" is included in the search for "hello", because the string hello, salute matches the glob *[ ,]hello[ ,]*:

image