search return matches for yup’ik

cwtliu commented 6 years ago

The problem: if a person searches for a Yup'ik word, the query has to match the Yup'ik root exactly in order for the search to return the correct postbase. This is an issue because typing a word like 'kiputuq' (he is buying something) returns nothing at any point while typing, since the root is 'kipute-'.

Fix: if possible, it'd be great to return 'kipute-' when someone types 'kiputuq'. In the same way, 'pissur-' should be returned for either of the queries '(pissur)tuq' or '(pissu)qataraa'. It seems like the best way to do this would be to match against keys[:-1], that is, each root with its last letter dropped. That way 'kiput' and 'pissu' respectively would match any search containing a postbase that might alter the last letter of the root.
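
A minimal sketch of the keys[:-1] idea in JavaScript, to make the proposal concrete. The root list and the helper names `truncatedKey` and `matchRoots` are hypothetical and not taken from the yuarcuun-web code; the sketch also assumes roots are stored with a trailing hyphen, as written above.

```javascript
// Hypothetical root list; a real index would hold every dictionary key.
const roots = ["kipute-", "pissur-"];

// Drop the trailing hyphen, then the last letter (the keys[:-1] idea).
function truncatedKey(root) {
  const bare = root.replace(/-$/, "");
  return bare.slice(0, -1);
}

// Return every root whose truncated key is a prefix of the query.
function matchRoots(query, rootList) {
  const q = query.toLowerCase();
  return rootList.filter((root) => q.startsWith(truncatedKey(root)));
}

console.log(matchRoots("kiputuq", roots));
console.log(matchRoots("pissurtuq", roots));
console.log(matchRoots("pissuqataraa", roots));
```

This version scans the whole list on every query, so it says nothing about speed on a full dictionary. Very short roots would also over-match, because their truncated keys are prefixes of many words.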

Tradeoffs: if implementing this would slow down search, then it's probably not worth doing.

Temigo commented 6 years ago

I experimented with replacing the current search library (elasticlunr.js) with a fuzzy search library, Fuse.js, but the search is now annoyingly slow, no matter how much I play with the different parameters. So we might as well stay with elasticlunr.js, which seems more lightweight and faster.

I believe we need to write a kind of 'stemming' function specific to Yup'ik for the search engine to use. The js-search library, for example, describes stemming like this:

Stemming is the process of reducing search tokens to their root (or "stem") so that searches for different forms of a word will still yield results. For example "search", "searching" and "searched" can all be reduced to the stem "search".

We could add this stemming function to the current library; see lunr-languages. I'm not sure how fast we could implement this, though. Any thoughts?
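
To illustrate the stemming idea with the English example quoted above, here is a toy suffix stripper in JavaScript. It is only a sketch: `toyStem`, `buildIndex`, and `lookup` are hypothetical names, this is not the elasticlunr.js or lunr-languages API, and a Yup'ik stemmer would need postbase-aware rules that this does not attempt.

```javascript
// Toy English suffix list, for illustration only.
const SUFFIXES = ["ing", "ed"];

// Strip the first matching suffix, keeping at least three letters.
function toyStem(token) {
  const t = token.toLowerCase();
  for (const suffix of SUFFIXES) {
    if (t.endsWith(suffix) && t.length - suffix.length >= 3) {
      return t.slice(0, -suffix.length);
    }
  }
  return t;
}

// Group indexed words under their stem.
function buildIndex(words) {
  const index = new Map();
  for (const word of words) {
    const stem = toyStem(word);
    if (!index.has(stem)) index.set(stem, []);
    index.get(stem).push(word);
  }
  return index;
}

// Stem the query with the same function used at index time.
function lookup(index, query) {
  return index.get(toyStem(query)) || [];
}

const index = buildIndex(["search", "searching", "searched"]);
console.log(lookup(index, "searched"));
```

The point is that indexed words and queries pass through the same function, so different surface forms of a word end up under the same stem.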

Temigo commented 6 years ago

@cwtliu I just merged the search branch. Does it solve the issue (at least to the extent that we do not want to write a stemmer for Yup'ik right now)?

Temigo / yuarcuun-web

search return matches for yup’ik #10