buda-base / lucene-bo

Lucene analyzer for Tibetan
Apache License 2.0

phonetics searching #7

Closed eroux closed 1 month ago

eroux commented 7 years ago

It would make sense to index the phonetics of word tokens, possibly in several phonetic systems. Ideally, searching for gampopa would find སྒམ་པོ་པ. This is best done at the word level for two reasons: with syllable-level indexing, users would have to search for gam po pa, and special word pronunciations (dorje, labrang, yabshi, etc.) require word-level analysis, not just syllable-level analysis.
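To illustrate the difference between the two levels, here is a minimal sketch in plain Python. It is not lucene-bo code: the lookup tables, the function names `syllable_phonetics` and `word_phonetics`, and the phonetic spellings are all hypothetical and chosen only for this example. A real implementation would need a proper phonetic system and a much larger exception list.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg), separates syllables

# Toy syllable-to-phonetics table (hypothetical values, illustration only).
SYLLABLE_PHONETICS = {
    "སྒམ": "gam",
    "པོ": "po",
    "པ": "pa",
    "རྡོ": "do",
    "རྗེ": "je",
}

# Words whose pronunciation is not the concatenation of their syllables
# (hypothetical exception list with a single entry, "dorje").
WORD_EXCEPTIONS = {
    TSHEG.join(["རྡོ", "རྗེ"]): "dorje",
}


def syllable_phonetics(word):
    """Syllable-level analysis: one phonetic token per syllable."""
    syllables = [s for s in word.split(TSHEG) if s]
    return [SYLLABLE_PHONETICS[s] for s in syllables]


def word_phonetics(word):
    """Word-level analysis: one phonetic token per word.

    The exception list is consulted first; otherwise the syllable
    phonetics are concatenated into a single token.
    """
    word = word.strip(TSHEG)
    if word in WORD_EXCEPTIONS:
        return WORD_EXCEPTIONS[word]
    return "".join(syllable_phonetics(word))


gampopa = TSHEG.join(["སྒམ", "པོ", "པ"])
dorje = TSHEG.join(["རྡོ", "རྗེ"])

print(syllable_phonetics(gampopa))  # three separate tokens
print(word_phonetics(gampopa))      # a single token
print(word_phonetics(dorje))        # taken from the exception list
```

With syllable-level tokens, a query has to be split the same way (gam po pa) to match, and joining the toy syllable values for the last word gives "doje" rather than "dorje". The word-level function returns one token per word, so a query typed as a single word can match it directly.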

eroux commented 1 month ago

Duplicate of #48.