Open eroux opened 2 years ago
བར་དུ་ should be added to the vocab. I would argue that it's a frozen expression by now. We'll add instructions on how to do this in the botok docs.
Well, what I would do with another POS tagger is look at the n.rel tag from https://web.archive.org/web/20170824153724/http://larkpie.net/tibetancorpus/tags
In a phonetics use case I need to distinguish the two sounds of བ (ba or wa), but this currently seems impossible with botok:

- རབ་གསལ་བས is tokenized as རབ་གསལ་ - བས (in that case བས is pronounced wé)
- བྱང་ཆུབ་བར་དུ is tokenized as བྱང་ཆུབ་ - བར་ - དུ (in that case བར is pronounced bar)

Is there any way I can discriminate between the two with botok (or any other tool)?
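
Until བར་དུ is part of the vocab, one possible stopgap is to merge the split tokens back together after tokenization, so that a phonetics step can treat བར་དུ as a single unit. The snippet below is only a rough sketch of that idea: it works on a plain list of token strings (as in the examples in this thread), not on botok's own token objects, and `FROZEN_EXPRESSIONS` and `merge_frozen` are hypothetical names made up for illustration, not part of botok.

```python
# Hypothetical post-processing helper; NOT part of botok's API.
# It assumes the tokenizer output has already been reduced to plain strings.

TSEK = "་"  # Tibetan syllable delimiter (U+0F0B)

# Assumed table of frozen expressions, keyed by their syllables without
# trailing tsek. Only the expression discussed in this thread is listed.
FROZEN_EXPRESSIONS = {
    ("བར", "དུ"): "བར་དུ",
}


def merge_frozen(tokens, frozen=FROZEN_EXPRESSIONS):
    """Merge adjacent token pairs that together form a known frozen expression."""
    merged = []
    i = 0
    while i < len(tokens):
        # Compare the current pair of tokens, ignoring trailing tseks.
        pair = tuple(t.rstrip(TSEK) for t in tokens[i:i + 2])
        if pair in frozen:
            merged.append(frozen[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# The two tokenizations quoted in the question above.
bar_du = merge_frozen(["བྱང་ཆུབ་", "བར་", "དུ"])
bas = merge_frozen(["རབ་གསལ་", "བས"])
```

With this, the first example comes back as two tokens (བྱང་ཆུབ་ and བར་དུ), while the second is left untouched, so a later pronunciation step could tell the two cases apart by looking at the token rather than at the bare syllable. Adding the entry to the vocab is still the cleaner fix.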