OpenPecha / Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
https://botok.readthedocs.io/
Apache License 2.0

POS tags? Distinguishing some patterns #85

Open: eroux opened this issue 2 years ago

eroux commented 2 years ago

In a phonetics use case I need to distinguish between the two possible readings (ba or wa), but this currently seems impossible with botok:

Is there any way I can discriminate between the two with botok (or any other tool)?

ngawangtrinley commented 2 years ago

བར་དུ་ should be added to the vocab. I would argue that it's a frozen expression by now. We'll add instructions on how to do this to the botok docs.
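To illustrate why a vocabulary entry helps, here is a minimal sketch of a greedy longest-match tokenizer over tsek-delimited syllables. It is hypothetical and is NOT botok's API or its actual matching code; `syllables`, `longest_match` and the `max_len` parameter are made-up names. It only shows that once the multi-syllable entry བར་དུ་ is in the vocabulary, it comes out as a single token instead of two.

```python
from __future__ import annotations

# Hypothetical illustration only: not botok's API.
TSEK = "\u0f0b"  # Tibetan intersyllabic tsheg (་)


def syllables(text: str) -> list[str]:
    """Split text on the tsek, re-attaching a tsek to every syllable."""
    return [part + TSEK for part in text.split(TSEK) if part]


def longest_match(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedily take the longest run of syllables found in vocab.

    Vocabulary entries are assumed to end with a tsek, like the syllables.
    A single syllable is always accepted as a fallback token.
    """
    syls = syllables(text)
    tokens: list[str] = []
    i = 0
    while i < len(syls):
        # Try the longest candidate first, shrinking down to one syllable.
        for n in range(min(max_len, len(syls) - i), 0, -1):
            candidate = "".join(syls[i:i + n])
            if n == 1 or candidate in vocab:
                tokens.append(candidate)
                i += n
                break
    return tokens


if __name__ == "__main__":
    sample = "བར" + TSEK + "དུ" + TSEK  # བར་དུ་
    print(longest_match(sample, set()))     # split into two syllable tokens
    print(longest_match(sample, {sample}))  # kept as one token
```

In botok itself, the supported way to add such an entry is whatever the promised docs describe; this sketch makes no claim about that mechanism.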

eroux commented 2 years ago

Well, with another POS tagger, what I would do is look at the n.rel tag described at https://web.archive.org/web/20170824153724/http://larkpie.net/tibetancorpus/tags
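The tag-based approach can be sketched as a simple post-processing step on a tagger's output. This is a deliberately simplified, hypothetical heuristic, not a full account of Tibetan phonology and not part of botok: it assumes the tagger returns `(token, tag)` pairs, takes `n.rel` from the tag list linked above, and uses a made-up placeholder tag `"part"` for particles. A token starting with བ that is tagged as a particle gets "wa"; any other བ-initial token (for example a relational noun tagged `n.rel`) gets "ba".

```python
from __future__ import annotations

BA = "\u0f56"  # Tibetan letter BA (བ)

# Hypothetical tag set: "n.rel" comes from the linked tag list;
# "part" is a placeholder for whatever particle tag a real tagger uses.
PARTICLE_TAGS = {"part"}


def ba_or_wa(tagged: list[tuple[str, str]]) -> list[tuple[str, str | None]]:
    """Guess a reading for every token that starts with BA.

    Simplified heuristic: BA-initial tokens tagged as particles are read
    "wa", other BA-initial tokens are read "ba", and tokens that do not
    start with BA get None.
    """
    readings: list[tuple[str, str | None]] = []
    for token, tag in tagged:
        if not token.startswith(BA):
            readings.append((token, None))
        elif tag in PARTICLE_TAGS:
            readings.append((token, "wa"))
        else:
            readings.append((token, "ba"))
    return readings


if __name__ == "__main__":
    # Hypothetical tagger output, for illustration only.
    tagged = [("བར", "n.rel"), ("དུ", "case"), ("བ", "part")]
    for token, reading in ba_or_wa(tagged):
        print(token, reading)
```

The heuristic can only be as good as the tokenization it receives: if བར་དུ་ is split or mis-tagged, the reading will be wrong too, which is why the vocabulary fix matters.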