OpenPecha / Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
https://botok.readthedocs.io/
Apache License 2.0

Directory based config #71

Closed: 10zinten closed this issue 4 years ago

10zinten commented 4 years ago

Questions

ngawangtrinley commented 4 years ago

Lexica was replaced by words

Particles were in main, I think.

10zinten commented 4 years ago

@ngawangtrinley I have put all the dialect packs in botok-data and released each dialect like that

Then we just have to pass the dialect name, as in https://github.com/Esukhia/botok/blob/709b6c4a78b5a67fb60148a0c31ad94c88b25e80/tests/test_config.py#L38. If the dialect pack has not been downloaded yet, the config automatically downloads its latest release and then initializes itself.

Config can also be created from the directory path of a dialect pack: https://github.com/Esukhia/botok/blob/5d217106403c99e5b9ff039d87e5693b8ee574b2/tests/test_config.py#L54
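
To make the two entry points concrete, here is a rough, self-contained sketch of the behaviour described above. It is not botok's actual code: `DialectPackConfig`, `from_dialect_name`, and the `download` callback are hypothetical names used only for illustration, and the real download from botok-data is replaced by whatever function the caller passes in.

```python
from pathlib import Path
from typing import Callable, Dict, List


class DialectPackConfig:
    """Hypothetical stand-in for a directory-based config (not botok's class)."""

    def __init__(self, pack_path: Path):
        # Second entry point: build the config directly from a pack directory.
        self.pack_path = Path(pack_path)
        if not self.pack_path.is_dir():
            raise FileNotFoundError(f"dialect pack not found: {self.pack_path}")
        # Group every file in the pack by its top-level subdirectory.
        self.files: Dict[str, List[Path]] = {}
        for sub in sorted(p for p in self.pack_path.iterdir() if p.is_dir()):
            self.files[sub.name] = sorted(f for f in sub.rglob("*") if f.is_file())

    @classmethod
    def from_dialect_name(
        cls,
        name: str,
        base_path: Path,
        download: Callable[[str, Path], None],
    ) -> "DialectPackConfig":
        # First entry point: resolve the pack by name, and call the
        # (assumed) downloader only if the pack is not on disk yet.
        pack_path = Path(base_path) / name
        if not pack_path.is_dir():
            download(name, pack_path)
        return cls(pack_path)
```

The downloader is injected so the "download only if missing" logic can be exercised without network access. A second call with the same name and base path reuses the directory already on disk.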

How does this look?