PyThaiNLP / pythainlp

Thai Natural Language Processing in Python.
https://pythainlp.org/
Apache License 2.0

Add ICU wordbreak dictionary (Thai) #877

Closed: wannaphong closed this issue 7 months ago

wannaphong commented 7 months ago

Since ICU is included in almost all web browsers, I think we should add the ICU dictionary to PyThaiNLP. That way we use the same dictionary as ICU, and it can be deployed on any system that pythainlp/nlpo3 doesn't support.

Dictionary: https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/brkitr/dictionaries/thaidict.txt
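For illustration, here is a rough sketch of how that file could be turned into a word list. It assumes the file is UTF-8 text with one word per line and `#` comment lines. The helper names (`parse_icu_dict`, `fetch_icu_dict`, `longest_match`) are made up for this example and are not part of PyThaiNLP or ICU, and the greedy longest-match segmenter is only a toy stand-in, not the algorithm ICU or PyThaiNLP actually uses.

```python
from urllib.request import urlopen

# URL taken from the issue text above.
ICU_THAI_DICT_URL = (
    "https://raw.githubusercontent.com/unicode-org/icu/main/"
    "icu4c/source/data/brkitr/dictionaries/thaidict.txt"
)


def parse_icu_dict(text):
    """Return the set of words in an ICU-style dictionary file.

    Assumption: one word per line, '#' starts a comment line,
    and the file may begin with a byte-order mark.
    """
    words = set()
    for line in text.splitlines():
        line = line.lstrip("\ufeff").strip()
        if not line or line.startswith("#"):
            continue
        words.add(line)
    return words


def fetch_icu_dict(url=ICU_THAI_DICT_URL):
    """Download the dictionary and parse it (hypothetical helper, needs network)."""
    with urlopen(url) as resp:
        return parse_icu_dict(resp.read().decode("utf-8"))


def longest_match(text, words):
    """Toy greedy longest-match segmentation against a word set.

    Characters that start no known word are emitted one at a time.
    """
    max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in words:
                break
        else:
            j = i + 1  # no dictionary word starts here
        tokens.append(text[i:j])
        i = j
    return tokens


# Small inline sample so the sketch runs without network access.
sample_file = "\ufeff# sample comment\nแมว\nกิน\nปลา\n"
sample_words = parse_icu_dict(sample_file)
sample_tokens = longest_match("แมวกินปลา", sample_words)
```

In PyThaiNLP itself, a word set like this could presumably be passed through `pythainlp.util.dict_trie` and then to `word_tokenize(..., custom_dict=...)` instead of the toy segmenter.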

pavaris-pm commented 7 months ago

@wannaphong I've already added the ICU Thai dictionary to the corpus. You can review it in PR #879 krub.