alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
MIT License
528 stars, 20 forks

Update on multilingual #32

Open kerighan opened 4 months ago

kerighan commented 4 months ago

Is there any update on the multilingual tokenizers? The project seems to be on pause.

nampdn commented 2 months ago

You can download the binary or compile it from source to train your own vocabulary. I think the project, at its current scope, is good enough for production use.

asifshaikat commented 2 months ago

Hi @nampdn, would you please guide me on how I can do that for the Bangla language? I am technical but a newbie in the core NLP domain. Help would be much appreciated.