alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
MIT License
528 stars 20 forks

Hugging Face tokenizer coming soon? #4

Open kyegomez opened 1 year ago

tiendung commented 1 year ago

I don't think it's a problem, since we already have the Python tokenizer.

alasdairforsythe commented 1 year ago

Once I've stopped trying to improve it, I'll write a C++ implementation and a Python wrapper around it, and then I'll put that on Hugging Face.