Hk669 / bpetokenizer

(Python package) Train your own tokenizer based on the BPE algorithm for LLMs (supports regex patterns and special tokens)
https://pypi.org/project/bpetokenizer/

Use Go for multithreading to increase performance #16

Open Hk669 opened 3 weeks ago

Hk669 commented 3 weeks ago

Is your feature request related to a problem? Please describe.

I tried using Go, and it seems to be about 40% faster than Python. I think writing more efficient code in Go with multithreading would speed up tokenizer training considerably.

Describe the solution you'd like

Replicate bpetokenizer in Go.
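
A rough sketch of where multithreading could help: counting adjacent token pairs is the hot loop of BPE training, and it can be split across goroutines when the text is already divided into chunks (for example by a regex pattern). This is only an illustration under that assumption, not the actual port. The names `Pair`, `countPairs`, and `countPairsParallel` are hypothetical and do not exist in bpetokenizer.

```go
package main

import (
	"fmt"
	"sync"
)

// Pair is an adjacent pair of token ids (hypothetical type for this sketch).
type Pair struct{ A, B int }

// countPairs counts adjacent pairs inside a single chunk of token ids.
func countPairs(ids []int) map[Pair]int {
	counts := make(map[Pair]int)
	for i := 0; i+1 < len(ids); i++ {
		counts[Pair{ids[i], ids[i+1]}]++
	}
	return counts
}

// countPairsParallel counts pairs for each chunk in its own goroutine and
// then merges the partial results. Pairs are never counted across chunk
// boundaries, which assumes the chunks come from a pre-split text.
func countPairsParallel(chunks [][]int) map[Pair]int {
	partial := make([]map[Pair]int, len(chunks))
	var wg sync.WaitGroup
	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk []int) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			partial[i] = countPairs(chunk)
		}(i, chunk)
	}
	wg.Wait()

	// Merge sequentially after all workers have finished.
	total := make(map[Pair]int)
	for _, m := range partial {
		for p, n := range m {
			total[p] += n
		}
	}
	return total
}

func main() {
	chunks := [][]int{{1, 2, 3, 1, 2}, {1, 2, 4}}
	counts := countPairsParallel(chunks)
	fmt.Println(counts[Pair{1, 2}]) // prints 3
}
```

One goroutine per chunk keeps the sketch short; with many small chunks, a fixed pool of workers would likely be a better fit.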

zacharias1219 commented 2 weeks ago

Have you started working on this? Also, there weren't any errors on my local machine while using bpetokenizer.

Hk669 commented 2 weeks ago

> Have you started working on this? Also, there weren't any errors on my local machine while using bpetokenizer.

Yes, I've already started.