Hk669/bpetokenizer
(py package) Train your own tokenizer based on the BPE algorithm for LLMs (supports regex patterns and special tokens)
https://pypi.org/project/bpetokenizer/
Deprecate the save/load mode="file" for the tokenizer.
#4
Closed
Hk669 closed 3 weeks ago
Hk669 commented 4 weeks ago
I think JSON is more efficient than the file mode.
The tokenizer is now saved as JSON, so the file mode no longer makes sense :)
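
For readers following along, here is a rough sketch of what a JSON-based save/load with a deprecated `mode="file"` could look like. It is not the bpetokenizer API: `save_tokenizer`, `load_tokenizer`, the `mode` handling, and the JSON layout (`pattern`, `special_tokens`, `merges`) are all assumptions made up for illustration, using only the standard library.

```python
import json
import warnings


def save_tokenizer(path, merges, special_tokens, pattern="", mode="json"):
    """Hypothetical helper (not the bpetokenizer API): save tokenizer state as JSON."""
    if mode == "file":
        # Assumed deprecation path: warn, then fall back to writing JSON anyway.
        warnings.warn(
            'mode="file" is deprecated; use mode="json" instead',
            DeprecationWarning,
            stacklevel=2,
        )
    elif mode != "json":
        raise ValueError(f"unknown mode: {mode!r}")

    # JSON object keys must be strings, so each merge pair (a, b) is stored as "a,b".
    data = {
        "pattern": pattern,
        "special_tokens": special_tokens,
        "merges": {f"{a},{b}": idx for (a, b), idx in merges.items()},
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def load_tokenizer(path):
    """Hypothetical helper: read back the state written by save_tokenizer."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Turn the "a,b" keys back into (a, b) integer tuples.
    merges = {
        tuple(int(x) for x in key.split(",")): idx
        for key, idx in data["merges"].items()
    }
    return data["pattern"], data["special_tokens"], merges
```

One file holds the pattern, special tokens, and merges together, which is presumably part of why a separate file mode stops being useful once JSON saving exists.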