alasdairforsythe / tokenmonster

Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript

MIT License

551 stars, 21 forks

issues

fix arraybuffer error in Node.js > 18

#37 Vectorrent opened 4 months ago
0
Preferred citation `bibtex`

#35 konstantinjdobler opened 6 months ago
0
Hangs with PyTorch data loaders when `num_workers > 0`

#34 ntoxeg opened 8 months ago
0
"vocab.load_multiprocess_safe" doesn't work while multi-processing.

#33 ElleLeonne closed 8 months ago
1
Update on multilingual

#32 kerighan opened 9 months ago
2
fix: replaced the wrong file mentioned in training/README.md

#31 vovw closed 1 month ago
0
Inquiry on Extending Algorithm to Other Languages

#30 dsdanielpark opened 11 months ago
2
Special tokens not showing up correctly when tokenized.

#29 amazingvince opened 1 year ago
1
Update tokenmonster.py

#28 amazingvince opened 1 year ago
0
Humble question regarding JS performance

#27 worstpractice closed 1 year ago
1
Implemented in the new AI framework Zeta

#26 kyegomez closed 1 year ago
1
What is the difference between `50256-consistent-oneword` and `50256-consistent`?

#24 Calvinnncy97 opened 1 year ago
1
code-65536 models cannot decode

#23 gautierdag closed 1 year ago
1
RuntimeError: tokenmonsterserver: Cannot open or save vocabulary file, please check permissions

#21 abedkhooli closed 1 year ago
3
Idea: Wouldn't it be possible for Tokenmonster to stop when it reaches the ideal vocab size?

#20 Calvinnncy97 closed 1 year ago
2
hello!

#19 Alignment-Lab-AI closed 1 year ago
1
C implementation

#17 abb128 opened 1 year ago
1
Meaning of C and D

#15 Maxscha opened 1 year ago
1
Wrapping lib in a go cli client

#14 101313 closed 1 year ago
2
charset bug fix

#12 codinglover0111 closed 1 year ago
1
panic: assignment to entry in nil map

#11 botsbreeder closed 1 year ago
1
Spacecode: extend Capcode idea to composite words

#10 kosiakk closed 1 year ago
11
Tokenize strings of only N-types of characters?

#8 ianderrington closed 1 year ago
4
Continuous training: Deleted 0 of 0 tokens; Remaining 0 tokens; reachedMidway withinVocabX2 reachedVocab

#7 ianderrington closed 1 year ago
1
Question/issue about uppercase

#6 kerighan closed 1 year ago
3
"data is required error"

#5 enpassanty closed 1 year ago
1
Hugging Face tokenizer coming soon?

#4 kyegomez opened 1 year ago
3
Add a Python test and installation guide

#3 kerighan closed 1 year ago
1
Change license to 0BSD

#2 JorgeCepeda closed 1 year ago
0
This is great. Can we build a multilang tokenizer?

#1 BlinkDL closed 1 year ago
1