alasdairforsythe
/
tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
MIT License
551
stars
21
forks
issues
fix arraybuffer error in Node.js > 18
#37
Vectorrent
opened
4 months ago
0
Preferred citation `bibtex`
#35
konstantinjdobler
opened
6 months ago
0
Hangs with PyTorch data loaders when `num_workers > 0`
#34
ntoxeg
opened
8 months ago
0
"vocab.load_multiprocess_safe" doesn't work while multi-processing.
#33
ElleLeonne
closed
8 months ago
1
Update on multilingual
#32
kerighan
opened
9 months ago
2
fix: replaced the wrong file mentioned in training/README.md
#31
vovw
closed
1 month ago
0
Inquiry on Extending Algorithm to Other Languages
#30
dsdanielpark
opened
11 months ago
2
Special tokens not showing up correctly when tokenized.
#29
amazingvince
opened
1 year ago
1
Update tokenmonster.py
#28
amazingvince
opened
1 year ago
0
Humble question regarding JS performance
#27
worstpractice
closed
1 year ago
1
Implemented in the new AI framework Zeta
#26
kyegomez
closed
1 year ago
1
What is the difference between `50256-consistent-oneword` and `50256-consistent`?
#24
Calvinnncy97
opened
1 year ago
1
code-65536 models cannot decode
#23
gautierdag
closed
1 year ago
1
RuntimeError: tokenmonsterserver: Cannot open or save vocabulary file, please check permissions
#21
abedkhooli
closed
1 year ago
3
Idea: Wouldn't it be possible for Tokenmonster to stop when it reaches the ideal vocab size?
#20
Calvinnncy97
closed
1 year ago
2
hello!
#19
Alignment-Lab-AI
closed
1 year ago
1
C implementation
#17
abb128
opened
1 year ago
1
Meaning of C and D
#15
Maxscha
opened
1 year ago
1
Wrapping lib in a go cli client
#14
101313
closed
1 year ago
2
charset bug fix
#12
codinglover0111
closed
1 year ago
1
panic: assignment to entry in nil map
#11
botsbreeder
closed
1 year ago
1
Spacecode: extend Capcode idea to composite words
#10
kosiakk
closed
1 year ago
11
Tokenize strings of only N-types of characters?
#8
ianderrington
closed
1 year ago
4
Continuous training: Deleted 0 of 0 tokens; Remaining 0 tokens; reachedMidway withinVocabX2 reachedVocab
#7
ianderrington
closed
1 year ago
1
Question/issue about uppercase
#6
kerighan
closed
1 year ago
3
"data is required error"
#5
enpassanty
closed
1 year ago
1
Hugging Face tokenizer coming soon?
#4
kyegomez
opened
1 year ago
3
Add a Python test and installation guide
#3
kerighan
closed
1 year ago
1
Change license to 0BSD
#2
JorgeCepeda
closed
1 year ago
0
This is great. Can we build a multilang tokenizer?
#1
BlinkDL
closed
1 year ago
1