bojone / bytepiece
A purer Tokenizer with a higher compression rate
Apache License 2.0 · 442 stars · 22 forks
issues
#19 Would you consider adding checkpointing and incremental training? (for large-scale corpora) · baisechundu · closed 3 months ago · 2 comments
#18 convert_to_sentencepiece error: *** UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 1: unexpected end of data · FlyCarrot · closed 3 months ago · 2 comments
#17 Suggestion: add a demo or tutorial on vocabulary extension · zzhdbw · opened 4 months ago · 0 comments
#16 Fix UnicodeDecodeError during installation · jdb110 · opened 5 months ago · 0 comments
#15 Relationship between tokenizer compression rate and final model performance · nghuyong · opened 7 months ago · 2 comments
#14 convert_to_sentencepiece fails · donghucey · closed 7 months ago · 1 comment
#13 Loading fails after converting to sentencepiece · yzlnew · opened 8 months ago · 4 comments
#12 Training hangs on large datasets · yzlnew · opened 8 months ago · 3 comments
#11 Installation problems encountered and how to solve them · suifengdou · opened 9 months ago · 1 comment
#10 Error when loading the model · nuass · closed 10 months ago · 2 comments
#9 Update README.md: the tokenizer1.convert_to_sentencepiece call in the example does not work in version 0.6.1 · eggqq007 · closed 10 months ago · 0 comments
#8 pip install fails · wangyuxinwhy · closed 11 months ago · 1 comment
#7 Loading bytepiece model files with huggingface tokenizers · wangyuxinwhy · closed 11 months ago · 5 comments
#6 Is specifying special_token not supported? · zipzou · opened 11 months ago · 1 comment
#5 fix: tiny errors caused by precision · hscspring · closed 1 year ago · 10 comments
#4 Training error with multiple workers · MIracleyin · closed 1 year ago · 4 comments
#3 Naive question: what is the training objective when training a tokenizer? · guotong1988 · opened 1 year ago · 1 comment
#2 Redundant vocab? · yuanenming · closed 1 year ago · 8 comments
#1 Why is 苏神's (Su Jianlin's) blog page not loading? · nlp4whp · closed 1 year ago · 0 comments