THUDM/icetk
A unified tokenization tool for Images, Chinese and English.
150 stars
17 forks
issues
Can you offer the `sentencepiece_model.proto`?
#13
workingloong
opened
6 months ago
0
Version of protobuf is too low and conflicts with other Python frameworks
#12
leotmc
opened
1 year ago
2
Please add tags to this repo corresponding to the versions you have published to the official pip website
#11
xuanhua
opened
1 year ago
0
protobuf version is too low
#10
danyang-rainbow
closed
1 year ago
2
Does icetk have a C++ implementation?
#9
mdztravelling
opened
1 year ago
0
Tokenizer can't be hashed when using the datasets.map function
#8
dumpmemory
opened
1 year ago
2
What's the meaning of token 20005?
#7
xu-song
closed
1 week ago
0
Fix bug on Windows
#6
xu-song
closed
1 year ago
3
Fix format
#5
chaoslawful
opened
1 year ago
0
How was the tokenizer learned?
#4
silverriver
closed
1 year ago
6
Proper way to truncate long prompts
#3
teetone
closed
2 years ago
1
Appears to depend on a version of protobuf<3.19
#2
dribnet
closed
2 years ago
1
Retrieve the value of the end-of-text-token
#1
teetone
closed
2 years ago
5