Haiyang-W / TokenFormer
Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
https://haiyang-w.github.io/tokenformer.github.io/
Apache License 2.0 · 384 stars · 23 forks
Issues
#11 · LR schedules · alxndrTL · closed 1 week ago · 1 comment
#10 · Request for TPU Training Code Release · Beomi · opened 1 week ago · 0 comments
#9 · Softmax · kroggen · opened 2 weeks ago · 2 comments
#8 · Questions about model scaling · Went-Liang · opened 2 weeks ago · 2 comments
#7 · Lacking instructions for inference · kroggen · opened 2 weeks ago · 8 comments
#6 · Fails with last version of pytorch · kroggen · opened 2 weeks ago · 3 comments
#5 · Minimal Implementation · kroggen · closed 1 week ago · 8 comments
#4 · Use of llama2 or llama3 as baseline? · pjj · closed 1 week ago · 6 comments
#3 · License for the codebase · pjj · closed 1 week ago · 2 comments
#2 · chore: update mappings.py · eltociear · opened 3 weeks ago · 0 comments
#1 · Maybe we can apply `ring attention` to scale up token former infinity? · reyoung · closed 1 week ago · 2 comments