bwconrad/soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
Apache License 2.0
42 stars, 3 forks
issues
#3 Training hyperparameters
opened 1 month ago by jacoblam3112, 0 comments
#2 Request to add registers and position embedding interpolation
opened 5 months ago by swarajnanda2021, 1 comment
#1 Why use the same parameter matrix to normalize per column and per row to generate D and C?
closed 10 months ago, opened by wleilei, 1 comment