carbonscott/exp-maxie
MFU calculation with `model.parameters()` might not be correct when using sharding.
#1 (Open): carbonscott opened 4 weeks ago