carbonscott / exp-maxie

MFU calculation with `model.parameters()` might not be correct when using sharding. #1

Open carbonscott opened 4 weeks ago
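
To make the concern in the title concrete, here is a minimal arithmetic sketch, not the repository's actual MFU code. It assumes the common approximation of 6 FLOPs per parameter per token for training (which ignores attention FLOPs), and it assumes a sharding setup where each rank's `model.parameters()` yields only the local shard, roughly 1/`world_size` of the full model. Whether that holds depends on the sharding implementation and its settings. The function name `estimate_mfu` and all numbers are hypothetical.

```python
def estimate_mfu(num_params, tokens_per_sec, peak_flops_per_sec):
    """Approximate per-device MFU with the 6 * N FLOPs-per-token rule (an assumption)."""
    achieved_flops_per_sec = 6 * num_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec


# Hypothetical values, chosen only for illustration.
world_size = 8
total_params = 1_000_000_000        # full (unsharded) parameter count
tokens_per_sec_per_gpu = 10_000     # measured per-device throughput
peak_flops_per_gpu = 312e12         # assumed per-device peak FLOP/s

# Assumption: under sharding, counting parameters on one rank sees only its shard.
local_params = total_params // world_size

mfu_full = estimate_mfu(total_params, tokens_per_sec_per_gpu, peak_flops_per_gpu)
mfu_sharded = estimate_mfu(local_params, tokens_per_sec_per_gpu, peak_flops_per_gpu)

print(f"MFU with full parameter count:  {mfu_full:.4f}")
print(f"MFU with local shard count:     {mfu_sharded:.4f}")
print(f"Underestimation factor:         {mfu_full / mfu_sharded:.1f}")
```

Under these assumptions, the shard-based count underestimates MFU by a factor equal to `world_size`. One possible remedy is to compute the parameter count before the model is wrapped for sharding, or to sum shard sizes across ranks.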