databricks / megablocks

Apache License 2.0

fix the abnormal ‘CAPACITY_FACTOR’ value #79

Open jordgedu opened 6 months ago

jordgedu commented 6 months ago

When I tested it, I found that this abnormal value made the run consume a huge amount of GPU memory.

tgale96 commented 6 months ago

Ah yes, a while back we specified the capacity factor in terms of tokens rather than as a multiple of the expected number of tokens per expert. We must have missed updating this when we changed it :)
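To see why a stale token-count value blows up memory, here is a minimal sketch. It assumes capacity is computed as `capacity_factor * top_k * num_tokens / num_experts`, which matches the "multiple of the expected number of tokens per expert" semantics described above. The function names, the example dimensions, and the fp16 token buffer are hypothetical illustrations, not MegaBlocks' actual API or the script's real settings.

```python
def expert_capacity(num_tokens: int, num_experts: int, top_k: int,
                    capacity_factor: float) -> int:
    """Hypothetical helper: slots per expert when capacity_factor is a multiplier."""
    # Expected tokens routed to each expert under perfectly uniform routing.
    tokens_per_expert = top_k * num_tokens / num_experts
    # The capacity factor scales that expectation; it is NOT a token count.
    return int(capacity_factor * tokens_per_expert)


def dispatch_buffer_bytes(num_experts: int, capacity: int, hidden_size: int,
                          bytes_per_elem: int = 2) -> int:
    """Hypothetical helper: size of a [num_experts, capacity, hidden_size] token buffer."""
    return num_experts * capacity * hidden_size * bytes_per_elem


# Hypothetical setup: 8192 tokens per step, 8 experts, top-1 routing, hidden size 1024.
num_tokens, num_experts, top_k, hidden = 8192, 8, 1, 1024
for cf in (1.0, 1024):  # 1024 stands in for an old-style "capacity in tokens" value
    cap = expert_capacity(num_tokens, num_experts, top_k, cf)
    gib = dispatch_buffer_bytes(num_experts, cap, hidden) / 2**30
    print(f"capacity_factor={cf}: capacity={cap}, buffer={gib:.3f} GiB")
# capacity_factor=1.0: capacity=1024, buffer=0.016 GiB
# capacity_factor=1024: capacity=1048576, buffer=16.000 GiB
```

Because the factor multiplies the per-expert expectation, reading an old token count as a multiplier grows this buffer by that same factor (here 1024x). That is consistent with the large GPU memory usage reported above. The buffer shown is only the routed-token tensor; real layers allocate more on top of it.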

Would you mind updating the other MoE scripts as well? Thanks!

tgale96 commented 6 months ago

Also, out of curiosity: why are you using MoE as opposed to dMoE?