-
Hello,
I have been training models with Mamba (v1) and I'm enjoying it. I would like to use MuTransfer for Mamba. Should I just scale the width params (matrix dims and conv dim), or are there oth…
-
I'm implementing muP for the OLMo model, and am facing an issue with the coordinate check.
![sp_trsfmr_adamw_coord](https://github.com/microsoft/mup/assets/6500683/6c268649-304c-4e78-b9fa-20692fdb…
-
Hello! First of all, thank you for doing such great work and making it so accessible. I'm looking at using `mup` for a project but I'm a bit confused about how to set the base shapes for the smaller m…
-
https://generallyintelligent.com/research/carbs/
NZ99 updated
10 months ago
-
https://arxiv.org/abs/2210.01765
Just the 3D input masking stuff
-
We should add support for mutransfer: https://github.com/microsoft/mup
Appears non-trivial, but not as difficult as MoE. We'd have to modify the model itself. https://github.com/microsoft/mup/blob/…
-
We want to implement [uTransfer](https://github.com/microsoft/mup) in our code-base to allow scaling our model's optimal parameters.
However, I see that they require using the optimizer mup.MuAdam in…
-
Hi!
I didn't fully understand how the transfer of parameters such as batch_size/seq_len/steps should work (Figures 17 and 19 in the article). Also, I didn't find any mention of this either in the article …
-
I can work on figuring out how to support the default compiler mode so that we can remove this line
_Originally posted by @hatemhelal in https://github.com/valence-discovery/goli/pull/123#discussio…