Closed: timmoon10 closed this 10 months ago
Hi @timmoon10 ,
I just thought people may not be using the Function directly, and I forgot about Megatron. I believe it might be best to submit another PR to Megatron-LM in tandem with this one, since Megatron-DeepSpeed already has this feature (https://github.com/microsoft/Megatron-DeepSpeed/pull/277) and it'd be great if Megatron-LM had it as well.
@RuiWang1998 That's nifty, it'll be convenient to just reuse that existing work.
These two approaches aren't mutually exclusive, so I don't think there is any harm in merging. This change won't break the newer code that uses `memory_efficient`.
https://github.com/NVIDIA/apex/pull/1715 makes breaking API changes to some fused normalization functions, in particular adding `memory_efficient` as a positional argument. This PR makes `memory_efficient` a keyword argument to ensure backward compatibility.

This change is motivated by the fact that Megatron-LM uses the old API: https://github.com/NVIDIA/Megatron-LM/blob/2bc6cd307a11423928c675f741e79e03df23e721/megatron/core/fusions/fused_layer_norm.py#L147. This prevents NeMo from upgrading from the 23.09 to the 23.11 PyTorch container. See https://github.com/NVIDIA/NeMo/pull/7909#issuecomment-1863754027.
Feedback would be appreciated. An alternative approach is to update Megatron-LM, but this seems simpler. Pinging @RuiWang1998.
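To illustrate the compatibility concern, here is a minimal Python sketch (the function names and signatures are illustrative, not the actual apex API): adding a new positional parameter can break existing call sites written against the old signature, while adding it as a keyword argument with a default keeps them working.

```python
# Hypothetical sketch of the API-compatibility issue; not the real
# apex fused layer norm signatures.

def layer_norm_old(x, eps):
    """Old signature: callers pass eps positionally."""
    return [v / (1.0 + eps) for v in x]

def layer_norm_positional(x, eps, memory_efficient):
    """Breaking change: a new required positional parameter.
    Old call sites like layer_norm_old(x, 1e-5) now raise a
    TypeError because memory_efficient has no default."""
    return [v / (1.0 + eps) for v in x]

def layer_norm_keyword(x, eps, memory_efficient=False):
    """Backward-compatible change: the new parameter is a keyword
    argument with a default, so old positional calls still bind
    correctly and new callers can opt in explicitly."""
    return [v / (1.0 + eps) for v in x]

# Old call site (written against the old signature) keeps working:
out = layer_norm_keyword([2.0], 1.0)
# New call site can opt in to the new behavior:
out_opt = layer_norm_keyword([2.0], 1.0, memory_efficient=True)
```

With the keyword-argument form, code compiled against either API version calls the function correctly, which is the property this PR aims to restore for Megatron-LM.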