NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Make fused normalization functions backward-compatible #1760

Closed by timmoon10 9 months ago

timmoon10 commented 9 months ago

https://github.com/NVIDIA/apex/pull/1715 makes breaking API changes to some fused normalization functions, in particular adding memory_efficient as a positional argument. This PR makes memory_efficient a keyword argument to ensure backward compatibility.
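
As an illustration of the pattern, here is a minimal sketch, not the actual apex implementation: the class name is a placeholder, the fused CUDA kernel is stubbed out with F.layer_norm, and gradients are omitted. The point is that giving memory_efficient a default keeps the old five-argument call sites working.

```python
import torch

# Minimal sketch of the backward-compatibility pattern; NOT the actual apex code.
# The fused CUDA kernel is replaced by F.layer_norm and gradients are omitted.
class FusedLayerNormAffineSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias, normalized_shape, eps,
                memory_efficient=False):
        # Defaulting the new argument keeps pre-#1715 call sites
        # (input, weight, bias, normalized_shape, eps) valid.
        ctx.memory_efficient = memory_efficient
        return torch.nn.functional.layer_norm(
            input, normalized_shape, weight, bias, eps)

    @staticmethod
    def backward(ctx, grad_output):
        # Real code would call the fused backward kernel here; the sketch
        # returns no gradients (one slot per forward argument).
        return None, None, None, None, None, None
```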

This change is motivated by the fact that Megatron-LM uses the old API (see https://github.com/NVIDIA/Megatron-LM/blob/2bc6cd307a11423928c675f741e79e03df23e721/megatron/core/fusions/fused_layer_norm.py#L147), which prevents NeMo from upgrading from the 23.09 PyTorch container to 23.11. See https://github.com/NVIDIA/NeMo/pull/7909#issuecomment-1863754027.

Feedback would be appreciated. An alternative approach is to update Megatron-LM, but this seems simpler. Pinging @RuiWang1998.

RuiWang1998 commented 9 months ago

Hi @timmoon10,

I assumed people weren't using the Function directly and forgot about Megatron-LM. It might be best to submit a companion PR to Megatron-LM in tandem with this one; Megatron-DeepSpeed already has this feature (https://github.com/microsoft/Megatron-DeepSpeed/pull/277), and it would be great if Megatron-LM had it as well.

timmoon10 commented 9 months ago

@RuiWang1998 That's nifty, it'll be convenient to just reuse that existing work.

These two approaches aren't mutually exclusive, so I don't think there's any harm in merging this PR. It won't break newer code that passes memory_efficient.
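
To make that concrete, both call styles go through against a defaulted signature like the sketch in the PR description above (hypothetical tensors and shapes; the class is the placeholder from that sketch, not apex's exact API):

```python
x = torch.randn(2, 8, requires_grad=True)
w = torch.ones(8, requires_grad=True)
b = torch.zeros(8, requires_grad=True)

# Old-style call (as in Megatron-LM's fused_layer_norm.py): no memory_efficient.
y_old = FusedLayerNormAffineSketch.apply(x, w, b, (8,), 1e-5)

# New-style call: memory_efficient passed explicitly as the sixth argument.
y_new = FusedLayerNormAffineSketch.apply(x, w, b, (8,), 1e-5, True)
```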