NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

FusedLayerNormAffineFunction requires memory_efficient argument #593

Open ChenYuHo opened 7 months ago

ChenYuHo commented 7 months ago

FusedLayerNormAffineFunction in apex requires a memory_efficient argument:

https://github.com/NVIDIA/apex/blob/08f740290f999296d319ed2e3f21cd00b810918a/apex/normalization/fused_layer_norm.py#L34

Megatron-LM's call to FusedLayerNormAffineFunction does not pass this argument, which causes errors:

https://github.com/NVIDIA/Megatron-LM/blob/443ce9f3f98fdc5a53c6b480c6e21b79944d198e/megatron/core/fusions/fused_layer_norm.py#L116

Should this argument be exposed to users?
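One possible way for a call site to tolerate both apex versions is to inspect the signature of forward and pass memory_efficient only when it is declared. The following is a minimal sketch of that idea, not the fix adopted by either project. OldStyleFunction and NewStyleFunction are hypothetical stand-ins that only mimic the two parameter lists described in this issue, and accepts_memory_efficient and build_layer_norm_args are illustrative helper names, not part of apex or Megatron-LM.

```python
import inspect


def accepts_memory_efficient(forward_fn):
    """Return True if forward_fn declares a `memory_efficient` parameter."""
    return "memory_efficient" in inspect.signature(forward_fn).parameters


def build_layer_norm_args(forward_fn, input, weight, bias, normalized_shape,
                          eps, memory_efficient=False):
    """Build the positional arguments for the fused layer norm call.

    `memory_efficient` is appended only when forward_fn declares it, so the
    same call site works with either parameter list.
    """
    args = [input, weight, bias, normalized_shape, eps]
    if accepts_memory_efficient(forward_fn):
        args.append(memory_efficient)
    return tuple(args)


# Hypothetical stand-ins (assumption): they reproduce only the shape of the
# two forward signatures discussed in this issue, not the real apex code.
class OldStyleFunction:
    @staticmethod
    def forward(ctx, input, weight, bias, normalized_shape, eps):
        pass


class NewStyleFunction:
    @staticmethod
    def forward(ctx, input, weight, bias, normalized_shape, eps,
                memory_efficient):
        pass


old_args = build_layer_norm_args(OldStyleFunction.forward,
                                 "x", "w", "b", (8,), 1e-5)
new_args = build_layer_norm_args(NewStyleFunction.forward,
                                 "x", "w", "b", (8,), 1e-5,
                                 memory_efficient=True)
```

In a real call site, the resulting tuple would be unpacked into the function's apply call. Whether memory_efficient should default to False or be surfaced as a user option is the open question raised above.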

AnonNoNameAccount commented 7 months ago

I believe the issue was introduced by this commit: https://github.com/NVIDIA/apex/commit/6ff45486f432f91eb86937a0def5eb5f2cf792ae. Using an older version of apex (from before this commit) may fix the problem. I am testing whether this works.

Alternatively, the latest NGC container predates this commit and does not seem to have this issue either.

github-actions[bot] commented 5 months ago

Marking as stale. No activity in 60 days.