Open ChenYuHo opened 7 months ago
I believe the issue was introduced by this commit: https://github.com/NVIDIA/apex/commit/6ff45486f432f91eb86937a0def5eb5f2cf792ae. Using an older version of apex (from before this commit) may fix the problem; I am testing whether this works.
Alternatively, the latest NGC container predates this commit and does not seem to have this issue either.
Marking as stale. No activity in 60 days.
FusedLayerNormAffineFunction in apex requires a memory_efficient argument:
https://github.com/NVIDIA/apex/blob/08f740290f999296d319ed2e3f21cd00b810918a/apex/normalization/fused_layer_norm.py#L34
but Megatron-LM's call to FusedLayerNormAffineFunction does not pass it, which causes errors:
https://github.com/NVIDIA/Megatron-LM/blob/443ce9f3f98fdc5a53c6b480c6e21b79944d198e/megatron/core/fusions/fused_layer_norm.py#L116
Should this be exposed to users?
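In the meantime, one possible workaround on the caller side is to check whether the installed function declares memory_efficient before calling it. The snippet below is only a rough sketch of that idea, not the actual apex or Megatron-LM code: call_with_optional_memory_efficient, old_forward, and new_forward are hypothetical names made up for illustration, and the two stand-in functions only mimic the argument lists before and after the change. With a real torch.autograd.Function you would presumably inspect its forward method rather than apply.

```python
import inspect


def call_with_optional_memory_efficient(fn, *args, memory_efficient=False):
    """Call fn, passing memory_efficient only if fn's signature declares it.

    Hypothetical helper for illustration; not part of apex or Megatron-LM.
    """
    params = inspect.signature(fn).parameters
    if "memory_efficient" in params:
        # Newer signature: append the extra positional argument.
        return fn(*args, memory_efficient)
    # Older signature: call with the original arguments only.
    return fn(*args)


# Stand-ins that only mimic the two argument lists (assumption, not apex code).
def old_forward(input, weight, bias, normalized_shape, eps):
    return ("old", eps)


def new_forward(input, weight, bias, normalized_shape, eps, memory_efficient):
    return ("new", eps, memory_efficient)


old_result = call_with_optional_memory_efficient(old_forward, 1.0, 1.0, 0.0, (4,), 1e-5)
new_result = call_with_optional_memory_efficient(new_forward, 1.0, 1.0, 0.0, (4,), 1e-5)
```

Defaulting memory_efficient to False here is a guess at the setting that keeps the previous behaviour; whether it should instead be a user-facing option is the open question above.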