NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

'TEDotProductAttention' object has no attribute 'tp_group_initialized' #934

Open 1049451037 opened 2 weeks ago

1049451037 commented 2 weeks ago

After updating Megatron-LM to the main branch of TE, the model no longer runs; it fails with `AttributeError: 'TEDotProductAttention' object has no attribute 'tp_group_initialized'`.

timmoon10 commented 2 weeks ago

Can you provide more information or a minimal reproducer?

This error suggests that the tensor-parallel group has not been properly configured. If you are using one of Megatron-LM's TE wrappers, the TP group must either be initialized prior to creating the layer (with megatron.core.parallel_state.initialize_model_parallel) or registered after creating the layer (with TransformerEngineBaseModule.set_tensor_parallel_group, see this Megatron-LM comment).
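The attribute error can be illustrated with a minimal sketch. The class below is a stand-in, not the real TE code: only the attribute names `tp_group` and `tp_group_initialized` are taken from the error message and the API described above; everything else is a simplified illustration of the two initialization paths (group known at construction time vs. registered afterwards).

```python
# Illustrative stand-in for TransformerEngineBaseModule's TP-group
# bookkeeping (NOT the real TE class). It shows why forgetting both
# initialization paths reproduces the AttributeError in this issue.

class FakeTEModule:
    def __init__(self, tp_group=None):
        if tp_group is not None:
            # Path 1: the TP group is already known when the layer is
            # built (e.g. after initialize_model_parallel has run).
            self.tp_group = tp_group
            self.tp_group_initialized = True
        # If no group is given here and set_tensor_parallel_group is
        # never called, tp_group_initialized is never assigned.

    def set_tensor_parallel_group(self, tp_group):
        # Path 2: register the group after the layer is constructed.
        self.tp_group = tp_group
        self.tp_group_initialized = True

    def forward(self):
        # Accessing the flag before either path has run raises the
        # AttributeError reported in this issue.
        if not self.tp_group_initialized:
            raise RuntimeError("TP group not configured")
        return "ok"
```

Constructing the module without a group and calling `forward()` immediately raises `AttributeError: 'FakeTEModule' object has no attribute 'tp_group_initialized'`; calling `set_tensor_parallel_group` first makes the call succeed.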

1049451037 commented 2 weeks ago

It seems that Megatron-LM never calls set_tensor_parallel_group on TEDotProductAttention, so the TP group is never registered for that module.
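If that is the case, one workaround is to register the group manually after building the model. The helper below is hypothetical (not part of TE or Megatron-LM): it walks an iterable of submodules (e.g. `model.modules()` in PyTorch) and calls `set_tensor_parallel_group` on every module that exposes it.

```python
# Hypothetical workaround sketch: register the TP group on every
# submodule that supports set_tensor_parallel_group. In real use,
# `modules` would be model.modules() and `tp_group` would come from
# megatron.core.parallel_state (e.g. get_tensor_model_parallel_group()).

def register_tp_group(modules, tp_group):
    """Call set_tensor_parallel_group where available; return the count."""
    patched = 0
    for module in modules:
        setter = getattr(module, "set_tensor_parallel_group", None)
        if callable(setter):
            setter(tp_group)
            patched += 1
    return patched
```

Running this over the model's modules after construction (but before the first forward pass) would cover any layer, including TEDotProductAttention, whose group was not set by the wrapper.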