NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

TypeError: UbufP2PCommOverlap(): incompatible function arguments. #870

Closed: holmes313 closed this issue 1 month ago

holmes313 commented 1 month ago

Hello. When I configured --sequence-parallel and --tp-comm-overlap and started training, I got the following error:

TypeError: UbufP2PCommOverlap(): incompatible function arguments. The following argument types are supported:

  1. () -> None

Invoked with: tensor([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], device='cuda:3', dtype=torch.uint8), 3, 2, 16, 2, 0, 0, 3, 0, 0, tensor([])

How can I fix it? Thanks.

Zhihao06 commented 1 month ago

I have run into the same problem. Could you tell me how to fix it? Thanks.