A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Hello,
When I enabled --sequence-parallel and --tp-comm-overlap and started training, I got the following error:
TypeError: UbufP2PCommOverlap(): incompatible function arguments. The following argument types are supported:
Invoked with: tensor([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ..., [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0]], device='cuda:3', dtype=torch.uint8), 3, 2, 16, 2, 0, 0, 3, 0, 0, tensor([])

How can I fix this? Thanks.
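A pybind11 "incompatible function arguments" error means the Python caller passed arguments that match none of the C++ signatures compiled into the extension. One hypothesis worth ruling out (an assumption, not confirmed by this report) is that the training framework calling `UbufP2PCommOverlap` and the installed TransformerEngine build come from mismatched versions. The minimal sketch below prints the installed versions using only the standard library; the distribution names in the list are assumptions and may differ depending on how the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed under this name.
            result[name] = None
    return result


if __name__ == "__main__":
    # Assumed distribution names; adjust to match your environment.
    for pkg, ver in installed_versions(["torch", "transformer_engine", "megatron-core"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```

If the versions come from different release lines, aligning them is a reasonable first step before digging into the argument list itself.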