NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

AttnFuncWithCP with seq_len==1 breaks #1070

Closed · MaciejBalaNV closed this issue 1 month ago

MaciejBalaNV commented 2 months ago

TE attention breaks here if the sequence length is equal to 1, e.g. here. This was not the case until this commit. If the seqlen==1 case is not supported, it would be nice to add a check with a meaningful error message.

ptrendx commented 2 months ago

@xrennvidia and @cyanguwa Could you take a look?

xrennvidia commented 2 months ago

Hi @MaciejBalaNV

Thanks for reaching out. Could you please clarify what seqlen=1 means? Is your total sequence length 1, or is the sequence chunk length on each GPU 1?

With CP, you need to guarantee that your total sequence length is divisible by CP*2, so that each GPU gets at least 2 tokens.
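
For reference, a minimal sketch of the kind of up-front check this constraint implies (the function name and signature here are hypothetical, not TE's actual validation code):

```python
# Sketch of the divisibility constraint described above: with context
# parallelism (CP), each rank must receive at least two sequence chunks,
# so the total sequence length must be divisible by 2 * cp_size.
# This is an illustrative helper, not TransformerEngine's actual API.
def check_seqlen_for_cp(total_seqlen: int, cp_size: int) -> None:
    """Fail early with a descriptive error instead of deep inside attention."""
    if total_seqlen % (2 * cp_size) != 0:
        raise ValueError(
            f"Context parallelism requires the total sequence length "
            f"({total_seqlen}) to be divisible by 2 * cp_size "
            f"({2 * cp_size})."
        )

check_seqlen_for_cp(4096, cp_size=8)  # OK: 4096 % 16 == 0
check_seqlen_for_cp(1, cp_size=2)     # raises ValueError, the seqlen==1 case
```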

xrennvidia commented 2 months ago

Hi @MaciejBalaNV

Is this what you are looking for?

MaciejBalaNV commented 1 month ago

Hey @xrennvidia, yes, this assert looks correct for the bug I've observed. Thanks, I believe we can close this issue.