This comes up in NeMo / NeVA:
https://github.com/NVIDIA/NeMo/blob/32503fd946cedc41152152837c01f95ae4bc6dc6/nemo/collections/nlp/modules/common/megatron/attention.py#L973-L973
cc @tfogal
Hint from the expert (thank you @tfogal): This can be avoided by using flash-attention.
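For context, a minimal sketch of what the hint suggests, assuming the linked line sits in a non-flash, unfused core-attention path (an assumption; the function names below are illustrative, not NeMo's API). PyTorch's `torch.nn.functional.scaled_dot_product_attention` can dispatch to a flash-attention kernel on supported GPUs, so the full attention-score matrix never needs to be materialized:

```python
# Hypothetical sketch, not NeMo's actual code: the unfused pattern
# materializes the full (seq, seq) score matrix; the fused call lets
# PyTorch dispatch to a flash-attention kernel on supported GPUs.
import torch
import torch.nn.functional as F

def unfused_attention(q, k, v):
    # Explicit scores -> softmax -> weighted sum; allocates O(seq^2) memory.
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    return torch.matmul(torch.softmax(scores, dim=-1), v)

def fused_attention(q, k, v):
    # Single fused kernel; no full attention matrix is materialized.
    return F.scaled_dot_product_attention(q, k, v)

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
# (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(2, 8, 128, 64, device=device, dtype=dtype) for _ in range(3))
torch.testing.assert_close(unfused_attention(q, k, v), fused_attention(q, k, v),
                           atol=1e-2, rtol=1e-2)
```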