aws / deep-learning-containers

AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet.
https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html

[bug] Outdated TransformerEngine #3996

Open dbpprt opened 3 months ago

dbpprt commented 3 months ago


Concise Description: The bundled TransformerEngine version (0.12.0) is not compatible with FlashAttention > 2.0.4, whilst recent versions of transformers require FlashAttention > 2.0.4.

DLC image/dockerfile: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.3.0-gpu-py311-cu121-ubuntu20.04-sagemaker

Current behavior: The image ships an old TransformerEngine version that does not support recent versions of FlashAttention.

Expected behavior: The image should be usable with recent versions of FlashAttention and transformers.

Additional context:
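Until the image is updated, one way to check whether a given container hits this conflict is to compare the installed package versions. The Python sketch below is illustrative only: the distribution names (`transformer_engine`, `flash_attn`) and the hypothetical `MAX_FLASH_ATTN_FOR_TE` table are assumptions, and the table records only the single incompatibility reported in this issue (TransformerEngine 0.12.0 with flash-attn newer than 2.0.4), not a full compatibility matrix. Version parsing is deliberately simplified to the numeric release segment.

```python
import re
from importlib import metadata
from typing import Optional

# Hypothetical lookup table. The only entry is the one reported in this issue:
# TransformerEngine 0.12.0 does not work with flash-attn newer than 2.0.4.
MAX_FLASH_ATTN_FOR_TE = {
    (0, 12, 0): (2, 0, 4),
}


def parse_release(text: str) -> tuple:
    """Return the numeric release part of a version, e.g. '1.11.0+abc' -> (1, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_known_conflict(te_version: str, fa_version: str) -> bool:
    """True if this TE release has a recorded flash-attn cap and fa_version exceeds it."""
    cap = MAX_FLASH_ATTN_FOR_TE.get(parse_release(te_version))
    return cap is not None and parse_release(fa_version) > cap


def installed_version(*names: str) -> Optional[str]:
    """Return the installed version of the first distribution name that is found, else None."""
    for name in names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    # Assumed distribution names; they depend on how each package was built and installed.
    te = installed_version("transformer_engine", "transformer-engine")
    fa = installed_version("flash_attn", "flash-attn")
    print(f"transformer_engine={te} flash_attn={fa}")
    if te and fa and has_known_conflict(te, fa):
        print("Known conflict: this TransformerEngine build needs flash-attn <= 2.0.4")
```

Running the script inside the container prints both detected versions and flags the reported combination; any TransformerEngine version missing from the table is treated as having no known cap.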

sbhavani commented 2 weeks ago

We are also working on a pip wheel for TE v1.11 (ETA 10/15) that will remove the version requirement for flash-attn and make it an optional dependency. That might be a good time to update the DLC.