AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet.
[bug] smdistributed is not included in HuggingFace training image #3989
Open
dbpprt opened 3 weeks ago
Checklist
Concise Description: smdistributed is not available in the HuggingFace training image.
DLC image/dockerfile: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.1.0-transformers4.36.0-gpu-py310-cu121-ubuntu20.04
Current behavior:
Expected behavior:
Additional context: Installing it manually gives the following error:
Wheel installed from: https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.1.0/cu121/2024-02-04/smdistributed_dataparallel-2.1.0-cp310-cp310-linux_x86_64.whl
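
For anyone trying to reproduce this, here is a minimal sketch of a check that could be run with the container's own Python interpreter to see whether the package is importable. The helper name `is_module_available` is hypothetical, and the submodule name `smdistributed.dataparallel` is an assumption inferred from the wheel filename above. It only tests whether the modules can be located, not whether they work.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if `name` can be located on the current import path.

    Hypothetical helper for illustration; not part of any AWS tooling.
    """
    try:
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError (an ImportError) if the parent is missing.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Module names are assumptions based on the wheel name in this issue.
    for name in ("smdistributed", "smdistributed.dataparallel"):
        status = "found" if is_module_available(name) else "missing"
        print(f"{name}: {status}")
```

If both lines report `missing` inside the image from this report, that would be consistent with the package not being installed at all rather than being present but broken.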