aws-samples / amazon-sagemaker-local-mode

Amazon SageMaker Local Mode Examples
MIT No Attribution

PyTorch local mode job doesn't pick up GPU #13

Closed by karthitect 2 years ago

karthitect commented 2 years ago

When I run the following PyTorch local mode sample on a SageMaker notebook instance (not Studio), an ml.p2.xlarge with 1 GPU, it does not pick up the GPU. The default PyTorch class in the SageMaker SDK seems to pull a CPU image, so I explicitly specified a GPU image (763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker, from https://github.com/aws/deep-learning-containers/blob/master/available_images.md), but the job still seems to run on the CPU.

If I run the included cifar10 training script directly on the notebook instance from Jupyter, it does pick up the GPU.

Any suggestions would be appreciated. Thanks!

eitansela commented 2 years ago

Set the instance type to local_gpu in the Estimator:

instance_type = "local_gpu"
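For context, here is a rough sketch of how that setting might fit into the Estimator arguments. The image URI is the one quoted in the question. Everything else is an assumption for illustration: the helper names, the entry point file name cifar10.py, the nvidia-smi check as a GPU heuristic, and the CPU fallback versions. The sagemaker import sits inside make_estimator so the helpers can be tried without the SDK installed.

```python
import shutil
from typing import Optional

# GPU training image quoted in the question above.
GPU_IMAGE_URI = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "pytorch-training:1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker"
)


def pick_local_instance_type(gpu_available: Optional[bool] = None) -> str:
    """Return "local_gpu" when a GPU seems to be present, else "local".

    Hypothetical helper: looking for nvidia-smi on PATH is only a heuristic.
    """
    if gpu_available is None:
        gpu_available = shutil.which("nvidia-smi") is not None
    return "local_gpu" if gpu_available else "local"


def estimator_kwargs(role: str,
                     entry_point: str = "cifar10.py",  # assumed file name
                     gpu_available: Optional[bool] = None) -> dict:
    """Build keyword arguments for sagemaker.pytorch.PyTorch (hypothetical helper)."""
    instance_type = pick_local_instance_type(gpu_available)
    kwargs = {
        "entry_point": entry_point,
        "role": role,
        "instance_count": 1,
        "instance_type": instance_type,
    }
    if instance_type == "local_gpu":
        # Pin the GPU image explicitly, as the question does.
        kwargs["image_uri"] = GPU_IMAGE_URI
    else:
        # Assumed CPU fallback: let the SDK resolve an image from these versions.
        kwargs["framework_version"] = "1.10.0"
        kwargs["py_version"] = "py38"
    return kwargs


def make_estimator(role: str):
    """Create the estimator; needs the sagemaker SDK and Docker on the machine."""
    from sagemaker.pytorch import PyTorch  # imported lazily on purpose
    return PyTorch(**estimator_kwargs(role))
```

On the notebook instance you would call make_estimator with your own execution role ARN and then call fit as the sample does. Note that the container can only see the GPU if Docker on the host is set up for NVIDIA GPUs, which I would expect on a GPU notebook instance but have not verified here.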