aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

[Bug Report] RuntimeError: Dataset not found. You can use download=True to download it for PyTorch MNIST Horovod #4566

Open tianyizhouu opened 4 months ago

tianyizhouu commented 4 months ago

I got "RuntimeError: Dataset not found. You can use download=True to download it" when running the MNIST training example with PyTorch.
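To help narrow this down, here is a minimal diagnostic sketch that lists the files the training script actually sees under its data directory. The helper name `list_data_files` is hypothetical, and it assumes the input channel is named "training" (so that SageMaker exposes it as `SM_CHANNEL_TRAINING`); the folder layout the dataset loader expects varies by torchvision version, so the output is only meant for comparison against it.

```python
import os


def list_data_files(root):
    """Return the sorted relative paths of every file under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Normalize separators so the output looks the same on any OS.
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


if __name__ == "__main__":
    # Assumption: the channel is named "training"; fall back to ./data locally.
    data_dir = os.environ.get("SM_CHANNEL_TRAINING", "./data")
    for path in list_data_files(data_dir):
        print(path)
```

If the listing is empty or the folder names differ from what the loader looks for, that would be consistent with the "Dataset not found" message.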

I tried changing the Python version to 3.8 as a workaround, but got a new error:

ErrorMessage "RuntimeError: CUDA error: an illegal memory access was encountered
 CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
 For debugging consider passing CUDA_LAUNCH_BLOCKING=1."
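For reference, a minimal sketch of how the `CUDA_LAUNCH_BLOCKING=1` suggestion from the message could be applied from Python. The helper name `enable_sync_cuda_errors` is hypothetical, and the sketch assumes the variable is set at the very top of the training script, before anything initializes CUDA.

```python
import os


def enable_sync_cuda_errors(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given mapping (os.environ by default)."""
    if env is None:
        env = os.environ
    # With this set, kernel launches run synchronously, so the reported
    # stack trace should point closer to the call that actually failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Call this before importing torch or touching the GPU.
enable_sync_cuda_errors()
```

This slows training down, so it is only meant for a debugging run.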

Any hints would be appreciated!