Run a PL project with CLOUD-BASED CHECKPOINTS on AWS SageMaker
from pytorch_lightning import Trainer

# `default_root_dir` is the default path used for logs and checkpoints
trainer = Trainer(default_root_dir="s3://my_bucket/data/")
trainer.fit(model)  # `model` is the user's LightningModule, defined elsewhere
Error messages and logs
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/s3fs/core.py", line 113, in _error_wrapper
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/aiobotocore/client.py", line 383, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the CreateMultipartUpload operation: Access Denied
Bug description
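Saving a model checkpoint to AWS S3 fails with an AccessDenied error on the CreateMultipartUpload operation when `default_root_dir` points to an S3 path.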
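One possible cause of this error, not confirmed for this report, is that the IAM role or user running the training job lacks write permission on the target prefix; S3 authorizes CreateMultipartUpload through the `s3:PutObject` action. The sketch below derives a minimal policy document from the same `default_root_dir` value used above. The helper names `split_s3_uri` and `checkpoint_write_policy` are hypothetical, and the exact set of actions is an assumption: buckets with KMS encryption or a restrictive bucket policy may need more.

```python
import json
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/prefix/' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def checkpoint_write_policy(root_dir):
    """Build an IAM policy document allowing checkpoint writes under root_dir.

    Hypothetical helper: the action list is an assumed minimum, not an
    official recommendation.
    """
    bucket, prefix = split_s3_uri(root_dir)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # s3:PutObject covers CreateMultipartUpload and part uploads
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


policy = checkpoint_write_policy("s3://my_bucket/data/")
print(json.dumps(policy, indent=2))
```

Comparing the printed document with the policy actually attached to the role can show whether `s3:PutObject` is missing for the checkpoint prefix.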
What version are you seeing the problem on?
v2.0
How to reproduce the bug
Environment
Current environment
* CUDA:
    - GPU:
        - NVIDIA GeForce RTX 3090
    - available: True
    - version: 11.7
* Lightning:
    - lightning-utilities: 0.9.0
    - pytorch-lightning: 2.0.9.post0
    - pytorch-lightning-bolts: 0.3.2.post1
    - torch: 2.0.1
    - torchmetrics: 1.0.3
* Packages:
    - absl-py: 1.0.0
    - aiobotocore: 2.5.4
    - aiohttp: 3.8.1
    - aioitertools: 0.11.0
    - aiosignal: 1.2.0
    - antlr4-python3-runtime: 4.9.3
    - appdirs: 1.4.4
    - async-timeout: 4.0.2
    - attrs: 23.1.0
    - boto3: 1.28.63
    - botocore: 1.31.63
    - cachetools: 5.0.0
    - certifi: 2021.10.8
    - charset-normalizer: 2.0.11
    - click: 8.0.3
    - cloudpickle: 2.2.1
    - cmake: 3.27.6
    - contextlib2: 21.6.0
    - dill: 0.3.7
    - docker-pycreds: 0.4.0
    - filelock: 3.4.2
    - frozenlist: 1.3.0
    - fsspec: 2023.9.2
    - future: 0.18.2
    - gitdb: 4.0.10
    - gitpython: 3.1.37
    - google-auth: 2.6.0
    - google-pasta: 0.2.0
    - grpcio: 1.59.0
    - huggingface-hub: 0.18.0
    - hydra-core: 1.3.2
    - idna: 3.3
    - importlib-metadata: 4.10.1
    - importlib-resources: 5.4.0
    - jinja2: 3.1.2
    - jmespath: 1.0.1
    - joblib: 1.3.2
    - jsonschema: 4.19.1
    - jsonschema-specifications: 2023.7.1
    - lightning-utilities: 0.9.0
    - lit: 17.0.2
    - markdown: 3.3.6
    - markupsafe: 2.1.3
    - mpmath: 1.3.0
    - multidict: 6.0.2
    - multiprocess: 0.70.15
    - networkx: 3.1
    - numpy: 1.24.4
    - nvidia-cublas-cu11: 11.10.3.66
    - nvidia-cuda-cupti-cu11: 11.7.101
    - nvidia-cuda-nvrtc-cu11: 11.7.99
    - nvidia-cuda-runtime-cu11: 11.7.99
    - nvidia-cudnn-cu11: 8.5.0.96
    - nvidia-cufft-cu11: 10.9.0.58
    - nvidia-curand-cu11: 10.2.10.91
    - nvidia-cusolver-cu11: 11.4.0.1
    - nvidia-cusparse-cu11: 11.7.4.91
    - nvidia-nccl-cu11: 2.14.3
    - nvidia-nvtx-cu11: 11.7.91
    - oauthlib: 3.2.0
    - omegaconf: 2.3.0
    - packaging: 21.3
    - pandas: 2.0.3
    - pathos: 0.3.1
    - pathtools: 0.1.2
    - pillow: 9.0.1
    - pip: 20.0.2
    - pkg-resources: 0.0.0
    - pkgutil-resolve-name: 1.3.10
    - platformdirs: 3.11.0
    - pox: 0.3.3
    - ppft: 1.7.6.7
    - protobuf: 3.19.4
    - psutil: 5.9.5
    - pyasn1: 0.4.8
    - pyasn1-modules: 0.2.8
    - pydeprecate: 0.3.0
    - pyparsing: 3.0.7
    - python-dateutil: 2.8.2
    - pytorch-lightning: 2.0.9.post0
    - pytorch-lightning-bolts: 0.3.2.post1
    - pytz: 2023.3.post1
    - pyyaml: 6.0.1
    - referencing: 0.30.2
    - regex: 2022.1.18
    - requests: 2.31.0
    - requests-oauthlib: 1.3.1
    - rpds-py: 0.10.6
    - rsa: 4.8
    - s3fs: 2023.9.2
    - s3transfer: 0.7.0
    - sacremoses: 0.0.47
    - safetensors: 0.4.0
    - sagemaker: 2.192.0
    - schema: 0.7.5
    - scikit-learn: 1.3.1
    - scipy: 1.10.1
    - sentry-sdk: 1.32.0
    - setproctitle: 1.3.3
    - setuptools: 44.0.0
    - six: 1.16.0
    - smdebug-rulesconfig: 1.0.1
    - smmap: 5.0.1
    - sympy: 1.12
    - tblib: 1.7.0
    - tensorboard-data-server: 0.7.1
    - tensorboard-plugin-wit: 1.8.1
    - threadpoolctl: 3.2.0
    - tokenizers: 0.13.3
    - torch: 2.0.1
    - torchmetrics: 1.0.3
    - tqdm: 4.66.1
    - transformers: 4.31.0
    - triton: 2.0.0
    - typing-extensions: 4.0.1
    - tzdata: 2023.3
    - urllib3: 1.26.17
    - wandb: 0.15.12
    - werkzeug: 2.0.3
    - wheel: 0.34.2
    - wrapt: 1.15.0
    - yarl: 1.7.2
    - zipp: 3.7.0
* System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor: x86_64
    - python: 3.8.10
    - release: 5.15.0-84-generic
    - version: #93~20.04.1-Ubuntu SMP Wed Sep 6 16:15:40 UTC 2023
More info
No response