aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

Out of Memory when running the notebook according to instructions #4655

Open Leggerla opened 1 month ago

Leggerla commented 1 month ago

Link to the notebook: Notebook

Describe the bug

When running the cell

`!docker run -it --gpus all -v ${PWD}:/mount nvcr.io/nvidia/pytorch:22.10-py3 /bin/bash /mount/export.sh --verbose | tee conversion.txt`

the TensorRT engine build fails with:

`Error Code 2: OutOfMemory (no further information)`
`[05/08/2024-20:16:40] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.`

To reproduce

In AWS SageMaker, create a notebook instance of type ml.g4dn.xlarge and run the notebook.

Logs

Unable to find image 'nvcr.io/nvidia/pytorch:22.10-py3' locally 22.10-py3: Pulling from nvidia/pytorch fb0b3276a519: Pulling fs layer 2416db5e3ba6: Pulling fs layer 2ba01ce48f03: Pulling fs layer 1953d8b854c3: Pulling fs layer 76cd223c882b: Pulling fs layer 45bae771bc00: Pulling fs layer 416ceba70e02: Pulling fs layer 9f29debe0d89: Pulling fs layer 94cb84c1285d: Pulling fs layer d8dcc244fe18: Pulling fs layer 33a5fab03e15: Pulling fs layer 02fe0924ac3c: Pulling fs layer 608c8a053303: Pulling fs layer 4f4fb700ef54: Pulling fs layer 079063e0d5ea: Pulling fs layer 3eb787a0a71b: Pulling fs layer 688b2c892903: Pulling fs layer b706a6c654f3: Pulling fs layer 6f1e1da7bd7b: Pulling fs layer dfb5d79a074f: Pulling fs layer 9cb4ae6c9b9e: Pulling fs layer 78180d3f3014: Pulling fs layer 7f0de540b633: Pulling fs layer f874466111b2: Pulling fs layer a113f7ab8786: Pulling fs layer ef89dda3be8a: Pulling fs layer 0a7a609209c7: Pulling fs layer d6c9f654232d: Pulling fs layer 0552db6fc3c7: Pulling fs layer 6faa6f074f1c: Pulling fs layer 76290bc2ba87: Pulling fs layer 4d6f0741709b: Pulling fs layer d0f39b540bbd: Pulling fs layer cb0ff236cd2f: Pulling fs layer fe469f00cd2e: Pulling fs layer 8c9265b5196f: Pulling fs layer a1ce2bb994e5: Pulling fs layer 745802940cf0: Pulling fs layer fad8d441dc48: Pulling fs layer 62e169860c65: Pulling fs layer 56fc6ee11a76: Pulling fs layer 8079d7cc9429: Pulling fs layer c2959f3c1f79: Pulling fs layer b2f869cedbea: Pulling fs layer 342f9ecd7d0b: Pulling fs layer 7ed5a40d20ce: Pulling fs layer b75f99413198: Pulling fs layer 9b13389ffa92: Pulling fs layer f874466111b2: Waiting a113f7ab8786: Waiting a1ce2bb994e5: Waiting ef89dda3be8a: Waiting 745802940cf0: Waiting 0a7a609209c7: Waiting fad8d441dc48: Waiting 62e169860c65: Waiting 1953d8b854c3: Waiting 56fc6ee11a76: Waiting 76cd223c882b: Waiting 8079d7cc9429: Waiting 45bae771bc00: Waiting c2959f3c1f79: Waiting 416ceba70e02: Waiting b2f869cedbea: Waiting 342f9ecd7d0b: Waiting 9f29debe0d89: Waiting 
7ed5a40d20ce: Waiting 94cb84c1285d: Waiting d8dcc244fe18: Waiting b75f99413198: Waiting 9b13389ffa92: Waiting 33a5fab03e15: Waiting 6f1e1da7bd7b: Waiting 608c8a053303: Waiting dfb5d79a074f: Waiting 4f4fb700ef54: Waiting 9cb4ae6c9b9e: Waiting 079063e0d5ea: Waiting 78180d3f3014: Waiting 3eb787a0a71b: Waiting 7f0de540b633: Waiting 688b2c892903: Waiting b706a6c654f3: Waiting d6c9f654232d: Waiting 76290bc2ba87: Waiting 0552db6fc3c7: Waiting 4d6f0741709b: Waiting 6faa6f074f1c: Waiting fe469f00cd2e: Waiting d0f39b540bbd: Waiting cb0ff236cd2f: Waiting 8c9265b5196f: Waiting fb0b3276a519: Verifying Checksum fb0b3276a519: Download complete 1953d8b854c3: Verifying Checksum 1953d8b854c3: Download complete fb0b3276a519: Pull complete 2416db5e3ba6: Verifying Checksum 2416db5e3ba6: Download complete 2ba01ce48f03: Verifying Checksum 2ba01ce48f03: Download complete 45bae771bc00: Verifying Checksum 45bae771bc00: Download complete 416ceba70e02: Verifying Checksum 416ceba70e02: Download complete 9f29debe0d89: Verifying Checksum 9f29debe0d89: Download complete 94cb84c1285d: Verifying Checksum 94cb84c1285d: Download complete d8dcc244fe18: Verifying Checksum d8dcc244fe18: Download complete 02fe0924ac3c: Verifying Checksum 02fe0924ac3c: Download complete 2416db5e3ba6: Pull complete 33a5fab03e15: Verifying Checksum 33a5fab03e15: Download complete 4f4fb700ef54: Verifying Checksum 4f4fb700ef54: Download complete 079063e0d5ea: Verifying Checksum 079063e0d5ea: Download complete 2ba01ce48f03: Pull complete 1953d8b854c3: Pull complete 608c8a053303: Verifying Checksum 608c8a053303: Download complete 688b2c892903: Download complete b706a6c654f3: Verifying Checksum b706a6c654f3: Download complete 6f1e1da7bd7b: Verifying Checksum 6f1e1da7bd7b: Download complete 3eb787a0a71b: Verifying Checksum 3eb787a0a71b: Download complete 9cb4ae6c9b9e: Verifying Checksum 9cb4ae6c9b9e: Download complete dfb5d79a074f: Verifying Checksum dfb5d79a074f: Download complete 78180d3f3014: Verifying Checksum 78180d3f3014: 
Download complete f874466111b2: Download complete 7f0de540b633: Verifying Checksum 7f0de540b633: Download complete ef89dda3be8a: Verifying Checksum ef89dda3be8a: Download complete a113f7ab8786: Verifying Checksum a113f7ab8786: Download complete d6c9f654232d: Verifying Checksum d6c9f654232d: Download complete 0552db6fc3c7: Download complete 0a7a609209c7: Verifying Checksum 0a7a609209c7: Download complete 76290bc2ba87: Verifying Checksum 76290bc2ba87: Download complete 4d6f0741709b: Download complete d0f39b540bbd: Verifying Checksum d0f39b540bbd: Download complete cb0ff236cd2f: Verifying Checksum fe469f00cd2e: Verifying Checksum fe469f00cd2e: Download complete 6faa6f074f1c: Verifying Checksum 6faa6f074f1c: Download complete a1ce2bb994e5: Verifying Checksum a1ce2bb994e5: Download complete 745802940cf0: Download complete fad8d441dc48: Verifying Checksum fad8d441dc48: Download complete 62e169860c65: Verifying Checksum 62e169860c65: Download complete 56fc6ee11a76: Verifying Checksum 56fc6ee11a76: Download complete 8079d7cc9429: Verifying Checksum 8079d7cc9429: Download complete c2959f3c1f79: Download complete b2f869cedbea: Verifying Checksum b2f869cedbea: Download complete 342f9ecd7d0b: Download complete 7ed5a40d20ce: Verifying Checksum 7ed5a40d20ce: Download complete b75f99413198: Verifying Checksum b75f99413198: Download complete 9b13389ffa92: Verifying Checksum 9b13389ffa92: Download complete 76cd223c882b: Verifying Checksum 76cd223c882b: Download complete 8c9265b5196f: Verifying Checksum 8c9265b5196f: Download complete 76cd223c882b: Pull complete 45bae771bc00: Pull complete 416ceba70e02: Pull complete 9f29debe0d89: Pull complete 94cb84c1285d: Pull complete d8dcc244fe18: Pull complete 33a5fab03e15: Pull complete 02fe0924ac3c: Pull complete 608c8a053303: Pull complete 4f4fb700ef54: Pull complete 079063e0d5ea: Pull complete 3eb787a0a71b: Pull complete 688b2c892903: Pull complete b706a6c654f3: Pull complete 6f1e1da7bd7b: Pull complete dfb5d79a074f: Pull complete 
9cb4ae6c9b9e: Pull complete 78180d3f3014: Pull complete 7f0de540b633: Pull complete f874466111b2: Pull complete a113f7ab8786: Pull complete ef89dda3be8a: Pull complete 0a7a609209c7: Pull complete d6c9f654232d: Pull complete 0552db6fc3c7: Pull complete 6faa6f074f1c: Pull complete 76290bc2ba87: Pull complete 4d6f0741709b: Pull complete d0f39b540bbd: Pull complete cb0ff236cd2f: Pull complete fe469f00cd2e: Pull complete 8c9265b5196f: Pull complete a1ce2bb994e5: Pull complete 745802940cf0: Pull complete fad8d441dc48: Pull complete 62e169860c65: Pull complete 56fc6ee11a76: Pull complete 8079d7cc9429: Pull complete c2959f3c1f79: Pull complete b2f869cedbea: Pull complete 342f9ecd7d0b: Pull complete 7ed5a40d20ce: Pull complete b75f99413198: Pull complete 9b13389ffa92: Pull complete Digest: sha256:7ad18fc3d2b9cdc35f9e5f0043987e8391fcf592c88177fdd9daa31b3b886be9 Status: Downloaded newer image for nvcr.io/nvidia/pytorch:22.10-py3

=============
== PyTorch ==
=============

NVIDIA Release 22.10 (build 46164382) PyTorch Version 1.13.0a0+d0d6b1f

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2022 Facebook Inc. Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) Copyright (c) 2011-2013 NYU (Clement Farabet) Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) Copyright (c) 2006 Idiap Research Institute (Samy Bengio) Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) Copyright (c) 2015 Google Inc. Copyright (c) 2015 Yangqing Jia Copyright (c) 2013-2016 The Caffe contributors All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be insufficient for PyTorch. NVIDIA recommends the use of the following flags: docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...
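(Aside: the notebook's `docker run` does not pass the flags this NOTE recommends, so the shared-memory and locked-memory limits stay at their defaults. A possible variant of the notebook cell that adds them, same image and script as above, with the flags taken verbatim from the NOTE, would be something like the following. These flags address host shared-memory limits, not the GPU OOM reported later, so they are a hygiene fix rather than the cure:)

```shell
# Same command as in the notebook, plus the flags NVIDIA's container startup
# message recommends: --ipc=host raises the 64 MB SHMEM default, and the
# --ulimit flags remove the memlock cap and raise the stack limit.
docker run -it --gpus all \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v ${PWD}:/mount \
    nvcr.io/nvidia/pytorch:22.10-py3 \
    /bin/bash /mount/export.sh --verbose | tee conversion.txt
```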

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/ Collecting transformers Downloading transformers-4.40.2-py3-none-any.whl (9.0 MB) |████████████████████████████████| 9.0 MB 25.9 MB/s eta 0:00:01 Collecting ftfy Downloading ftfy-6.2.0-py3-none-any.whl (54 kB) |████████████████████████████████| 54 kB 62.4 MB/s eta 0:00:01 Requirement already satisfied: scipy in /opt/conda/lib/python3.8/site-packages (1.6.3) Collecting tokenizers<0.20,>=0.19 Downloading tokenizers-0.19.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB) |████████████████████████████████| 3.6 MB 74.7 MB/s eta 0:00:01 Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers) (2.28.1) Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.8/site-packages (from transformers) (6.0) Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.8/site-packages (from transformers) (1.22.2) Collecting huggingface-hub<1.0,>=0.19.3 Downloading huggingface_hub-0.23.0-py3-none-any.whl (401 kB) |████████████████████████████████| 401 kB 84.4 MB/s eta 0:00:01 Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers) (4.64.1) Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers) (2022.9.13) Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from transformers) (21.3) Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers) (3.8.0) Collecting safetensors>=0.4.1 Downloading safetensors-0.4.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB) |████████████████████████████████| 1.2 MB 70.5 MB/s eta 0:00:01 Collecting wcwidth<0.3.0,>=0.2.12 Downloading wcwidth-0.2.13-py2.py3-none-any.whl (34 kB) Collecting fsspec>=2023.5.0 Downloading fsspec-2024.3.1-py3-none-any.whl (171 kB) |████████████████████████████████| 171 kB 82.4 
MB/s eta 0:00:01 Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers) (4.4.0) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging>=20.0->transformers) (3.0.9) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (2022.9.24) Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (3.3) Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (2.1.0) Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers) (1.26.11) Installing collected packages: fsspec, huggingface-hub, wcwidth, tokenizers, safetensors, transformers, ftfy Attempting uninstall: fsspec Found existing installation: fsspec 2022.8.2 Uninstalling fsspec-2022.8.2: Successfully uninstalled fsspec-2022.8.2 Attempting uninstall: wcwidth Found existing installation: wcwidth 0.2.5 Uninstalling wcwidth-0.2.5: Successfully uninstalled wcwidth-0.2.5 Successfully installed fsspec-2024.3.1 ftfy-6.2.0 huggingface-hub-0.23.0 safetensors-0.4.3 tokenizers-0.19.1 transformers-4.40.2 wcwidth-0.2.13 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/ Requirement already satisfied: transformers[onnxruntime] in /opt/conda/lib/python3.8/site-packages (4.40.2) Requirement already satisfied: tokenizers<0.20,>=0.19 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (0.19.1) Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (0.23.0) Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (1.22.2) Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (2.28.1) Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (4.64.1) Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (21.3) Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (2022.9.13) Requirement already satisfied: safetensors>=0.4.1 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (0.4.3) Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (6.0) Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from transformers[onnxruntime]) (3.8.0) Collecting onnxruntime-tools>=1.4.2 Downloading onnxruntime_tools-1.7.0-py3-none-any.whl (212 kB) |████████████████████████████████| 212 kB 27.1 MB/s eta 0:00:01 Collecting onnxruntime>=1.4.0 Downloading onnxruntime-1.17.3-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (6.8 MB) |████████████████████████████████| 6.8 MB 51.6 MB/s eta 0:00:01 Requirement already satisfied: fsspec>=2023.5.0 in 
/opt/conda/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers[onnxruntime]) (2024.3.1) Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.19.3->transformers[onnxruntime]) (4.4.0) Collecting sympy Downloading sympy-1.12-py3-none-any.whl (5.7 MB) |████████████████████████████████| 5.7 MB 77.3 MB/s eta 0:00:01 Collecting flatbuffers Downloading flatbuffers-24.3.25-py2.py3-none-any.whl (26 kB) Collecting coloredlogs Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB) |████████████████████████████████| 46 kB 60.7 MB/s eta 0:00:01 Requirement already satisfied: protobuf in /opt/conda/lib/python3.8/site-packages (from onnxruntime>=1.4.0->transformers[onnxruntime]) (3.20.3) Requirement already satisfied: psutil in /opt/conda/lib/python3.8/site-packages (from onnxruntime-tools>=1.4.2->transformers[onnxruntime]) (5.9.2) Collecting py3nvml Downloading py3nvml-0.2.7-py3-none-any.whl (55 kB) |████████████████████████████████| 55 kB 63.4 MB/s eta 0:00:01 Requirement already satisfied: onnx in /opt/conda/lib/python3.8/site-packages (from onnxruntime-tools>=1.4.2->transformers[onnxruntime]) (1.12.0) Collecting py-cpuinfo Downloading py_cpuinfo-9.0.0-py3-none-any.whl (22 kB) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging>=20.0->transformers[onnxruntime]) (3.0.9) Collecting humanfriendly>=9.1 Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB) |████████████████████████████████| 86 kB 74.2 MB/s eta 0:00:01 Collecting protobuf Downloading protobuf-3.20.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB) |████████████████████████████████| 1.0 MB 83.3 MB/s eta 0:00:01 Collecting xmltodict Downloading xmltodict-0.13.0-py2.py3-none-any.whl (10.0 kB) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) 
(2022.9.24) Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) (2.1.0) Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) (3.3) Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->transformers[onnxruntime]) (1.26.11) Collecting mpmath>=0.19 Downloading mpmath-1.3.0-py3-none-any.whl (536 kB) |████████████████████████████████| 536 kB 82.4 MB/s eta 0:00:01 Installing collected packages: xmltodict, protobuf, mpmath, humanfriendly, sympy, py3nvml, py-cpuinfo, flatbuffers, coloredlogs, onnxruntime-tools, onnxruntime Attempting uninstall: protobuf Found existing installation: protobuf 3.20.3 Uninstalling protobuf-3.20.3: Successfully uninstalled protobuf-3.20.3 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. tensorboard 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.1 which is incompatible. Successfully installed coloredlogs-15.0.1 flatbuffers-24.3.25 humanfriendly-10.0 mpmath-1.3.0 onnxruntime-1.17.3 onnxruntime-tools-1.7.0 protobuf-3.20.1 py-cpuinfo-9.0.0 py3nvml-0.2.7 sympy-1.12 xmltodict-0.13.0 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/ Collecting diffusers Downloading diffusers-0.27.2-py3-none-any.whl (2.0 MB) |████████████████████████████████| 2.0 MB 19.3 MB/s eta 0:00:01 Requirement already satisfied: safetensors>=0.3.1 in /opt/conda/lib/python3.8/site-packages (from diffusers) (0.4.3) Requirement already satisfied: huggingface-hub>=0.20.2 in /opt/conda/lib/python3.8/site-packages (from diffusers) (0.23.0) Requirement already satisfied: filelock in /opt/conda/lib/python3.8/site-packages (from diffusers) (3.8.0) Requirement already satisfied: importlib-metadata in /opt/conda/lib/python3.8/site-packages (from diffusers) (5.0.0) Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.8/site-packages (from diffusers) (2022.9.13) Requirement already satisfied: Pillow in /opt/conda/lib/python3.8/site-packages (from diffusers) (9.0.1) Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from diffusers) (2.28.1) Requirement already satisfied: numpy in /opt/conda/lib/python3.8/site-packages (from diffusers) (1.22.2) Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (4.4.0) Requirement already satisfied: tqdm>=4.42.1 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (4.64.1) Requirement already satisfied: packaging>=20.9 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (21.3) Requirement already satisfied: fsspec>=2023.5.0 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (2024.3.1) Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.8/site-packages (from huggingface-hub>=0.20.2->diffusers) (6.0) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in 
/opt/conda/lib/python3.8/site-packages (from packaging>=20.9->huggingface-hub>=0.20.2->diffusers) (3.0.9) Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.8/site-packages (from importlib-metadata->diffusers) (3.9.0) Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (3.3) Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (2.1.0) Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (1.26.11) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests->diffusers) (2022.9.24) Installing collected packages: diffusers Successfully installed diffusers-0.27.2 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache(). 0it [00:00, ?it/s] Cannot initialize model with low cpu memory usage because accelerate was not found in the environment. Defaulting to low_cpu_mem_usage=False. It is strongly recommended to install accelerate for faster and less memory-intense model loading. You can do so with:

pip install accelerate

. /opt/conda/lib/python3.8/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True. warnings.warn( vae/config.json: 100% 551/551 [00:00<00:00, 84.5kB/s] diffusion_pytorch_model.safetensors: 100% 335M/335M [00:00<00:00, 387MB/s] tokenizer_config.json: 100% 905/905 [00:00<00:00, 139kB/s] vocab.json: 100% 961k/961k [00:00<00:00, 40.9MB/s] merges.txt: 100% 525k/525k [00:00<00:00, 48.5MB/s] special_tokens_map.json: 100% 389/389 [00:00<00:00, 239kB/s] tokenizer.json: 100% 2.22M/2.22M [00:00<00:00, 27.9MB/s] config.json: 100% 4.52k/4.52k [00:00<00:00, 904kB/s] model.safetensors: 100% 1.71G/1.71G [00:03<00:00, 443MB/s] /opt/conda/lib/python3.8/site-packages/diffusers/models/upsampling.py:149: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! assert hidden_states.shape[1] == self.channels /opt/conda/lib/python3.8/site-packages/diffusers/models/upsampling.py:165: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if hidden_states.shape[0] >= 64: /opt/conda/lib/python3.8/site-packages/diffusers/models/autoencoders/autoencoder_kl.py:306: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 
if not return_dict: /opt/conda/lib/python3.8/site-packages/torch/onnx/_patch_torch.py:69: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1880.) torch._C._jit_pass_onnx_node_shape_type_inference( /opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:649: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1880.) _C._jit_pass_onnx_graph_shape_type_inference( /opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py:1125: UserWarning: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/jit/passes/onnx/shape_type_inference.cpp:1880.) _C._jit_pass_onnx_graph_shape_type_inference( Here is the shape of the input ----------------------------------------------------- torch.Size([1, 77]) /opt/conda/lib/python3.8/site-packages/transformers/modeling_attn_mask_utils.py:86: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if input_shape[-1] > 1 or self.sliding_window is not None: /opt/conda/lib/python3.8/site-packages/transformers/modeling_attn_mask_utils.py:162: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. 
We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if past_key_values_length > 0: /opt/conda/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:279: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len): /opt/conda/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:287: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len): /opt/conda/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py:319: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim): /opt/conda/lib/python3.8/site-packages/torch/onnx/symbolic_opset9.py:4595: UserWarning: Exporting aten::index operator of advanced indexing in opset 14 is achieved by combination of multiple ONNX operators, including Reshape, Transpose, Concat, and Gather. If indices include negative values, the exported graph will produce incorrect results.
warnings.warn( &&&& RUNNING TensorRT.trtexec [TensorRT v8500] # trtexec --onnx=vae.onnx --saveEngine=vae.plan --minShapes=latent_sample:1x4x64x64 --optShapes=latent_sample:4x4x64x64 --maxShapes=latent_sample:8x4x64x64 --fp16 [05/08/2024-19:50:50] [I] === Model Options === [05/08/2024-19:50:50] [I] Format: ONNX [05/08/2024-19:50:50] [I] Model: vae.onnx [05/08/2024-19:50:50] [I] Output: [05/08/2024-19:50:50] [I] === Build Options === [05/08/2024-19:50:50] [I] Max batch: explicit batch [05/08/2024-19:50:50] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default [05/08/2024-19:50:50] [I] minTiming: 1 [05/08/2024-19:50:50] [I] avgTiming: 8 [05/08/2024-19:50:50] [I] Precision: FP32+FP16 [05/08/2024-19:50:50] [I] LayerPrecisions: [05/08/2024-19:50:50] [I] Calibration: [05/08/2024-19:50:50] [I] Refit: Disabled [05/08/2024-19:50:50] [I] Sparsity: Disabled [05/08/2024-19:50:50] [I] Safe mode: Disabled [05/08/2024-19:50:50] [I] DirectIO mode: Disabled [05/08/2024-19:50:50] [I] Restricted mode: Disabled [05/08/2024-19:50:50] [I] Build only: Disabled [05/08/2024-19:50:50] [I] Save engine: vae.plan [05/08/2024-19:50:50] [I] Load engine: [05/08/2024-19:50:50] [I] Profiling verbosity: 0 [05/08/2024-19:50:50] [I] Tactic sources: Using default tactic sources [05/08/2024-19:50:50] [I] timingCacheMode: local [05/08/2024-19:50:50] [I] timingCacheFile: [05/08/2024-19:50:50] [I] Heuristic: Disabled [05/08/2024-19:50:50] [I] Preview Features: Use default preview flags. 
[05/08/2024-19:50:50] [I] Input(s)s format: fp32:CHW
[05/08/2024-19:50:50] [I] Output(s)s format: fp32:CHW
[05/08/2024-19:50:50] [I] Input build shape: latent_sample=1x4x64x64+4x4x64x64+8x4x64x64
[05/08/2024-19:50:50] [I] Input calibration shapes: model
[05/08/2024-19:50:50] [I] === System Options ===
[05/08/2024-19:50:50] [I] Device: 0
[05/08/2024-19:50:50] [I] DLACore:
[05/08/2024-19:50:50] [I] Plugins:
[05/08/2024-19:50:50] [I] === Inference Options ===
[05/08/2024-19:50:50] [I] Batch: Explicit
[05/08/2024-19:50:50] [I] Input inference shape: latent_sample=4x4x64x64
[05/08/2024-19:50:50] [I] Iterations: 10
[05/08/2024-19:50:50] [I] Duration: 3s (+ 200ms warm up)
[05/08/2024-19:50:50] [I] Sleep time: 0ms
[05/08/2024-19:50:50] [I] Idle time: 0ms
[05/08/2024-19:50:50] [I] Streams: 1
[05/08/2024-19:50:50] [I] ExposeDMA: Disabled
[05/08/2024-19:50:50] [I] Data transfers: Enabled
[05/08/2024-19:50:50] [I] Spin-wait: Disabled
[05/08/2024-19:50:50] [I] Multithreading: Disabled
[05/08/2024-19:50:50] [I] CUDA Graph: Disabled
[05/08/2024-19:50:50] [I] Separate profiling: Disabled
[05/08/2024-19:50:50] [I] Time Deserialize: Disabled
[05/08/2024-19:50:50] [I] Time Refit: Disabled
[05/08/2024-19:50:50] [I] NVTX verbosity: 0
[05/08/2024-19:50:50] [I] Persistent Cache Ratio: 0
[05/08/2024-19:50:50] [I] Inputs:
[05/08/2024-19:50:50] [I] === Reporting Options ===
[05/08/2024-19:50:50] [I] Verbose: Disabled
[05/08/2024-19:50:50] [I] Averages: 10 inferences
[05/08/2024-19:50:50] [I] Percentiles: 90,95,99
[05/08/2024-19:50:50] [I] Dump refittable layers: Disabled
[05/08/2024-19:50:50] [I] Dump output: Disabled
[05/08/2024-19:50:50] [I] Profile: Disabled
[05/08/2024-19:50:50] [I] Export timing to JSON file:
[05/08/2024-19:50:50] [I] Export output to JSON file:
[05/08/2024-19:50:50] [I] Export profile to JSON file:
[05/08/2024-19:50:50] [I]
[05/08/2024-19:50:50] [I] === Device Information ===
[05/08/2024-19:50:50] [I] Selected Device: Tesla T4
[05/08/2024-19:50:50] [I] Compute Capability: 7.5
[05/08/2024-19:50:50] [I] SMs: 40
[05/08/2024-19:50:50] [I] Compute Clock Rate: 1.59 GHz
[05/08/2024-19:50:50] [I] Device Global Memory: 15102 MiB
[05/08/2024-19:50:50] [I] Shared Memory per SM: 64 KiB
[05/08/2024-19:50:50] [I] Memory Bus Width: 256 bits (ECC enabled)
[05/08/2024-19:50:50] [I] Memory Clock Rate: 5.001 GHz
[05/08/2024-19:50:50] [I]
[05/08/2024-19:50:50] [I] TensorRT version: 8.5.0
[05/08/2024-19:50:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 26, GPU 103 (MiB)
[05/08/2024-19:50:55] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +340, GPU +74, now: CPU 418, GPU 177 (MiB)
[05/08/2024-19:50:55] [I] Start parsing network model
[05/08/2024-19:50:55] [I] [TRT] ----------------------------------------------------------------
[05/08/2024-19:50:55] [I] [TRT] Input filename: vae.onnx
[05/08/2024-19:50:55] [I] [TRT] ONNX IR version: 0.0.7
[05/08/2024-19:50:55] [I] [TRT] Opset version: 14
[05/08/2024-19:50:55] [I] [TRT] Producer name: pytorch
[05/08/2024-19:50:55] [I] [TRT] Producer version: 1.13.0
[05/08/2024-19:50:55] [I] [TRT] Domain:
[05/08/2024-19:50:55] [I] [TRT] Model version: 0
[05/08/2024-19:50:55] [I] [TRT] Doc string:
[05/08/2024-19:50:55] [I] [TRT] ----------------------------------------------------------------
[05/08/2024-19:50:55] [W] [TRT] parsers/onnx/onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/08/2024-19:50:56] [I] Finish parsing network model
[05/08/2024-19:50:57] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 639, GPU 827 (MiB)
[05/08/2024-19:50:57] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 639, GPU 837 (MiB)
[05/08/2024-19:50:57] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/08/2024-20:16:40] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:40] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:40] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:16:40] [W] [TRT] Skipping tactic 2 due to insufficient memory on requested size of 34359738368 detected for tactic 0x0000000000000002. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:16:51] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:51] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:16:51] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:16:51] [W] [TRT] Skipping tactic 7 due to insufficient memory on requested size of 34359738368 detected for tactic 0x000000000000003a. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:21:17] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:17] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:17] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:21:17] [W] [TRT] Skipping tactic 2 due to insufficient memory on requested size of 34359738368 detected for tactic 0x0000000000000002. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:21:22] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:22] [E] Error[2]: [virtualMemoryBuffer.cpp::resizePhysical::145] Error Code 2: OutOfMemory (no further information)
[05/08/2024-20:21:22] [W] [TRT] Requested amount of GPU memory (34359738368 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[05/08/2024-20:21:22] [W] [TRT] Skipping tactic 7 due to insufficient memory on requested size of 34359738368 detected for tactic 0x000000000000003a. Try decreasing the workspace size with IBuilderConfig::setMemoryPoolLimit().
[05/08/2024-20:26:27] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[05/08/2024-20:26:27] [I] [TRT] Total Host Persistent Memory: 169088
[05/08/2024-20:26:27] [I] [TRT] Total Device Persistent Memory: 16685568
[05/08/2024-20:26:27] [I] [TRT] Total Scratch Memory: 33554432
[05/08/2024-20:26:27] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 97 MiB, GPU 12290 MiB
[05/08/2024-20:26:27] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 105.026ms to assign 7 blocks to 311 nodes requiring 3556769796 bytes.
[05/08/2024-20:26:27] [I] [TRT] Total Activation Memory: 3556769796
[05/08/2024-20:26:27] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 1210, GPU 1355 (MiB)
[05/08/2024-20:26:27] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 1210, GPU 1363 (MiB)
[05/08/2024-20:26:27] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/08/2024-20:26:27] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/08/2024-20:26:27] [W] [TRT] Check verbose logs for the list of affected weights.
[05/08/2024-20:26:27] [W] [TRT] - 53 weights are affected by this issue: Detected subnormal FP16 values.
[05/08/2024-20:26:27] [W] [TRT] - 27 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[05/08/2024-20:26:27] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +96, GPU +112, now: CPU 96, GPU 112 (MiB)
[05/08/2024-20:26:28] [I] Engine built in 2137.59 sec.
[05/08/2024-20:26:28] [I] [TRT] Loaded engine size: 97 MiB
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 664, GPU 779 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 664, GPU 787 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +112, now: CPU 0, GPU 112 (MiB)
[05/08/2024-20:26:28] [I] Engine deserialized in 0.0565643 sec.
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 664, GPU 797 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 664, GPU 805 (MiB)
[05/08/2024-20:26:28] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +3408, now: CPU 0, GPU 3520 (MiB)
[05/08/2024-20:26:28] [I] Setting persistentCacheLimit to 0 bytes.
[05/08/2024-20:26:28] [I] Using random values for input latent_sample
[05/08/2024-20:26:28] [I] Created input binding for latent_sample with dimensions 4x4x64x64
[05/08/2024-20:26:28] [I] Using random values for output sample
[05/08/2024-20:26:28] [I] Created output binding for sample with dimensions 4x3x512x512
[05/08/2024-20:26:28] [I] Starting inference
[05/08/2024-20:26:34] [I] Warmup completed 1 queries over 200 ms
[05/08/2024-20:26:34] [I] Timing trace has 10 queries over 6.31709 s
[05/08/2024-20:26:34] [I]
[05/08/2024-20:26:34] [I] === Trace details ===
[05/08/2024-20:26:34] [I] Trace averages of 10 runs:
[05/08/2024-20:26:34] [I] Average on 10 runs - GPU latency: 576.906 ms - Host latency: 579.155 ms (enqueue 2.51767 ms)
[05/08/2024-20:26:34] [I]
[05/08/2024-20:26:34] [I] === Performance summary ===
[05/08/2024-20:26:34] [I] Throughput: 1.58301 qps
[05/08/2024-20:26:34] [I] Latency: min = 563.076 ms, max = 591.083 ms, mean = 579.155 ms, median = 580.306 ms, percentile(90%) = 590.798 ms, percentile(95%) = 591.083 ms, percentile(99%) = 591.083 ms
[05/08/2024-20:26:34] [I] Enqueue Time: min = 2.23538 ms, max = 2.66504 ms, mean = 2.51767 ms, median = 2.52121 ms, percentile(90%) = 2.66113 ms, percentile(95%) = 2.66504 ms, percentile(99%) = 2.66504 ms
[05/08/2024-20:26:34] [I] H2D Latency: min = 0.0797439 ms, max = 0.0925293 ms, mean = 0.0872102 ms, median = 0.0871582 ms, percentile(90%) = 0.0898438 ms, percentile(95%) = 0.0925293 ms, percentile(99%) = 0.0925293 ms
[05/08/2024-20:26:34] [I] GPU Compute Time: min = 560.796 ms, max = 588.808 ms, mean = 576.906 ms, median = 578.167 ms, percentile(90%) = 588.523 ms, percentile(95%) = 588.808 ms, percentile(99%) = 588.808 ms
[05/08/2024-20:26:34] [I] D2H Latency: min = 1.90869 ms, max = 2.19568 ms, mean = 2.16147 ms, median = 2.1897 ms, percentile(90%) = 2.19324 ms, percentile(95%) = 2.19568 ms, percentile(99%) = 2.19568 ms
[05/08/2024-20:26:34] [I] Total Host Walltime: 6.31709 s
[05/08/2024-20:26:34] [I] Total GPU Compute Time: 5.76906 s
[05/08/2024-20:26:34] [W] * GPU compute time is unstable, with coefficient of variance = 1.55722%.
[05/08/2024-20:26:34] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.
[05/08/2024-20:26:34] [I] Explanations of the performance metrics are printed in the verbose logs.
[05/08/2024-20:26:34] [I] &&&& PASSED TensorRT.trtexec [TensorRT v8500] # trtexec --onnx=vae.onnx --saveEngine=vae.plan --minShapes=latent_sample:1x4x64x64 --optShapes=latent_sample:4x4x64x64 --maxShapes=latent_sample:8x4x64x64 --fp16
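Note that the log itself points at a likely workaround: TensorRT is probing tactics that request 32 GiB (34359738368 bytes) of workspace, far more than the T4's ~15 GiB, and suggests capping the workspace via IBuilderConfig::setMemoryPoolLimit(). With trtexec (TensorRT >= 8.4) the equivalent is the --memPoolSize flag. The sketch below shows the notebook's trtexec command with such a cap added; the 4096 MiB value is an assumption to tune for your GPU, not a verified fix for this notebook:

```shell
# Sketch only: same trtexec invocation as in export.sh, plus a workspace cap
# so the builder skips tactics that would request more GPU memory than a T4 has.
# --memPoolSize=workspace:4096MiB limits the builder workspace pool to 4 GiB
# (the value is an assumption; adjust it to your instance's GPU memory).
TRTEXEC_CMD="trtexec --onnx=vae.onnx --saveEngine=vae.plan \
  --minShapes=latent_sample:1x4x64x64 \
  --optShapes=latent_sample:4x4x64x64 \
  --maxShapes=latent_sample:8x4x64x64 \
  --fp16 \
  --memPoolSize=workspace:4096MiB"

# Print the command so it can be inspected before running inside the container.
echo "$TRTEXEC_CMD"
```

The OOM lines here are [W]/[E] entries during tactic selection and the build still reports PASSED, so the cap mainly avoids the wasted minutes spent probing oversized tactics; a larger instance type (e.g. more GPU memory than ml.g4dn.xlarge provides) is the other obvious route.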