Open bhauptvogel opened 10 months ago
Running the following build command throws an error:
nvidia-docker build -f docker/Dockerfile.model-training . -t oasst-model-training
> [5/6] RUN apt install -y python3.10-dev && git clone https://github.com/LAION-AI/Open-Assistant.git && source venv/bin/activate && pip install -e /workspace/Open-Assistant/oasst-data pip install -e /workspace/Open-Assistant/model:
0.396
0.396 WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
0.396
0.418 Reading package lists...
0.882 Building dependency tree...
0.980 Reading state information...
1.060 The following additional packages will be installed:
1.060 libexpat1-dev libpython3.10-dev zlib1g-dev
1.080 The following NEW packages will be installed:
1.080 libexpat1-dev libpython3.10-dev python3.10-dev zlib1g-dev
1.162 0 upgraded, 4 newly installed, 0 to remove and 42 not upgraded.
1.162 Need to get 5582 kB of archives.
1.162 After this operation, 23.0 MB of additional disk space will be used.
1.162 Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libexpat1-dev amd64 2.4.7-1ubuntu0.2 [147 kB]
1.315 Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 zlib1g-dev amd64 1:1.2.11.dfsg-2ubuntu9.2 [164 kB]
1.349 Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libpython3.10-dev amd64 3.10.12-1~22.04.2 [4764 kB]
1.535 Get:4 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 python3.10-dev amd64 3.10.12-1~22.04.2 [507 kB]
1.625 debconf: delaying package configuration, since apt-utils is not installed
1.645 Fetched 5582 kB in 0s (12.0 MB/s)
1.661 Selecting previously unselected package libexpat1-dev:amd64. (Reading database ... 18476 files and directories currently installed.)
1.667 Preparing to unpack .../libexpat1-dev_2.4.7-1ubuntu0.2_amd64.deb ...
1.669 Unpacking libexpat1-dev:amd64 (2.4.7-1ubuntu0.2) ...
1.689 Selecting previously unselected package zlib1g-dev:amd64.
1.690 Preparing to unpack .../zlib1g-dev_1%3a1.2.11.dfsg-2ubuntu9.2_amd64.deb ...
1.692 Unpacking zlib1g-dev:amd64 (1:1.2.11.dfsg-2ubuntu9.2) ...
1.709 Selecting previously unselected package libpython3.10-dev:amd64.
1.710 Preparing to unpack .../libpython3.10-dev_3.10.12-1~22.04.2_amd64.deb ...
1.712 Unpacking libpython3.10-dev:amd64 (3.10.12-1~22.04.2) ...
1.785 Selecting previously unselected package python3.10-dev.
1.786 Preparing to unpack .../python3.10-dev_3.10.12-1~22.04.2_amd64.deb ...
1.791 Unpacking python3.10-dev (3.10.12-1~22.04.2) ...
1.843 Setting up libexpat1-dev:amd64 (2.4.7-1ubuntu0.2) ...
1.864 Setting up zlib1g-dev:amd64 (1:1.2.11.dfsg-2ubuntu9.2) ...
1.891 Setting up libpython3.10-dev:amd64 (3.10.12-1~22.04.2) ...
1.911 Setting up python3.10-dev (3.10.12-1~22.04.2) ...
1.935 Cloning into 'Open-Assistant'...
6.207 Obtaining file:///workspace/Open-Assistant/oasst-data
6.302 Installing build dependencies: started
7.872 Installing build dependencies: finished with status 'done'
7.875 Checking if build backend supports build_editable: started
7.921 Checking if build backend supports build_editable: finished with status 'done'
7.921 Getting requirements to build editable: started
7.966 Getting requirements to build editable: finished with status 'done'
7.967 Preparing editable metadata (pyproject.toml): started
8.013 Preparing editable metadata (pyproject.toml): finished with status 'done'
8.016 Obtaining file:///workspace/Open-Assistant/model
8.111 Installing build dependencies: started
10.49 Installing build dependencies: finished with status 'done'
10.49 Checking if build backend supports build_editable: started
10.57 Checking if build backend supports build_editable: finished with status 'done'
10.57 Getting requirements to build editable: started
10.80 Getting requirements to build editable: finished with status 'done'
10.89 Installing backend dependencies: started
12.02 Installing backend dependencies: finished with status 'done'
12.02 Preparing editable metadata (pyproject.toml): started
12.26 Preparing editable metadata (pyproject.toml): finished with status 'done'
12.26 Requirement already satisfied: pip in ./venv/lib/python3.10/site-packages (22.0.2)
12.51 Collecting install
12.66 Downloading install-1.3.5-py3-none-any.whl (3.2 kB)
12.85 Collecting datasets>=2.12.0
12.88 Downloading datasets-2.14.5-py3-none-any.whl (519 kB)
12.99 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 519.6/519.6 KB 5.0 MB/s eta 0:00:00
13.16 Collecting loguru==0.6.0
13.19 Downloading loguru-0.6.0-py3-none-any.whl (58 kB)
13.19 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.3/58.3 KB 7.9 MB/s eta 0:00:00
13.41 Collecting pydantic==1.10.7
13.43 Downloading pydantic-1.10.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
13.55 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 27.6 MB/s eta 0:00:00
13.57 Requirement already satisfied: typing-extensions>=4.2.0 in ./venv/lib/python3.10/site-packages (from pydantic==1.10.7->oasst_data==1.0.0) (4.7.1)
13.59 Collecting trlx@ git+https://github.com/CarperAI/trlx.git
13.59 Cloning https://github.com/CarperAI/trlx.git to /tmp/pip-install-oqoh1ao4/trlx_e3615061ee064eeab2b9a383222315e7
13.59 Running command git clone --filter=blob:none --quiet https://github.com/CarperAI/trlx.git /tmp/pip-install-oqoh1ao4/trlx_e3615061ee064eeab2b9a383222315e7
14.94 Resolved https://github.com/CarperAI/trlx.git to commit 3685a75a5c677ef8813871977c66940e598da3ef
15.03 Installing build dependencies: started
16.65 Installing build dependencies: finished with status 'done'
16.65 Getting requirements to build wheel: started
16.75 Getting requirements to build wheel: finished with status 'done'
16.85 Installing backend dependencies: started
18.00 Installing backend dependencies: finished with status 'done'
18.01 Preparing metadata (pyproject.toml): started
18.12 Preparing metadata (pyproject.toml): finished with status 'done'
18.26 Collecting deepspeed>=0.9.5
18.28 Downloading deepspeed-0.10.3.tar.gz (867 kB)
18.31 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 867.3/867.3 KB 27.7 MB/s eta 0:00:00
18.43 Preparing metadata (setup.py): started
19.46 Preparing metadata (setup.py): finished with status 'done'
19.56 Collecting sentencepiece>=0.1.97
19.59 Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
19.63 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 29.1 MB/s eta 0:00:00
19.91 Collecting numpy>=1.22.4
19.95 Downloading numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
20.56 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 28.1 MB/s eta 0:00:00
20.65 Collecting accelerate>=0.21.0
20.67 Downloading accelerate-0.23.0-py3-none-any.whl (258 kB)
20.69 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 258.1/258.1 KB 20.0 MB/s eta 0:00:00
20.81 Collecting fastlangid>=1.0.11
20.84 Downloading fastlangid-1.0.11-py2.py3-none-any.whl (1.2 MB)
20.88 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 30.9 MB/s eta 0:00:00
20.97 Collecting torch>=2.0.0
21.00 Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
41.28 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 8.1 MB/s eta 0:00:00
42.58 Collecting flash-attn>=2.0.0
42.61 Downloading flash_attn-2.2.2.tar.gz (2.3 MB)
42.69 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 29.1 MB/s eta 0:00:00
43.04 Preparing metadata (setup.py): started
43.13 Preparing metadata (setup.py): finished with status 'error'
43.13 error: subprocess-exited-with-error
43.13
43.13 × python setup.py egg_info did not run successfully.
43.13 │ exit code: 1
43.13 ╰─> [6 lines of output]
43.13 Traceback (most recent call last):
43.13 File "<string>", line 2, in <module>
43.13 File "<pip-setuptools-caller>", line 34, in <module>
43.13 File "/tmp/pip-install-oqoh1ao4/flash-attn_e7d37a42812f46d1b089da8b5646022e/setup.py", line 8, in <module>
43.13 from packaging.version import parse, Version
43.13 ModuleNotFoundError: No module named 'packaging'
43.13 [end of output]
43.13
43.13 note: This error originates from a subprocess, and is likely not a problem with pip.
43.13 error: metadata-generation-failed
43.13
43.13 × Encountered error while generating package metadata.
43.13 ╰─> See above for output.
43.13
43.13 note: This is an issue with the package mentioned above, not pip.
43.13 hint: See above for details.
------
Dockerfile.model-training:17
--------------------
16 | # clone OA repo & install dependencies
17 | >>> RUN apt install -y python3.10-dev && \
18 | >>> git clone https://github.com/LAION-AI/Open-Assistant.git && \
19 | >>> source venv/bin/activate && \
20 | >>> pip install -e /workspace/Open-Assistant/oasst-data \
21 | >>> pip install -e /workspace/Open-Assistant/model
22 |
--------------------
ERROR: failed to solve: process "/bin/bash -c apt install -y python3.10-dev && git clone https://github.com/LAION-AI/Open-Assistant.git && source venv/bin/activate && pip install -e /workspace/Open-Assistant/oasst-data pip install -e /workspace/Open-Assistant/model" did not complete successfully: exit code: 1
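The failure comes from the flash-attn setup.py, which imports packaging before that module is installed in the venv. The container builds successfully after adding the following command below line 14 of Dockerfile.model-training: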
pip install packaging
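For illustration, here is a minimal sketch of how the failing RUN step could look with the workaround applied. The contents of line 14 are not shown in this issue, so the sketch places pip install packaging inside the failing step itself, after the venv is activated; that placement is an assumption, not the reported fix location. It also adds an && between the two pip install -e commands. This is a second assumption, based on the log lines "Requirement already satisfied: pip" and "Collecting install", which suggest that the second "pip install" was parsed as package names of the first command.

```dockerfile
# clone OA repo & install dependencies
# Sketch only: the placement of "pip install packaging" and the added "&&"
# between the two editable installs are assumptions, not the upstream file.
RUN apt install -y python3.10-dev && \
    git clone https://github.com/LAION-AI/Open-Assistant.git && \
    source venv/bin/activate && \
    pip install packaging && \
    pip install -e /workspace/Open-Assistant/oasst-data && \
    pip install -e /workspace/Open-Assistant/model
```

Installing packaging in its own pip invocation, before the editable installs, makes it importable by the time pip runs the flash-attn setup.py.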