Hi @jasl, can you try switching to the dev branch of jetson-containers?
Commit https://github.com/dusty-nv/jetson-containers/commit/712252b39835573a8a18bcafecc9d9bb64605c11 is staged there, which patches this issue introduced by a recent transformers update.
# "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 118
# AttributeError: module 'torch.distributed' has no attribute 'is_initialized'
RUN PYTHON_ROOT=`pip3 show transformers | grep Location: | cut -d' ' -f2` && \
sed -i 's|torch.distributed.is_initialized|torch.distributed.is_available|g' -i ${PYTHON_ROOT}/transformers/modeling_utils.py
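To verify that the sed actually applied (a minimal sketch, assuming the same install location as above), you can grep the patched file inside the container:

# hypothetical check: confirm the patched call is present in modeling_utils.py
grep -n 'torch.distributed.is_available' $(pip3 show transformers | grep Location: | cut -d' ' -f2)/transformers/modeling_utils.py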
Sorry for the late response. I tried the dev branch, and the error moved to:
#7 [3/5] RUN cd /opt && git clone --branch master --depth=1 https://github.com/AUTOMATIC1111/stable-diffusion-webui && cd stable-diffusion-webui && git clone https://github.com/dusty-nv/stable-diffusion-webui-tensorrt extensions-builtin/stable-diffusion-webui-tensorrt && python3 -c 'from modules import launch_utils; launch_utils.args.skip_python_version_check=True; launch_utils.prepare_environment()'
#7 0.096 Cloning into 'stable-diffusion-webui'...
#7 2.022 Cloning into 'extensions-builtin/stable-diffusion-webui-tensorrt'...
#7 5.522 Traceback (most recent call last):
#7 5.522 File "<string>", line 1, in <module>
#7 5.522 File "/opt/stable-diffusion-webui/modules/launch_utils.py", line 356, in prepare_environment
#7 5.522 raise RuntimeError(
#7 5.522 RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
#7 5.522 Python 3.8.10 (default, May 26 2023, 14:05:08)
#7 5.522 [GCC 9.4.0]
#7 5.522 Version: v1.6.0
#7 5.522 Commit hash: 5ef669de080814067961f28357256e8fe27544f4
#7 ERROR: process "/bin/sh -c cd /opt && git clone --branch ${STABLE_DIFFUSION_WEBUI_VERSION} --depth=1 https://github.com/${STABLE_DIFFUSION_WEBUI_REPO} && cd stable-diffusion-webui && git clone https://github.com/dusty-nv/stable-diffusion-webui-tensorrt extensions-builtin/stable-diffusion-webui-tensorrt && python3 -c 'from modules import launch_utils; launch_utils.args.skip_python_version_check=True; launch_utils.prepare_environment()'" did not complete successfully: exit code: 1
------
> [3/5] RUN cd /opt && git clone --branch master --depth=1 https://github.com/AUTOMATIC1111/stable-diffusion-webui && cd stable-diffusion-webui && git clone https://github.com/dusty-nv/stable-diffusion-webui-tensorrt extensions-builtin/stable-diffusion-webui-tensorrt && python3 -c 'from modules import launch_utils; launch_utils.args.skip_python_version_check=True; launch_utils.prepare_environment()':
2.022 Cloning into 'extensions-builtin/stable-diffusion-webui-tensorrt'...
5.522 Traceback (most recent call last):
5.522 File "<string>", line 1, in <module>
5.522 File "/opt/stable-diffusion-webui/modules/launch_utils.py", line 356, in prepare_environment
5.522 raise RuntimeError(
5.522 RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
5.522 Python 3.8.10 (default, May 26 2023, 14:05:08)
5.522 [GCC 9.4.0]
5.522 Version: v1.6.0
5.522 Commit hash: 5ef669de080814067961f28357256e8fe27544f4
------
Dockerfile:17
--------------------
16 |
17 | >>> RUN cd /opt && \
18 | >>> git clone --branch ${STABLE_DIFFUSION_WEBUI_VERSION} --depth=1 https://github.com/${STABLE_DIFFUSION_WEBUI_REPO} && \
19 | >>> cd stable-diffusion-webui && \
20 | >>> git clone https://github.com/dusty-nv/stable-diffusion-webui-tensorrt extensions-builtin/stable-diffusion-webui-tensorrt && \
21 | >>> python3 -c 'from modules import launch_utils; launch_utils.args.skip_python_version_check=True; launch_utils.prepare_environment()'
22 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd /opt && git clone --branch ${STABLE_DIFFUSION_WEBUI_VERSION} --depth=1 https://github.com/${STABLE_DIFFUSION_WEBUI_REPO} && cd stable-diffusion-webui && git clone https://github.com/dusty-nv/stable-diffusion-webui-tensorrt extensions-builtin/stable-diffusion-webui-tensorrt && python3 -c 'from modules import launch_utils; launch_utils.args.skip_python_version_check=True; launch_utils.prepare_environment()'" did not complete successfully: exit code: 1
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/jasl/Workspaces/jetson-containers/jetson_containers/build.py", line 95, in <module>
build_container(args.name, args.packages, args.base, args.build_flags, args.simulate, args.skip_tests, args.test_only, args.push)
File "/home/jasl/Workspaces/jetson-containers/jetson_containers/container.py", line 128, in build_container
status = subprocess.run(cmd.replace(_NEWLINE_, ' '), executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker build --network=host --tag stable-diffusion-webui:r35.4.1-stable-diffusion-webui --file /home/jasl/Workspaces/jetson-containers/packages/diffusion/stable-diffusion-webui/Dockerfile --build-arg BASE_IMAGE=stable-diffusion-webui:r35.4.1-opencv /home/jasl/Workspaces/jetson-containers/packages/diffusion/stable-diffusion-webui 2>&1 | tee /home/jasl/Workspaces/jetson-containers/logs/20230912_010054/build/stable-diffusion-webui_r35.4.1-stable-diffusion-webui.txt; exit ${PIPESTATUS[0]}' returned non-zero exit status 1.
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
Do you have your default docker runtime set to nvidia? https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#docker-default-runtime
If so, my guess is that stable-diffusion-webui was updated and installs a different pytorch wheel that wasn't built with CUDA. I will kick off a workflow to check it...
Ah, sorry! I did a quick check, and it's not using the NVIDIA runtime... I had just reset my AGX Orin with the JetPack SDK Manager.
I corrected the setting, but on rerun I got the same error. I've cleaned the build cache and am retrying.
If you haven't already, after you modify /etc/docker/daemon.json, you need to either reboot or restart the Docker service:
sudo systemctl restart docker
https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#docker-default-runtime
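For reference, the daemon.json described in that setup doc ends up looking roughly like this (a sketch; keep any other keys already present in your file):

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}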
I did; after restarting I still got the same error.
Hmm... it just re-built successfully here: https://github.com/dusty-nv/jetson-containers/actions/runs/6149838054/job/16686510903
You aren't using buildkit right?
Ah yes, I added the buildx and compose plugins.
It seems buildx caused the trouble; I removed it and am retrying. Sorry for wasting your time.
May I ask an off-topic question? I'm actually trying to run https://github.com/vladmandic/automatic on the AGX Orin. It's an A1111 web UI fork; in my experience it's more stable and has better diffusers and SDXL support.
But it has dropped Python 3.8 support, and NVIDIA only provides a cp38 prebuilt wheel. Is it possible to compile PyTorch for a higher Python version?
@jasl yes, it should be possible to recompile PyTorch; PyTorch >= 2.0 doesn't require patches to build for ARM64.
However, in my experience you can often just patch the setup.py/etc. of a project and it will still work on Python 3.8.
I actually already have to do this in my Dockerfile for stable-diffusion-webui:
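For illustration only (hypothetical file names and version strings, not the actual snippet from that Dockerfile), this kind of patch is typically just a sed over the version pins before installing:

# hypothetical example: relax a project's Python pin so it installs on Python 3.8
RUN sed -i 's|python_requires=">=3.9"|python_requires=">=3.8"|' setup.py && \
    sed -i 's|python_version >= "3.9"|python_version >= "3.8"|g' requirements.txt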
I see, thank you!
So the conclusion is: the dev branch works, but it has trouble when Docker uses buildx.
I heard JetPack 6 is coming soon, and I'm not sure whether buildx will become the default in the Docker packaged with Ubuntu 22.04.
buildx / buildkit don't honor the default docker runtime in /etc/docker/daemon.json; hence container builds that require the CUDA runtime won't work.
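One workaround (assuming build.sh just shells out to docker build and the classic builder is still installed) is to disable BuildKit for the build so the default nvidia runtime is honored:

DOCKER_BUILDKIT=0 ./build.sh stable-diffusion-webui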
I'm trying to build my own stable-diffusion-webui image on AGX Orin (a fresh JetPack 5.1.2 installation) with ./build.sh stable-diffusion-webui, and then I got: