ROCm / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
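
As a quick sanity check on an AMD system, the backend AITemplate will target can be inspected before compiling anything; a minimal sketch, assuming `detect_target()` from `aitemplate.testing` and its `name()` accessor behave as in the AITemplate examples (this is the same helper that prints the "Set target to ROCM" line in the issue below):

```python
from aitemplate.testing import detect_target

# On a working ROCm setup this should report the ROCm/HIP backend,
# matching the "Set target to ROCM" log line further down.
target = detect_target()
print(target.name())
```
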
Apache License 2.0

RUN examples/05_stable_diffusion/compile.py ERROR #75

deltaguo commented 12 months ago

I built the ROCm image according to the instructions and installed AITemplate in the container, but when I run examples/05_stable_diffusion/compile.py, it reports that the NVIDIA driver is missing:

INFO:aitemplate.testing.detect_target:Set target to ROCM
vae/diffusion_pytorch_model.safetensors not found
Cannot initialize model with low cpu memory usage because `accelerate` was not found in the environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install `accelerate` for faster and less memory-intense model loading. You can do so with: `pip install accelerate`.
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████| 5/5 [00:13<00:00,  2.63s/it]
Traceback (most recent call last):
  File "examples/05_stable_diffusion/compile.py", line 379, in <module>
    compile_diffusers()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "examples/05_stable_diffusion/compile.py", line 342, in compile_diffusers
    pipe = StableDiffusionPipeline.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/pipeline_utils.py", line 733, in to
    module.to(torch_device, torch_dtype)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Boom-Hacker commented 12 months ago

Is your torch installed with `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6`?
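
A quick way to check this from inside the container is to inspect the installed PyTorch build directly; a minimal diagnostic sketch, assuming a standard PyTorch install (CUDA-only wheels report `torch.version.hip` as `None`):

```python
import torch

# A ROCm wheel usually carries a "+rocmX.Y" suffix in its version string.
print(torch.__version__)

# None on CUDA-only builds; a HIP version string on ROCm builds.
print(torch.version.hip)

# ROCm builds expose HIP devices through the torch.cuda namespace,
# so this should be True when the GPU is visible inside the container.
print(torch.cuda.is_available())
```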

deltaguo commented 12 months ago

I am using a container built with `DOCKER_BUILDKIT=1 ./docker/build.sh rocm`, which already contains PyTorch. The difference from the README.md is that I ran it as a podman container instead of docker; I don't know whether that causes the exception.
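
One thing worth ruling out with either podman or docker: the ROCm runtime can only see the GPU inside the container if the host device nodes are passed through (typically `--device=/dev/kfd --device=/dev/dri`). A minimal check from inside the container, assuming the standard ROCm device-node layout:

```python
import os

# Without these device nodes the HIP runtime cannot see any GPU
# inside the container, regardless of which container engine is used.
for dev in ("/dev/kfd", "/dev/dri"):
    print(dev, "present" if os.path.exists(dev) else "MISSING")
```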

deltaguo commented 12 months ago

I tried configuring the environment based on docker.io/rocm/pytorch instead, and the above errors did not occur.