Fails to compile model in docker, missing installs?

Hello, I was able to get this repo working with SDXL Turbo on my 4090 using a venv, but when I tried to dockerize the build it repeatedly fails on a missing tmp file. I am wondering whether you have seen this issue before, and whether my Docker image is missing some apt or pip installs that would cause it. I appreciate your help, @chengzeyi.

Dockerfile:

FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get -y update \
    && apt-get install -y --no-install-recommends python3.10 python-is-python3 git libgl1 libsndfile1 pip ffmpeg google-perftools \
    libvulkan1 libnvidia-gl-525-server mesa-vulkan-drivers gcc build-essential \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip setuptools wheel --no-cache-dir

WORKDIR testing
COPY requirements2.txt requirements2.txt
RUN pip install -r requirements2.txt --no-cache-dir

COPY . .

CMD ["python3", "examples/optimize_stable_diffusion_pipeline.py"]

Error logs:

INFO:root:Tracing forward
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:159: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:218: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:228: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:214: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py:66: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1 or self.sliding_window is not None:
/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py:281: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
/usr/local/lib/python3.10/dist-packages/transformers/models/clip/modeling_clip.py:313: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:23: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:253: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return super().new(cls, x, args, kwargs)
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:123: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return (torch.as_tensor(tuple(obj), dtype=torch.uint8), )
0%| | 0/30 [00:00<?, ?it/s]INFO:root:Dynamically graphing forward
/usr/local/lib/python3.10/dist-packages/torch/cuda/graphs.py:88: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:192.)
  super().capture_end()
INFO:root:Tracing forward
/usr/local/lib/python3.10/dist-packages/sfast/utils/flat_tensors.py:197: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bool(tensors[start].item()), start + 1
/usr/local/lib/python3.10/dist-packages/diffusers/models/unet_2d_condition.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
/usr/local/lib/python3.10/dist-packages/diffusers/models/resnet.py:265: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/usr/local/lib/python3.10/dist-packages/diffusers/models/resnet.py:271: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/usr/local/lib/python3.10/dist-packages/diffusers/models/resnet.py:173: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/usr/local/lib/python3.10/dist-packages/diffusers/models/resnet.py:186: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/tmp/tmpuumtrxyy/main.c:4:10: fatal error: Python.h: No such file or directory
    4 | #include <Python.h>
      |          ^~~~~~
compilation terminated.
0%| | 0/30 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/testing/examples/optimize_stable_diffusion_pipeline.py", line 81, in <module>
    output_image = model(kwarg_inputs).images[0]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 918, in __call__
    noise_pred = self.unet(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py", line 29, in dynamic_graphed_callable
    cached_callable = simple_make_graphed_callable(
  File "/usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py", line 46, in simple_make_graphed_callable
    return make_graphed_callable(callable,
  File "/usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py", line 75, in make_graphed_callable
    callable(tree_copy(example_inputs),
  File "/usr/local/lib/python3.10/dist-packages/sfast/jit/trace_helper.py", line 55, in wrapper
    return traced_module(args, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sfast/jit/trace_helper.py", line 112, in forward
    outputs = self.module(self.convert_inputs(args, kwargs))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):

graph(%input, %num_groups, %weight, %bias, %eps, %cudnn_enabled):
    %y : Tensor = sfast_triton::group_norm_silu(%input, %num_groups, %weight, %bias, %eps)
    return (%y)
RuntimeError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpuumtrxyy/main.c', '-O3', '-I/usr/local/lib/python3.10/dist-packages/triton/common/../third_party/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmpuumtrxyy', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpuumtrxyy/group_norm_4d_channels_last_forward_collect_stats_kernel.cpython-310-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.

At:
  /usr/lib/python3.10/subprocess.py(369): check_call
  /usr/local/lib/python3.10/dist-packages/triton/common/build.py(90): _build
  /usr/local/lib/python3.10/dist-packages/triton/compiler/make_launcher.py(39): make_stub
  /usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py(425): compile
  <string>(63): group_norm_4d_channels_last_forward_collect_stats_kernel
  /usr/local/lib/python3.10/dist-packages/sfast/triton/__init__.py(35): new_func
  /usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py(232): run
  /usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py(232): run
  /usr/local/lib/python3.10/dist-packages/sfast/triton/ops/group_norm.py(437): group_norm_forward
  /usr/local/lib/python3.10/dist-packages/sfast/triton/torch_ops.py(186): forward
  /usr/local/lib/python3.10/dist-packages/torch/autograd/function.py(539): apply
  /usr/local/lib/python3.10/dist-packages/sfast/triton/torch_ops.py(224): group_norm_silu
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/sfast/jit/trace_helper.py(112): forward
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/sfast/jit/trace_helper.py(55): wrapper
  /usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py(75): make_graphed_callable
  /usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py(46): simple_make_graphed_callable
  /usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py(29): dynamic_graphed_callable
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py(918): __call__
  /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
  /testing/examples/optimize_stable_diffusion_pipeline.py(81): <module>
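
For anyone debugging the same failure: the gcc command in the error passes -I/usr/include/python3.10 and stops at "fatal error: Python.h: No such file or directory", so one thing worth checking is whether the CPython development headers exist inside the container. The snippet below is a minimal diagnostic sketch, not part of stable-fast or Triton; `find_python_header` is a hypothetical helper name. It only uses the standard-library `sysconfig` module to look up the interpreter's include directory. On Ubuntu the header is normally shipped by the `python3-dev` package (here probably `python3.10-dev`), which does not appear in the apt-get install line of the Dockerfile above. Whether adding it resolves this particular build is an assumption, not something confirmed in this report.

```python
import os
import sysconfig


def find_python_header(include_dir=None):
    """Return the path to Python.h inside include_dir, or None if it is absent.

    When include_dir is not given, fall back to the include directory that
    the running interpreter reports through sysconfig.
    """
    if include_dir is None:
        include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header if os.path.isfile(header) else None


if __name__ == "__main__":
    header = find_python_header()
    if header:
        print(f"Found {header}")
    else:
        # Triton's launcher stub includes Python.h, so the build cannot succeed without it.
        print("Python.h not found in", sysconfig.get_paths()["include"])
```

Running this with the image's `python3` (for example via `docker run` with the script mounted) shows whether the header is present before the Triton kernel compilation is attempted.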

Pip Freeze:
accelerate==0.25.0
annotated-types==0.6.0
anyio==3.7.1
certifi==2023.11.17
charset-normalizer==3.3.2
click==8.1.7
diffusers==0.24.0
exceptiongroup==1.2.0
fastapi==0.104.1
filelock==3.13.1
fsspec==2023.12.0
h11==0.14.0
huggingface-hub==0.19.4
idna==3.6
importlib-metadata==7.0.0
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
packaging==23.2
Pillow==10.1.0
psutil==5.9.6
pydantic==2.5.2
pydantic_core==2.14.5
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
safetensors==0.4.1
sniffio==1.3.0
stable-fast @ https://github.com/chengzeyi/stable-fast/releases/download/v0.0.12.post6/stable_fast-0.0.12.post6+torch210cu121-cp310-cp310-manylinux2014_x86_64.whl
starlette==0.27.0
sympy==1.12
tokenizers==0.15.0
torch==2.1.0
tqdm==4.66.1
transformers==4.35.2
triton==2.1.0
typing_extensions==4.8.0
urllib3==2.1.0
uvicorn==0.24.0.post1
xformers==0.0.22.post7
zipp==3.17.0

chengzeyi/stable-fast, issue #62