Can't get maxperf running

Hi, thanks for the great project, but I am struggling to get it running.

I have NVIDIA driver 530, CUDA 12.1, torch 2.1.0, Python 3.10, and xformers 0.22, so as far as I can tell all requirements are met...
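In case it helps with reproducing this, here is a minimal sketch for dumping the versions above from inside the venv. It only uses the standard library. The package list is an assumption on my part (in particular, the distribution name `stable-fast` may be registered differently on other setups), and `collect_versions` is just an illustrative helper name.

```python
import platform
from importlib import metadata

# Assumed distribution names; adjust to whatever `pip list` shows in the venv.
PACKAGES = ["torch", "xformers", "triton", "diffusers", "transformers", "stable-fast"]


def collect_versions(packages=PACKAGES):
    """Return {name: version or None} plus the running Python version."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed (or installed under a different distribution name).
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version if version is not None else 'not found'}")
```

The driver and CUDA toolkit versions are not visible this way; those would still come from `nvidia-smi` and `nvcc --version`.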

I installed the stable-fast version corresponding to my setup via pip3: https://github.com/chengzeyi/stable-fast/releases/download/v0.0.13.post3/stable_fast-0.0.13.post3+torch210cu121-cp310-cp310-manylinux2014_x86_64.whl

Yet when I try to run it, I get the nasty error below. I don't know where to start or what could be wrong. It looks like it is coming from stable-fast, and I am not sure what to do.

Does anyone have a clue what might be wrong?

(venv) sd@sd:~/Playground/ArtSpew$ python3 maxperf.py
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 15.28it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/cuda/graphs.py:88: UserWarning: The CUDA Graph is empty. This usually means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at ../aten/src/ATen/cuda/CUDAGraph.cpp:192.)
  super().capture_end()
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:159: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:218: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:228: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:214: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:66: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1 or self.sliding_window is not None:
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py:137: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:273: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:281: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:313: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:23: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:253: TracerWarning: torch.Tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return super().__new__(cls, x, *args, **kwargs)
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:123: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return (torch.as_tensor(tuple(obj), dtype=torch.uint8), )
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/schedulers/scheduling_euler_discrete.py:353: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  if len(index_candidates) > 1:
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/schedulers/scheduling_euler_discrete.py:358: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  self._step_index = step_index.item()
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/utils/flat_tensors.py:197: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bool(tensors[start].item()), start + 1
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:878: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:265: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:271: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:173: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert hidden_states.shape[1] == self.channels
/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/models/resnet.py:186: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if hidden_states.shape[0] >= 64:
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: cannot find -lcuda: No such file or directory
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
/usr/bin/ld: skipping incompatible /lib/i386-linux-gnu/libcuda.so when searching for -lcuda
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
  File "/home/sd/Playground/ArtSpew/maxperf.py", line 256, in <module>
    mw = MainWindow()
  File "/home/sd/Playground/ArtSpew/maxperf.py", line 177, in __init__
    self.genImage()
  File "/home/sd/Playground/ArtSpew/maxperf.py", line 209, in genImage
    images = genit(0, prompts=prompts, batchSize=batchSize, nSteps=1)
  File "/home/sd/Playground/ArtSpew/maxperf.py", line 234, in genit
    images = pipe(
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 918, in __call__
    noise_pred = self.unet(
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 29, in dynamic_graphed_callable
    cached_callable = simple_make_graphed_callable(
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 46, in simple_make_graphed_callable
    return make_graphed_callable(callable,
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 75, in make_graphed_callable
    callable(tree_copy(example_inputs),
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 62, in wrapper
    return traced_module(args, kwargs)
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 119, in forward
    outputs = self.module(self.convert_inputs(args, kwargs))
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):

graph(%input, %num_groups, %weight, %bias, %eps, %cudnn_enabled):
    %y : Tensor = sfast_triton::group_norm_silu(%input, %num_groups, %weight, %bias, %eps)
    return (%y)
RuntimeError: CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp264mff0j/main.c', '-O3', '-I/home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/triton/common/../third_party/cuda/include', '-I/usr/include/python3.10', '-I/tmp/tmp264mff0j', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp264mff0j/group_norm_4d_channels_last_forward_collect_stats_kernel.cpython-310-x86_64-linux-gnu.so', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu', '-L/lib/i386-linux-gnu']' returned non-zero exit status 1.

At:
  /usr/lib/python3.10/subprocess.py(369): check_call
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/triton/common/build.py(103): _build
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/triton/compiler/make_launcher.py(37): make_stub
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/triton/compiler/compiler.py(614): compile
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/triton/runtime/jit.py(532): run
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/triton/__init__.py(35): new_func
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/triton/runtime/autotuner.py(305): run
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/triton/runtime/autotuner.py(305): run
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/triton/ops/group_norm.py(425): group_norm_forward
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/triton/torch_ops.py(188): forward
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/autograd/function.py(539): apply
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/triton/torch_ops.py(226): group_norm_silu
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/jit/trace_helper.py(119): forward
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/jit/trace_helper.py(62): wrapper
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/cuda/graphs.py(75): make_graphed_callable
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/cuda/graphs.py(46): simple_make_graphed_callable
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/sfast/cuda/graphs.py(29): dynamic_graphed_callable
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py(918): __call__
  /home/sd/Playground/ArtSpew/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py(115): decorate_context
  /home/sd/Playground/ArtSpew/maxperf.py(234): genit
  /home/sd/Playground/ArtSpew/maxperf.py(209): genImage
  /home/sd/Playground/ArtSpew/maxperf.py(177): __init__
  /home/sd/Playground/ArtSpew/maxperf.py(256): <module>
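
Going by the linker messages above, `ld` only found a `libcuda.so` in `/lib/i386-linux-gnu` (skipped as incompatible) and none in `/lib/x86_64-linux-gnu`. A minimal sketch to check this, assuming the two directories from the `-L` flags of the failing gcc command are the relevant ones: it lists which `libcuda` files exist there and whether each is a 32-bit or 64-bit ELF file (read from the `EI_CLASS` byte of the ELF header). The helper names `elf_bits` and `scan` are only illustrative.

```python
import os

# Directories taken from the -L flags in the failing gcc command.
LIB_DIRS = ["/lib/x86_64-linux-gnu", "/lib/i386-linux-gnu"]
# ld resolves -lcuda to the unversioned name; the versioned name is listed
# only for comparison, since a driver install may ship just that one.
CANDIDATES = ["libcuda.so", "libcuda.so.1"]


def elf_bits(path):
    """Return 32 or 64 for an ELF file, or None if unreadable or not ELF."""
    try:
        with open(path, "rb") as f:
            header = f.read(5)
    except OSError:
        return None
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 = 32-bit, 2 = 64-bit.
    return {1: 32, 2: 64}.get(header[4])


def scan(lib_dirs=LIB_DIRS, names=CANDIDATES):
    """List (path, bits) for every candidate library that exists."""
    found = []
    for directory in lib_dirs:
        for name in names:
            path = os.path.join(directory, name)
            if os.path.exists(path):
                found.append((path, elf_bits(path)))
    return found


if __name__ == "__main__":
    results = scan()
    if not results:
        print("no libcuda candidates found in", LIB_DIRS)
    for path, bits in results:
        print(f"{path}: {bits}-bit" if bits else f"{path}: not a readable ELF file")
```

If this shows a 64-bit `libcuda.so.1` but no 64-bit `libcuda.so`, that would be consistent with the `cannot find -lcuda` message, though I have not confirmed that this is the cause here.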

aifartist / ArtSpew

Can't get maxperf running #10