chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

[composability] stable-fast + sd-turbo device mismatch #69

Closed: jon-chuang closed this issue 7 months ago

jon-chuang commented 7 months ago
```
2023-12-07 17:35:10.128 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
2023-12-07 17:35:10.128 [stderr   ]     return func(*args, **kwargs)
2023-12-07 17:35:10.128 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py", line 926, in __call__
2023-12-07 17:35:10.128 [stderr   ]     image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False, generator=generator)[
2023-12-07 17:35:10.128 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 36, in dynamic_graphed_callable
2023-12-07 17:35:10.128 [stderr   ]     cached_callable = simple_make_graphed_callable(
2023-12-07 17:35:10.128 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 53, in simple_make_graphed_callable
2023-12-07 17:35:10.128 [stderr   ]     return make_graphed_callable(callable,
2023-12-07 17:35:10.128 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 98, in make_graphed_callable
2023-12-07 17:35:10.129 [stderr   ]     static_inputs = shadow_copy(static_inputs_)
2023-12-07 17:35:10.129 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/sfast/utils/copy.py", line 49, in shadow_copy
2023-12-07 17:35:10.129 [stderr   ]     return type(obj)(shadow_copy(x, detach=detach) for x in obj)
2023-12-07 17:35:10.129 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/sfast/utils/copy.py", line 49, in <genexpr>
2023-12-07 17:35:10.129 [stderr   ]     return type(obj)(shadow_copy(x, detach=detach) for x in obj)
2023-12-07 17:35:10.129 [stderr   ]   File "/root/.cache/isolate/virtualenv/cb0c4d3222905a6bb1bceaa9f8e4dae878a144d056a139f4dbce875ede43363e/lib/python3.10/site-packages/sfast/utils/copy.py", line 45, in shadow_copy
2023-12-07 17:35:10.129 [stderr   ]     return sfast._C._create_shadow_tensor(
2023-12-07 17:35:10.129 [stderr   ] RuntimeError: The specified pointer resides on host memory and is not registered with any CUDA device.
```
jon-chuang commented 7 months ago

OK, it seems to occur only when steps=1. It works with steps=2.
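Why steps=1 fails while steps=2 works can be sketched by mirroring how the diffusers img2img pipeline truncates its schedule: its `get_timesteps` computes `init_timestep = min(int(num_inference_steps * strength), num_inference_steps)` and then skips the first `num_inference_steps - init_timestep` timesteps. The pure-Python approximation below assumes a first-order scheduler (`order=1`) and strength=0.6 as mentioned later in the thread; `img2img_timesteps` and its placeholder timestep values are hypothetical, not part of diffusers or stable-fast.

```python
def img2img_timesteps(num_inference_steps: int, strength: float) -> list:
    """Hypothetical sketch of the timesteps an img2img run actually iterates over."""
    # Placeholder descending schedule; real schedulers compute different values.
    stride = 1000 // num_inference_steps
    all_timesteps = [999 - i * stride for i in range(num_inference_steps)]

    # Same truncation rule as diffusers' img2img get_timesteps (assuming order=1):
    # int() truncates toward zero, so int(1 * 0.6) == 0.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return all_timesteps[t_start:]


for steps in (1, 2):
    print(steps, img2img_timesteps(steps, 0.6))
# 1 []
# 2 [499]
```

With steps=1 the schedule is empty, so the pipeline reaches `vae.decode` without running a single denoising step.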

jon-chuang commented 7 months ago

I think this is expected. With strength=0.6 and steps=1, img2img runs 0 denoising steps, because int(1 * 0.6) truncates to 0. So the latent somehow ends up on the CPU, which leads to the crash.
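One way to avoid hitting the compiled pipeline with a zero-step run is to validate the arguments up front. The helpers below are hypothetical (not part of stable-fast or diffusers) and assume the same truncation rule as above: at least one denoising step runs only when `int(num_inference_steps * strength) >= 1`.

```python
def min_steps_for_strength(strength: float) -> int:
    """Smallest num_inference_steps for which int(steps * strength) >= 1 (hypothetical helper)."""
    if not 0.0 < strength <= 1.0:
        raise ValueError(f"strength must be in (0, 1], got {strength}")
    steps = 1
    # Increment instead of using math.ceil(1 / strength) to sidestep float rounding surprises.
    while int(steps * strength) < 1:
        steps += 1
    return steps


def check_img2img_args(num_inference_steps: int, strength: float) -> None:
    """Fail fast before calling a compiled pipeline that would run 0 denoising steps."""
    if int(num_inference_steps * strength) < 1:
        raise ValueError(
            f"num_inference_steps={num_inference_steps} with strength={strength} "
            f"runs 0 denoising steps; use at least {min_steps_for_strength(strength)}"
        )


print(min_steps_for_strength(0.6))  # 2
check_img2img_args(2, 0.6)  # passes silently
```

Calling `check_img2img_args(1, 0.6)` raises a `ValueError` suggesting 2 steps, which matches the observation that steps=2 works. This is the same rule of thumb as keeping `num_inference_steps * strength` at or above 1.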