chengzeyi / stable-fast

Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
MIT License

How to input fixed latents to the model during inference #17

Closed hedgehoggy closed 11 months ago

hedgehoggy commented 12 months ago

Hi, I would like to know how to input fixed latents to the model during inference. I need fixed latents to avoid random noise when comparing the results of stable-fast and plain PyTorch. I have set the input parameters in the following format, but it doesn't work.

```python
import numpy as np
import torch

# Load the pre-generated latents and move them to the GPU in fp16.
latents = np.load('latents.npy', mmap_mode=None, allow_pickle=False,
                  fix_imports=True, encoding='ASCII')
latents = torch.from_numpy(latents).half().cuda()

kwarg_inputs = dict(
    prompt=prompt,
    latents=latents,
    height=512,
    width=512,
    num_inference_steps=20,
    num_images_per_prompt=1,
)
output_image = compiled_model(**kwarg_inputs).images[0]
```

chengzeyi commented 12 months ago

> Hi, I would like to know how to input fixed latents to the model during inference. […]

Yes, you set the same latents. But you should also set the generator, and keep its state the same across different invocations. Generally speaking, this is a question about diffusers, not stable-fast.
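As a rough sketch (the variable names mirror the snippet above; nothing here is stable-fast-specific), fixing both the latents and the generator in diffusers could look like this:

```python
import torch

# Re-create (or re-seed) the generator with the same seed before every call
# so the scheduler's randomness is reproducible across invocations.
generator = torch.Generator(device='cuda').manual_seed(0)

output_image = compiled_model(
    prompt=prompt,
    latents=latents,        # the fixed latents loaded above
    generator=generator,    # same seed => same generator state each time
    height=512,
    width=512,
    num_inference_steps=20,
    num_images_per_prompt=1,
).images[0]
```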

hedgehoggy commented 12 months ago

OK, I will try it later. Apart from this, when I use SDXL, the following error occurred.

```
Traceback (most recent call last):
  File "/home/sd/stable-fast-main/run_sdxl.py", line 84, in <module>
    output_image = compiled_model(**kwarg_inputs).images[0]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 749, in __call__
    ) = self.encode_prompt(
  File "/usr/local/lib/python3.10/dist-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py", line 309, in encode_prompt
    prompt_embeds = text_encoder(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sd/stable-fast-main/sfast/jit/trace_helper.py", line 55, in wrapper
    traced_m, call_helper = trace_with_kwargs(
  File "/home/sd/stable-fast-main/sfast/jit/trace_helper.py", line 30, in trace_with_kwargs
    traced_module = better_trace(TraceablePosArgOnlyModuleWrapper(func),
  File "/home/sd/stable-fast-main/sfast/jit/utils.py", line 30, in better_trace
    script_module = torch.jit.trace(func, args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 798, in trace
    return trace_module(
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_trace.py", line 1065, in trace_module
    module._c._create_method_from_trace(
RuntimeError: Tracer cannot infer type of BaseModelOutputWithPooling(
xxxxxxxxxxxxx
:Dictionary inputs to traced functions must have consistent type. Found Tensor and Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]
```

chengzeyi commented 12 months ago

> Hi, I would like to know how to input fixed latents to the model during inference. […]

Refer to one of my test files, `test_stable_diffusion_pipeline_compiler.py`, to see how I achieve that.
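For illustration only (the model ID and import path below are assumptions based on the README of that period, not a copy of the test file), a deterministic comparison between the plain and the compiled pipeline could look roughly like this:

```python
import torch
from diffusers import StableDiffusionPipeline
from sfast.compilers.stable_diffusion_pipeline_compiler import (
    compile, CompilationConfig)

# Assumption: the standard SD 1.5 checkpoint, used purely as an example.
model = StableDiffusionPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5', torch_dtype=torch.float16).to('cuda')

def generate(pipe, seed=0):
    # A freshly seeded generator per call so both pipelines start from
    # identical noise.
    generator = torch.Generator(device='cuda').manual_seed(seed)
    return pipe(prompt='a photo of a cat', generator=generator,
                num_inference_steps=20).images[0]

reference = generate(model)                           # plain diffusers output
compiled_model = compile(model, CompilationConfig.Default())
optimized = generate(compiled_model)                  # stable-fast output
# The two images should now be (near-)identical.
```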

chengzeyi commented 12 months ago

> OK, I will try it later. Apart from this, when I use SDXL, the following error occurred.
>
> `RuntimeError: Tracer cannot infer type of BaseModelOutputWithPooling(` […]

This should have been fixed in the latest version. Upgrade to the latest code and see whether the error goes away.