Closed: geyanqi closed this issue 6 months ago.
Yes, for convenience we used generated content images. We have also tested with real images and can achieve good results when the image quality is acceptable, as shown in the picture below. We will update the code and fix bugs in the next few days. Thank you for your suggestion.
Thank you for your reply.
But the purpose of transferring the style of generated images is unclear. It is known that style transfer on generated images is relatively easy. More commonly used real images may need to be checked, such as those in the ImageNet-R dataset.
In addition, can you explain the second row of Eq. 4 further?
We adopted an evaluation method similar to those used in other papers. Eq. 4 represents the process of downsampling the content images without adding noise.
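To unpack that in generic diffusion notation (a rough paraphrase, not the paper's exact Eq. 4; $z_0$, $\bar{\alpha}_t$, and $\epsilon$ are the usual DDPM symbols): the standard forward process produces a noisy latent

$$z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$

whereas for the content branch the $\epsilon$ term is dropped and only the downsampled (encoded) content image is kept.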
The bug still exists, and I get the following traceback:
Traceback (most recent call last):
File "stable_diffusion_xl_test.py", line 434, in <module>
test(args)
File "stable_diffusion_xl_test.py", line 312, in test
outputs = pipeline(
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data0/geyanqi/stylefree/diffusers_test/pipeline_stable_diffusion_xl.py", line 818, in __call__
) = self.encode_prompt(
File "/data0/geyanqi/stylefree/diffusers_test/pipeline_stable_diffusion_xl.py", line 348, in encode_prompt
prompt_embeds = text_encoder(
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 798, in forward
return self.text_model(
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 703, in forward
encoder_outputs = self.encoder(
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 630, in forward
layer_outputs = encoder_layer(
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 371, in forward
hidden_states = self.layer_norm1(hidden_states)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 201, in forward
return F.layer_norm(
File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/functional.py", line 2546, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
This problem seems to be related to my GPU driver. Can you provide a GPU-driver-friendly installation method?
Also, regarding Eq. 4: how is the content image embedding fed into the U-Net encoder? The original U-Net encoder needs the text embedding and the latent embedding to work together for cross-attention.
Thank you very much for your interest in our work. The LayerNorm error may be caused by your GPU driver being too old. It is recommended to upgrade the GPU driver from http://www.nvidia.com/Download/index.aspx, or to go to https://pytorch.org and install a PyTorch version compiled against your CUDA driver version. We conducted tests on two different servers: GPU driver version 535.154.05-0ubuntu0.20.04.1 operates normally, while version 515.105.01-0ubuntu1 encounters the same error; the GPUs we use are V100s. Therefore, it is strongly recommended to upgrade the corresponding GPU driver based on your GPU type.
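If upgrading the driver is not immediately possible, one common workaround for this particular error (a sketch using the stock diffusers SDXL pipeline rather than the modified pipeline_stable_diffusion_xl.py in this repo) is to request float16 only when CUDA is actually visible to PyTorch, since the error typically appears when half-precision tensors end up running on the CPU:

```python
import torch
from diffusers import StableDiffusionXLPipeline

# "LayerNormKernelImpl not implemented for 'Half'" usually means float16
# tensors are being executed on the CPU (e.g. because PyTorch cannot see
# the GPU); only use float16 when a CUDA device is actually available.
use_cuda = torch.cuda.is_available()
dtype = torch.float16 if use_cuda else torch.float32

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=dtype,
).to("cuda" if use_cuda else "cpu")
```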
Regarding how the content image is used: as shown in Figure 2 of the paper, our FreeStyle consists of two visual encoder branches, where the lower branch takes the clean content image as input.
However, the lower branch in the code also receives the text embedding:
- https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1007
- https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1018
Furthermore, I think you can mark out the key parts of the code in the README file to help others quickly understand your work. If I understand correctly, this part should be in:
upper branch:
- https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1007 where skip_stack == None
- https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers/src/diffusers/models/unet_2d_condition.py#L976 returns the unet encoder features to skip_stack
lower branch:
- https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1018 where skip_stack is the upper branch's return
- https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers/src/diffusers/models/unet_2d_condition.py#L97 uses skip_stack
- https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers/src/diffusers/models/unet_2d_blocks.py#L2182 core code
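To make that flow concrete, here is a minimal, self-contained toy of the pattern described in the list above (it is not the FreeStyle code; only the name skip_stack is borrowed from the repo, and the actual feature fusion lives in the unet_2d_blocks.py line linked above): one forward pass returns its encoder features as a skip_stack, and a second pass consumes that stack in its decoder.

```python
import torch
import torch.nn as nn

# Toy illustration of the two-branch pattern (NOT the FreeStyle code):
# the first pass returns its encoder features as `skip_stack`, and the
# second pass consumes that stack in its decoder instead of its own skips.
class TinyUNet(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        self.enc1 = nn.Conv2d(3, ch, 3, padding=1)
        self.enc2 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)
        self.dec = nn.Conv2d(2 * ch, 3, 3, padding=1)

    def forward(self, x, skip_stack=None):
        h1 = torch.relu(self.enc1(x))   # encoder feature kept as a skip
        h2 = torch.relu(self.enc2(h1))  # downsampled bottleneck
        own_skips = [h1]
        skips = own_skips if skip_stack is None else skip_stack
        u = self.up(h2)                 # upsample back to input resolution
        out = self.dec(torch.cat([u, skips[-1]], dim=1))
        return out, own_skips

net = TinyUNet()
upper_input = torch.randn(1, 3, 32, 32)    # stands in for the upper branch's input
content_input = torch.randn(1, 3, 32, 32)  # stands in for the clean content input

_, skip_stack = net(upper_input, skip_stack=None)        # upper branch: collect encoder features
stylized, _ = net(content_input, skip_stack=skip_stack)  # lower branch: inject them while decoding
```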
Thank you very much for your valuable suggestions. We will consider including hints about the principal modification spots in the README of subsequent versions. Upon reviewing the specific locations you mentioned, it's clear that modifications have indeed been made there.
Thank you for highlighting the key parts of the code
The content images appear to be generated directly by SDXL. Can you provide some transfer results on real images with irregularities, for example with no restrictions on resolution or image quality?
Also, the second row of Eq. 4 is hard to understand: how do you input an image without a text prompt into a diffusion model?
In addition, the code has many bugs that need to be fixed.