FreeStyle : Free Lunch for Text-guided Style Transfer using Diffusion Models

content image #2

Closed: geyanqi closed this issue 6 months ago

geyanqi commented 8 months ago

The content images appear to be generated directly from SDXL. Can you provide some transfer results on real images with irregularities, for example with no restrictions on resolution or image quality?

Also, the second row of Eq. 4 is hard to understand: how do you input an image into a diffusion model without a text prompt?

In addition, the code has many bugs that need to be fixed.

FreeStyleFreeLunch commented 8 months ago

Yes, for convenience, we used generated content images. We have also tested on real images and can achieve good results when the image quality is acceptable, as shown in the picture below. We will update the code and fix the bugs in the next few days. Thank you for your suggestion.

[image]

geyanqi commented 8 months ago

Thank you for your reply.

But the purpose of transferring the style of generated images is unclear, since style transfer on generated images is known to be relatively easy. It may be worth checking more commonly used real images, such as those in the ImageNet-R dataset.

In addition, can you explain the second row of Eq. 4 further?

FreeStyleFreeLunch commented 8 months ago

We adopted an evaluation method similar to those used in other papers. Eq. 4 represents the process of downsampling the content images without adding noise.
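
As a rough illustration of this step (an editor's sketch, not code from this repository; it assumes the downsampling happens in the SDXL VAE latent space and uses a hypothetical ratio `scale=0.5`, neither of which the thread specifies):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def content_latent_without_noise(vae, image, scale=0.5):
    # image: (B, 3, H, W) scaled to [-1, 1], as the SDXL VAE expects.
    latent = vae.encode(image).latent_dist.sample()
    latent = latent * vae.config.scaling_factor
    # The point of Eq. 4 as described above: the content latent is only
    # downsampled; no forward-diffusion noise is added to it.
    return F.interpolate(latent, scale_factor=scale, mode="bilinear")
```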

geyanqi commented 8 months ago

The bug still exists, and I get the following traceback:

Traceback (most recent call last):
  File "stable_diffusion_xl_test.py", line 434, in <module>
    test(args)
  File "stable_diffusion_xl_test.py", line 312, in test
    outputs = pipeline(
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data0/geyanqi/stylefree/diffusers_test/pipeline_stable_diffusion_xl.py", line 818, in __call__
    ) = self.encode_prompt(
  File "/data0/geyanqi/stylefree/diffusers_test/pipeline_stable_diffusion_xl.py", line 348, in encode_prompt
    prompt_embeds = text_encoder(
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 798, in forward
    return self.text_model(
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 703, in forward
    encoder_outputs = self.encoder(
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 630, in forward
    layer_outputs = encoder_layer(
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 371, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 201, in forward
    return F.layer_norm(
  File "/data0/geyanqi/miniconda3/envs/stylefree/lib/python3.8/site-packages/torch/nn/functional.py", line 2546, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

This problem seems to be related to my GPU driver. Can you provide a driver-friendly installation method?

geyanqi commented 8 months ago

> We adopted an evaluation method similar to those used in other papers. Eq. 4 represents the process of downsampling the content images without adding noise.

So how is the content image fed into the UNet encoder? The original UNet encoder needs the text embedding and the latent embedding to work together for cross-attention.

ucasligang commented 8 months ago

> The bug still exists, and I get the following traceback, ending in:
>
>     RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
>
> This problem seems to be related to my GPU driver. Can you provide a driver-friendly installation method?

Thank you very much for your interest in our work. This problem may be caused by your GPU driver being too old. We recommend upgrading the driver from http://www.nvidia.com/Download/index.aspx, or going to https://pytorch.org and installing a PyTorch build compiled for your CUDA driver version. We tested on two different servers: GPU driver version 535.154.05-0ubuntu0.20.04.1 operates normally, while version 515.105.01-0ubuntu1 encounters the same error; the GPU we use is a V100. We therefore strongly recommend upgrading to the driver appropriate for your GPU type.
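
For readers who hit the same error: it generally means a float16 op is executing where no Half kernel exists (typically on the CPU, or on a GPU stack whose driver is too old). Beyond upgrading the driver as suggested above, a common workaround is to fall back to float32 whenever the model would not run on a CUDA device. A minimal sketch, assuming the repo's custom pipeline is loaded the same way as the stock diffusers StableDiffusionXLPipeline:

```python
import torch
from diffusers import StableDiffusionXLPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 kernels such as LayerNorm exist only on GPU, so fall back to
# float32 whenever the model would run on CPU.
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=dtype
).to(device)
```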

ucasligang commented 8 months ago

> We adopted an evaluation method similar to those used in other papers. Eq. 4 represents the process of downsampling the content images without adding noise.
>
> So how is the content image fed into the UNet encoder? The original UNet encoder needs the text embedding and the latent embedding to work together for cross-attention.

As shown in Figure 2 of the paper, our FreeStyle consists of two visual encoder branches, where the lower branch takes the clean content image as input.

geyanqi commented 8 months ago

However, in the code the lower branch also receives the text embedding:

https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1007, https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1018

geyanqi commented 8 months ago

Furthermore, I think you could mark the key parts of the code in the README file to help others quickly understand your work. If I understand correctly, these parts are (see the schematic after this list):

upper branch:

  1. https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1007 where skip_stack == None
  2. https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers/src/diffusers/models/unet_2d_condition.py#L976 return the UNet encoder features to skip_stack,

lower branch:

  1. https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers_test/pipeline_stable_diffusion_xl.py#L1018 where skip_stack is the upper branch's return

  2. https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers/src/diffusers/models/unet_2d_condition.py#L97 use skip_stack

  3. https://github.com/FreeStyleFreeLunch/FreeStyle/blob/main/diffusers/src/diffusers/models/unet_2d_blocks.py#L2182 core code.
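
To make the two-pass flow concrete, here is a schematic of how those pointers fit together (an editor's sketch with a hypothetical `unet` signature, not the repository's actual code):

```python
# Upper branch: a normal denoising pass on the noisy latent. The modified
# UNet additionally returns its encoder (down-block) features as skip_stack.
noise_pred, skip_stack = unet(
    noisy_latent, t, encoder_hidden_states=text_emb, skip_stack=None
)

# Lower branch: a pass on the clean content latent. At each decoder skip
# connection, the upper branch's features from skip_stack are fused in
# (the "core code" in unet_2d_blocks.py), injecting style while the clean
# content features preserve structure. Both passes receive text_emb,
# matching the observation above that the lower branch also gets the
# text embedding.
styled_pred, _ = unet(
    content_latent, t, encoder_hidden_states=text_emb, skip_stack=skip_stack
)
```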

FreeStyleFreeLunch commented 8 months ago

Thank you very much for your valuable suggestions. We will consider adding hints about the principal modification spots to the README in subsequent versions. The specific locations you mentioned are indeed where the modifications were made.

shiyizhiyuanla commented 6 months ago

> Furthermore, I think you could mark the key parts of the code in the README file to help others quickly understand your work. […]

Thank you for highlighting the key parts of the code.