YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
https://proceedings.mlr.press/v235/yang24ai.html
MIT License

Matrix mismatch error in the RegionalDiffusion_playground.ipynb #4

Closed · threefoldo closed 10 months ago

threefoldo commented 10 months ago

I could run the RPG.py file from the command line and generate 4 images, but not the notebook. It seems they are different implementations.

I encountered an error at this call:

images = pipe(prompt, negative_prompt,
              batch_size=2,            # batch size
              num_inference_steps=30,  # sampling steps
              height=896,
              width=640,
              end_steps=1,    # fraction (0-1) of steps that apply the doubled (regional) attention; 1 = all steps, 0 = normal generation
              base_ratio=0.2, # weight of the base prompt; 0 = regional prompts only, 1 = base prompt only
              seed=4396,      # random seed
)
File ~/anaconda3/envs/rpg/lib/python3.9/site-packages/xformers/ops/fmha/__init__.py:334, in _memory_efficient_attention_forward(inp, op)
    331 def _memory_efficient_attention_forward(
    332     inp: Inputs, op: Optional[Type[AttentionFwOpBase]]
    333 ) -> torch.Tensor:
--> 334     inp.validate_inputs()
    335     output_shape = inp.normalize_bmhk()
    336     if op is None:

File ~/anaconda3/envs/rpg/lib/python3.9/site-packages/xformers/ops/fmha/common.py:197, in Inputs.validate_inputs(self)
    191     valid_shapes = (
    192         self.query.shape == (B, Mq, G, H, K)
    193         and self.key.shape == (B, Mkv, G, H, key_embed_dim)
    194         and self.value.shape == (B, Mkv, G, H, Kv)
    195     )
    196 if not valid_shapes:
--> 197     raise ValueError(
    198         f"Incompatible shapes for attention inputs:\n"
    199         f"  query.shape: {self.query.shape}\n"
    200         f"  key.shape  : {self.key.shape}\n"
    201         f"  value.shape: {self.value.shape}\n"
    202         "HINT: We don't support broadcasting, please use `expand` "
    203         "yourself before calling `memory_efficient_attention` if you need to"
    204     )

ValueError: Incompatible shapes for attention inputs:
  query.shape: torch.Size([32, 8960, 40])
  key.shape  : torch.Size([64, 51, 40])
  value.shape: torch.Size([64, 51, 40])
HINT: We don't support broadcasting, please use `expand` yourself before calling `memory_efficient_attention` if you need to

If I disable memory-efficient attention,

self.unet.set_use_memory_efficient_attention_xformers(False)

the error persists:

File ~/anaconda3/envs/rpg/lib/python3.9/site-packages/diffusers/models/attention_processor.py:406, in AttnProcessor.__call__(self, attn, hidden_states, encoder_hidden_states, attention_mask)
    403 key = attn.head_to_batch_dim(key)
    404 value = attn.head_to_batch_dim(value)
--> 406 attention_probs = attn.get_attention_scores(query, key, attention_mask)
    407 hidden_states = torch.bmm(attention_probs, value)
    408 hidden_states = attn.batch_to_head_dim(hidden_states)

File ~/anaconda3/envs/rpg/lib/python3.9/site-packages/diffusers/models/attention_processor.py:308, in Attention.get_attention_scores(self, query, key, attention_mask)
    305     baddbmm_input = attention_mask
    306     beta = 1
--> 308 attention_scores = torch.baddbmm(
    309     baddbmm_input,
    310     query,
    311     key.transpose(-1, -2),
    312     beta=beta,
    313     alpha=self.scale,
    314 )
    316 if self.upcast_softmax:
    317     attention_scores = attention_scores.float()

RuntimeError: Expected size for first two dimensions of batch2 tensor to be: [32, 40] but got: [64, 40].
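
For reference, the mismatch is reproducible with plain torch. The query batch (batch × heads = 32) comes from the image latents, while the key/value batch is 64, twice as large, presumably because the regional prompts double the text-embedding batch relative to the latent batch (an assumption from the shapes above, not a confirmed diagnosis). A minimal sketch:

import torch

# Shapes taken from the traceback: query is (batch*heads, query_tokens, head_dim),
# while key carries twice the batch dimension, so baddbmm rejects the pair.
query = torch.randn(32, 8960, 40)
key = torch.randn(64, 51, 40)
bias = torch.empty(32, 8960, 51)  # what diffusers passes when attention_mask is None
torch.baddbmm(bias, query, key.transpose(-1, -2), beta=0, alpha=1.0)
# raises: RuntimeError: Expected size for first two dimensions of batch2 tensor
# to be: [32, 40] but got: [64, 40].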
xhinker commented 10 months ago

I see the same error

YangLing0818 commented 10 months ago

> I see the same error

Thanks for your kind reminder. We will fix this problem shortly; please try RPG.py first.

YangLing0818 commented 10 months ago

> I could run the RPG.py file from the command line and generate 4 images, but not the notebook. [full report quoted above]

The problem was caused by an incorrect version of the diffusers library. We have updated RegionalDiffusion_playground.ipynb, please check.
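
As a quick sanity check, you can print the installed version and compare it with the one the updated notebook expects (the exact version number is listed in the notebook, not repeated here):

import diffusers

print(diffusers.__version__)  # compare with the version pinned by the updated notebook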

xhinker commented 10 months ago

@YangLing0818 Thanks Yang, which Diffusers version should be used? I pulled the latest code and the error persisted.

xhinker commented 10 months ago

I see the version number now, thanks.

xhinker commented 10 months ago

I see another issue: it seems I can only use "Linaqruf/anything-v3.0"; switching to any other model reports:

AttributeError: 'AttentionBlock' object has no attribute 'to_k'
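
For context, this is a guess from the error rather than a confirmed diagnosis: older diffusers models can contain the deprecated AttentionBlock module, which exposes .query/.key/.value projections instead of the Attention module's .to_q/.to_k/.to_v, so code that patches every attention module assuming to_k exists will raise exactly this AttributeError. A minimal defensive sketch, where pipe is a hypothetical loaded pipeline:

# `pipe` is a hypothetical StableDiffusionPipeline loaded elsewhere.
# Only patch modules that expose the Attention-style projections; deprecated
# AttentionBlock modules use .query/.key/.value instead and would fail here.
for name, module in pipe.unet.named_modules():
    if all(hasattr(module, attr) for attr in ("to_q", "to_k", "to_v")):
        pass  # safe to install a custom (e.g. regional) attention processor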