aipixel / AEMatter

Another matter.
GNU General Public License v2.0

Possible issue with Multihead attention inputs #3

Closed · VolodymyrAhafonov closed this issue 11 months ago

VolodymyrAhafonov commented 11 months ago

Hello. Really good and interesting paper. Thank you for your work!

Recently I've experimented with your model and noticed an interesting detail in the code of the AEALblock class. Code block:

b, c, h, w = src.shape
# first half of the channels: split the height into strips of size self.width
x1 = src[:, 0:c // 2]
x1_ = rearrange(x1, 'b c (h1 h2) w -> b c h1 h2 w', h2=self.width)
# fold the strip index h1 into the batch dim; tokens are the (h2 w) positions, features are c
x1_ = rearrange(x1_, 'b c h1 h2 w -> (b h1) (h2 w) c')
# second half of the channels: same idea, but splitting along the width
x2 = src[:, c // 2:]
x2_ = rearrange(x2, 'b c h (w1 w2) -> b c h w1 w2', w2=self.width)
x2_ = rearrange(x2_, 'b c h w1 w2 -> (b w1) (h w2) c')

It seems to me that x1_ and x2_ are prepared in a batch_first manner. But they are then processed by an nn.MultiheadAttention module that is always initialized with batch_first=False, so (b h1) from x1_ and (b w1) from x2_ are treated as the sequence dimension. This seems wrong to me. Could you please clarify whether this is a bug or not? Thanks in advance.
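
For reference, here is a minimal sketch of what I mean (the sizes and num_heads are made up for illustration, and the batch_first=True variant is just one possible fix, not necessarily how the layer should actually be patched):

import torch
import torch.nn as nn

b, c, h, w = 2, 64, 32, 32                 # hypothetical sizes
width = 8
h1 = h // width                            # number of horizontal strips
x1_ = torch.randn(b * h1, width * w, c)    # (batch-like, tokens, channels), as produced by the rearranges above

attn = nn.MultiheadAttention(embed_dim=c, num_heads=4)  # batch_first defaults to False

# As currently written: dim 0 (b*h1) is treated as the sequence dimension,
# so attention mixes tokens across different strips/images instead of within each strip.
out_buggy, _ = attn(x1_, x1_, x1_)

# Possible fix 1: construct the module with batch_first=True
attn_bf = nn.MultiheadAttention(embed_dim=c, num_heads=4, batch_first=True)
out_fix1, _ = attn_bf(x1_, x1_, x1_)

# Possible fix 2: transpose to (seq, batch, embed) before the default module and transpose back
out_fix2, _ = attn(x1_.transpose(0, 1), x1_.transpose(0, 1), x1_.transpose(0, 1))
out_fix2 = out_fix2.transpose(0, 1)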

Windaway commented 11 months ago

Thank you for your attention; there is indeed a bug in the implementation here. I need to retrain the model to assess the performance. The current implementation seems to act as a strong regularization layer.

VolodymyrAhafonov commented 11 months ago

Thank you for your quick reply. As I understand from your answer, the currently shared model was trained with this bug? If so, could you please train a model without this bug and share it?

Windaway commented 11 months ago

Sure, I will fix this bug, retrain, and release the model. However, you can still use the current model for now, and it also works well. BTW, you may use this branch https://github.com/QLYoo/AEMatter/tree/PT20.

Windaway commented 11 months ago

The issue has been fixed and I have provided a ckpt trained with the fixed code. The new ckpt achieves better performance on Adobe Composition-1K.