Amshaker / unetr_plus_plus

[IEEE TMI-2024] UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation
Apache License 2.0
340 stars, 32 forks

Some code and paper questions about EPA #62

Closed: YifengDeng1 closed this issue 9 months ago

YifengDeng1 commented 9 months ago
  1. In the EPA class, there is this line:

         x_SA = (attn_SA @ v_SA_projected.transpose(-2, -1)).permute(0, 3, 1, 2).reshape(B, N, C)

     After reading your paper and code, I think it should be:

         x_SA = (attn_SA @ v_SA_projected.transpose(-2, -1)).permute(0, 2, 1, 3).reshape(B, N, C)

     Although the two outputs have the same shape and size, their contents are different.

  2. Also in the EPA part: the way x_CA is computed in the repository is not exactly the same as in the paper. The repository uses:

         x_CA = (attn_CA @ v_CA).permute(0, 3, 1, 2).reshape(B, N, C)

     while according to the paper it should be:

         x_CA = (v_CA.transpose(-2, -1) @ attn_CA).permute(0, 2, 1, 3).reshape(B, N, C)

     However, the repository's version looks easier to understand and more logical than the paper's.
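To illustrate point (1), here is a minimal NumPy sketch (not the repository code) that mimics only the final permute and reshape. It assumes, from reading the EPA code, that the product attn_SA @ v_SA_projected.transpose(-2, -1) has shape (B, h, N, d), with h heads of size d and C = h * d. The toy sizes and the tokens_in_row helper are hypothetical. Each value encodes its head, token, and channel, so you can see which tokens land in each output row.

```python
import numpy as np

# Hypothetical toy sizes: batch B, heads h, head dim d, tokens N, with C = h * d.
B, h, d, N = 1, 2, 3, 4
C = h * d

# Stand-in for (attn_SA @ v_SA_projected.transpose(-2, -1)), assumed shape (B, h, N, d).
# Each value encodes 100 * head + 10 * token + channel, so its origin is recoverable.
out = np.zeros((B, h, N, d), dtype=int)
for head in range(h):
    for n in range(N):
        for k in range(d):
            out[0, head, n, k] = 100 * head + 10 * n + k

# np.transpose(x, axes) reorders axes the same way torch.Tensor.permute(*axes) does.
x_repo = np.transpose(out, (0, 3, 1, 2)).reshape(B, N, C)      # via (B, d, h, N)
x_proposed = np.transpose(out, (0, 2, 1, 3)).reshape(B, N, C)  # via (B, N, h, d)


def tokens_in_row(x, n):
    """Source token indices whose values ended up in output row n."""
    return sorted({int(v % 100) // 10 for v in x[0, n]})


print(x_repo.shape == x_proposed.shape)  # True: identical shapes
print(tokens_in_row(x_proposed, 0))      # [0]: row 0 holds only token 0
print(tokens_in_row(x_repo, 0))          # [0, 1, 2, 3]: tokens are mixed
```

Under this shape assumption, (0, 2, 1, 3) puts the token axis right after the batch axis, so each output row is one token with its heads concatenated; (0, 3, 1, 2) yields (B, d, h, N), and the reshape then spreads values from several tokens into one row.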
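For point (2), a second NumPy sketch compares the two channel-attention forms on random inputs. It assumes attn_CA has shape (B, h, d, d) with softmax over the last axis and v_CA has shape (B, h, d, N); the sizes, the seed, and the softmax helper are illustrative assumptions, not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy sizes: batch B, heads h, head dim d, tokens N, with C = h * d.
B, h, d, N = 1, 2, 3, 4
C = h * d


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Random stand-ins with the assumed shapes:
# attn_CA is (B, h, d, d), softmax over its last axis; v_CA is (B, h, d, N).
attn_CA = softmax(rng.standard_normal((B, h, d, d)))
v_CA = rng.standard_normal((B, h, d, N))

# Repository form: (B, h, d, N) -> permute(0, 3, 1, 2) -> (B, N, h, d) -> (B, N, C).
x_repo = np.transpose(attn_CA @ v_CA, (0, 3, 1, 2)).reshape(B, N, C)

# Paper form from the issue: (B, h, N, d) -> permute(0, 2, 1, 3) -> (B, N, h, d) -> (B, N, C).
x_paper = np.transpose(np.swapaxes(v_CA, -2, -1) @ attn_CA, (0, 2, 1, 3)).reshape(B, N, C)

# Because V^T A = (A^T V)^T, the paper form equals the repository form with attn_CA transposed.
attn_T = np.swapaxes(attn_CA, -2, -1)
x_repo_At = np.transpose(attn_T @ v_CA, (0, 3, 1, 2)).reshape(B, N, C)

print(x_repo.shape == x_paper.shape)    # True
print(np.allclose(x_paper, x_repo_At))  # True
print(np.allclose(x_repo, x_paper))     # False for generic random inputs
```

In the repository form, each output channel is a weighted average of the value channels, because every row of attn_CA sums to 1. The paper form weights by the columns of attn_CA, which generally do not sum to 1, so the two variants are genuinely different operations even though their output shapes match.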

If I have made a mistake, please let me know. Thanks!

Amshaker commented 9 months ago

Hi @YifengDeng1 ,

Thanks for your question, and apologies for the late reply.

(1) Yes, you are right. Permuting the dimensions should be (0, 3, 1, 2) for consistency.
(2) I think both are fine. I also ran an experiment on Synapse with both of your proposed modifications, and the results are almost the same (87.1 vs. 87.2). I will clarify this in the next version.

Best regards, Abdelrahman.

YifengDeng1 commented 7 months ago

Hi @Amshaker,

Thanks for your reply! I am still a little confused.

(1) The permutation should be (0, 2, 1, 3), not (0, 3, 1, 2), right?
(2) I totally understand it, thanks!

Sincerely, Yifeng.