floriankark / cs224n-win2223

Code and written solutions for the assignments of the Stanford CS224N: Natural Language Processing with Deep Learning course (winter 2022/2023)
http://web.stanford.edu/class/cs224n/index.html
MIT License
209 stars, 62 forks

Fix dim mismatch in Up/Down ProjectBlocks, a5 #3

Closed: clarenceluo78 closed this 1 year ago

clarenceluo78 commented 1 year ago

Hey there,

Big thanks for sharing the repo! I ran into a small hiccup in a5: the residual connections in DownProjectBlock and UpProjectBlock have a dimension mismatch. For example, the original sequence length of x_input is 128, which does not match bottleneck_dim=64 after the down projection, so the two tensors cannot be added. I simply removed these residual connections after CrossAttention, which solved the problem.
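To see why the residual cannot work, note that cross-attention always returns one row per *query*, so its output has the query's sequence length rather than the input's. The sketch below is a simplified, dependency-free illustration, not the a5 code: it uses single-head attention with no learned projections, a hypothetical `latents` matrix standing in for the learned bottleneck queries, and it assumes the UpProjectBlock residual adds the bottleneck `y` to the attention output. In PyTorch, adding tensors of shape `(B, 128, C)` and `(B, 64, C)` fails for the same reason, since 128 and 64 cannot broadcast.

```python
import math
import random

Matrix = list[list[float]]  # rows = sequence positions, columns = features

def cross_attention(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with no learned projections
    (a simplification). Returns one row per query, so the output length is
    len(queries), not len(keys_values)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[c] for w, k in zip(weights, keys_values)) / total
                    for c in range(d)])
    return out

def residual_add(a: Matrix, b: Matrix) -> Matrix:
    """Elementwise a + b; raises if the sequence lengths differ."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

random.seed(0)
d_model, seq_len, bottleneck_dim = 8, 128, 64
x_input = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
# Hypothetical stand-in for the learned bottleneck queries of DownProjectBlock.
latents = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(bottleneck_dim)]

# Down projection: 64 queries attend over 128 positions -> 64 rows.
y = cross_attention(latents, x_input)
# Up projection: 128 queries (from x_input) attend over the 64 bottleneck rows -> 128 rows.
up = cross_attention(x_input, y)

errors = []
for a, b in [(x_input, y), (y, up)]:  # the residuals removed in this issue
    try:
        residual_add(a, b)
    except ValueError as err:
        errors.append(str(err))
print(errors)  # ['length mismatch: 128 vs 64', 'length mismatch: 64 vs 128']
```

Dropping those two additions, as described above, leaves `y` and `up` with the shapes the next layers expect.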

The original Perceiver paper only uses CrossAttention in the first and last layers, so I figured aligning with the paper might clear things up. I left the LayerNorm and MLP as-is, since they don't seem to hurt performance.
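The overall layout can be sketched as follows: cross-attention only at the first layer (128 → 64) and the last layer (64 → 128), ordinary self-attention blocks in between, and the patched projection blocks keeping LayerNorm and the MLP. This is a loose, dependency-free illustration under stated assumptions, not the a5 implementation: attention has no learned projections, `mlp` is a hypothetical weight-free placeholder (ReLU only), and `latents` stands in for the learned bottleneck queries. The residual around the MLP stays shape-safe because it adds the cross-attention output to a function of itself.

```python
import math
import random

Matrix = list[list[float]]  # rows = sequence positions, columns = features

def attention(q: Matrix, kv: Matrix) -> Matrix:
    """Scaled dot-product attention, no learned projections; output has len(q) rows."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in kv]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(wj * kj[c] for wj, kj in zip(w, kv)) / total for c in range(d)])
    return out

def layer_norm(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Per-position normalization to zero mean, unit variance (no learned gain/bias)."""
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def mlp(x: Matrix) -> Matrix:
    """Hypothetical placeholder for the position-wise MLP: ReLU only, no weights."""
    return [[max(0.0, v) for v in row] for row in x]

def add(a: Matrix, b: Matrix) -> Matrix:
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def cross_block(queries: Matrix, context: Matrix) -> Matrix:
    """Projection block as patched in this issue: no residual around
    cross-attention; LayerNorm + MLP kept, with a same-length residual."""
    h = attention(layer_norm(queries), layer_norm(context))
    return add(h, mlp(layer_norm(h)))

def self_block(x: Matrix) -> Matrix:
    """Pre-LN self-attention block; residuals are fine since length is unchanged."""
    h = layer_norm(x)
    x = add(x, attention(h, h))
    return add(x, mlp(layer_norm(x)))

random.seed(0)
d_model, seq_len, bottleneck_dim, n_inner = 8, 128, 64, 2
x_input = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]
latents = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(bottleneck_dim)]

y = cross_block(latents, x_input)   # first layer: 128 -> 64
for _ in range(n_inner):
    y = self_block(y)               # inner layers stay at 64 positions
out = cross_block(x_input, y)       # last layer: 64 -> 128
print(len(y), len(out))             # 64 128
```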

I hope my little tweak helps. Feel free to reach out if you need more details.

Cheers!

floriankark commented 1 year ago

Hello,

Awesome that you like it! Yes, this was indeed a problem. Great that you found it, thank you very much!