Hey there,
Big thanks for sharing the repo! I stumbled upon a little hiccup in the DownProjectBlock and UpProjectBlock in a5: a dimension mismatch in the residual connections (e.g. x_input keeps its original sequence length of 128, which doesn't match bottleneck_dim=64 after the down projection, so the residual add fails). I simply removed these connections after CrossAttention, and that solved the problem.

The original Perceiver paper only includes CrossAttention in the first and last layers, so I figured aligning with the original paper might clear things up. I left the layer norm and MLP as-is since they don't seem to affect performance.
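For concreteness, here's a rough sketch of the shape of the change I mean, in PyTorch. It is not the repo's actual code: I'm using nn.MultiheadAttention as a stand-in for the assignment's CrossAttention, and the n_embd/n_head values and the learned latents are placeholder guesses on my part.

```python
import torch
import torch.nn as nn

class DownProjectBlock(nn.Module):
    """Illustrative sketch only -- not the repo's real block.
    nn.MultiheadAttention stands in for CrossAttention, with the learned
    latents as queries and the full-length input as keys/values."""
    def __init__(self, n_embd=256, n_head=4, bottleneck_dim=64):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(bottleneck_dim, n_embd) * 0.02)
        self.ln = nn.LayerNorm(n_embd)
        self.cross_attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x_input):                                   # x_input: (B, 128, n_embd)
        B = x_input.size(0)
        latents = self.latents.unsqueeze(0).expand(B, -1, -1)     # (B, 64, n_embd)
        attn_out, _ = self.cross_attn(self.ln(latents), x_input, x_input)
        # No "attn_out + x_input" residual here: x_input has sequence length 128
        # while attn_out has bottleneck_dim=64, so that add raises a shape error.
        return attn_out + self.mlp(self.ln(attn_out))             # MLP residual is still shape-safe

x = torch.randn(2, 128, 256)
print(DownProjectBlock()(x).shape)   # torch.Size([2, 64, 256])
```

The UpProjectBlock would mirror this, with the roles of the bottleneck and the full-length sequence swapped.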
I hope my little tweak helps. Feel free to reach out if you need more details.
Cheers!