Hey there,
Big thanks for sharing the repo! I stumbled upon a little hiccup in the DownProjectBlock and UpProjectBlock in a5: a dimension mismatch in the residual connections (e.g. x_input keeps its original sequence length of 128, which doesn't match bottleneck_dim=64 after the down projection, so the residual add fails). I simply removed these connections after CrossAttention, and that solved the problem.

The original Perceiver paper only includes CrossAttention in the first and last layers, so I figured aligning with the original paper might clear things up. I left the layer norm and MLP as-is since they don't seem to affect performance.
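For concreteness, here's a rough sketch of the shape of the change I mean, in PyTorch. It is not the repo's actual code: I'm using nn.MultiheadAttention as a stand-in for the assignment's CrossAttention, and the n_embd/n_head values and the learned latents are placeholder guesses on my part.

```python
import torch
import torch.nn as nn

class DownProjectBlock(nn.Module):
    """Illustrative sketch only -- not the repo's real block.
    nn.MultiheadAttention stands in for CrossAttention, with the learned
    latents as queries and the full-length input as keys/values."""
    def __init__(self, n_embd=256, n_head=4, bottleneck_dim=64):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(bottleneck_dim, n_embd) * 0.02)
        self.ln = nn.LayerNorm(n_embd)
        self.cross_attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x_input):                                   # x_input: (B, 128, n_embd)
        B = x_input.size(0)
        latents = self.latents.unsqueeze(0).expand(B, -1, -1)     # (B, 64, n_embd)
        attn_out, _ = self.cross_attn(self.ln(latents), x_input, x_input)
        # No "attn_out + x_input" residual here: x_input has sequence length 128
        # while attn_out has bottleneck_dim=64, so that add raises a shape error.
        return attn_out + self.mlp(self.ln(attn_out))             # MLP residual is still shape-safe

x = torch.randn(2, 128, 256)
print(DownProjectBlock()(x).shape)   # torch.Size([2, 64, 256])
```

The UpProjectBlock would mirror this, with the roles of the bottleneck and the full-length sequence swapped.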
I hope my little tweak helps. Feel free to reach out if you need more details.
Cheers!