Goals :soccer:
This PR adds `CrossAttentionBlock`. It is a better fit for session-based models than broadcasting session features to every sequence position, which leads to redundant computation. Cross-attention is also a core building block for multimodal models like Flamingo:
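To make the contrast concrete, here is a minimal sketch of what a cross-attention block computes (not the actual implementation in this PR; the function name, shapes, and plain-numpy style are illustrative). The sequence provides the queries and the session/context features provide the keys and values, so the context never has to be broadcast to every sequence position up front:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    # query:   (seq_len, d)  e.g. per-step embeddings of the sequence
    # context: (ctx_len, d)  e.g. session-level or other-modality features
    d = query.shape[-1]
    # Each sequence position attends over the context directly,
    # instead of the context being tiled/broadcast to every position.
    scores = query @ context.T / np.sqrt(d)   # (seq_len, ctx_len)
    weights = softmax(scores, axis=-1)        # rows sum to 1
    return weights @ context                  # (seq_len, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((5, 8))    # sequence of 5 steps
ctx = rng.standard_normal((3, 8))  # 3 context vectors
out = cross_attention(q, ctx)
print(out.shape)  # (5, 8)
```

A real block would add learned query/key/value projections, multiple heads, and residual connections, but the shape story is the same: the context stays `(ctx_len, d)` rather than being expanded to `(seq_len, ctx_len, d)`.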