Closed vuhongai closed 1 year ago
Dear Ai,
First, my apologies for the slow response (I missed the notification)!
Unfortunately, the MultiHeadAttention layer implemented in our TPU model isn't the same as the one in TensorFlow 2 (which can be found here); I don't believe the 'official' TF2 implementation was available at the time our model was originally trained.
The MultiHeadAttention implementation we used can be found here.
Since an 'official' TF2 implementation is now available, I would definitely recommend using it directly if you are training a new model from scratch.
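For reference, a minimal sketch of calling the built-in TF2 layer directly (the layer parameters here, e.g. num_heads=8, are illustrative, not the values from our trained model; the (110, 64) shape is taken from the thread):

```python
import tensorflow as tf

# The built-in tf.keras.layers.MultiHeadAttention preserves the query's
# sequence dimension, so a (batch, 110, 64) input yields a (batch, 110, 64)
# output. num_heads/key_dim below are placeholder values.
inputs = tf.keras.Input(shape=(110, 64))
mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)
outputs = mha(inputs, inputs)  # self-attention: query = value
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 110, 64)
```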
Good luck!
Best, Eeshit
Dear authors,
When I run your attention-based model on Colab, there is a "bug" that I can't explain, and I hope you can help. If the input shape of the MultiHeadAttention layer is (None, 110, 64) (the example from Fig S12), the output shape becomes (None, None, 64). The model otherwise works fine, because the output can still be broadcast to the input shape:
TensorShape([Dimension(None), Dimension(None), Dimension(64)])
You can reproduce the result here in this Colab notebook.
I also wonder whether the MultiHeadAttention layer implemented in your TPU model is the same as the pre-built MultiHeadAttention in TensorFlow 2 (both are from the "Attention Is All You Need" paper). By the way, the MHA from TF2 gives the expected output shape, (None, 110, 64).
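That (None, 110, 64) shape is what standard attention should produce: the output always keeps the query's sequence length. A minimal NumPy-only sketch of single-head scaled dot-product attention (shapes taken from the thread; this is not the authors' implementation):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Single-head attention; the output keeps the query's sequence length."""
    d_k = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, Lq, Lk)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
    return weights @ v                                 # (batch, Lq, d_v)

x = np.random.rand(2, 110, 64)  # stand-in for the (None, 110, 64) input
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (2, 110, 64)
```

So a (None, None, 64) static output shape points to lost shape inference in the layer, not to the attention math itself.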
Thank you for your help. Ai