1edv / evolution

This repository contains the code for our manuscript, 'The evolution, evolvability and engineering of gene regulatory DNA'.

MultiHeadAttention output shape doesn't match #12

Closed: vuhongai closed this issue 1 year ago

vuhongai commented 1 year ago

Dear authors,

When I run your attention-based model on Colab, I hit a "bug" that I can't explain, and I hope you can help. If the input shape of the MultiHeadAttention layer is (None, 110, 64) (the example from Fig. S12), the output shape is (None, None, 64). Other than that, the model works fine, because the output can still broadcast to its input shape.

from aux import MultiHeadAttention
from tensorflow.keras import Input

# Input corresponding to Fig. S12: sequence length 110, embedding size 64
inputs = Input(shape=(110, 64))                    # batch shape: (None, 110, 64)
outputs = MultiHeadAttention(head_num=8)(inputs)
outputs.shape

TensorShape([Dimension(None), Dimension(None), Dimension(64)])

You can reproduce the result here in this Colab notebook.

I also wonder whether the MultiHeadAttention layer implemented in your TPU model is the same as the built-in MultiHeadAttention in TensorFlow 2 (both follow the "Attention Is All You Need" paper). The TF2 layer gives the expected output shape of (None, 110, 64), by the way.
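For comparison, here is a minimal sketch of the same shape check with the built-in TF2 layer; num_heads=8 and key_dim=8 are my own assumptions, chosen only so that 8 heads times a key dimension of 8 matches the 64-dimensional embedding:

import tensorflow as tf

# Same (110, 64) input as above; the batch dimension is None.
x = tf.keras.Input(shape=(110, 64))
# Self-attention: query and value are the same tensor (key defaults to value).
mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=8)  # assumed hyperparameters
y = mha(x, x)
print(y.shape)  # (None, 110, 64)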

Thank you for your help. Ai

1edv commented 1 year ago

Dear Ai,

First, my apologies for the slow response (I missed the notification)!

Unfortunately, the MultiHeadAttention layer implemented in our TPU model isn't the same as the one in TensorFlow 2 (which can be found here); I don't believe the 'official' TF2 implementation was available at the time our model was originally trained.

The MultiHeadAttention implementation we used can be found here.

Since an 'official' TF2 implementation is now available, I would definitely recommend using it directly if you are training a new model from scratch.
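If it helps, a self-attention block built around the official layer might look roughly like the sketch below; the hyperparameters (8 heads, key_dim=8), the (110, 64) input, and the single-output head are illustrative assumptions, not the configuration used for the published model:

import tensorflow as tf

# Minimal functional-model sketch using the built-in TF2 layer for self-attention.
inputs = tf.keras.Input(shape=(110, 64))
attn = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=8)(inputs, inputs)
pooled = tf.keras.layers.GlobalAveragePooling1D()(attn)
outputs = tf.keras.layers.Dense(1)(pooled)  # e.g. a single regression output
model = tf.keras.Model(inputs, outputs)
model.summary()  # the attention layer's output is reported as (None, 110, 64)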

Good luck!

Best, Eeshit