chaitjo / structured-self-attention

Keras implementation of the Structured Self-Attentive Sentence Embedding model
https://arxiv.org/abs/1703.03130
MIT License

Attention does not support masking #1

Open LincLabUCCS opened 5 years ago

LincLabUCCS commented 5 years ago

Hello. I set the parameter 'mask_zero = True' in the embedding layer, but it raises an error that the attention layer does not support masking: "Layer attention_layer1 does not support masking, but was passed an input_mask: Tensor("sequence_word_embeddings_3/NotEqual:0", shape=(?, 750), dtype=bool)"

Is there a way to solve this?

Thank you for sharing the code

chaitjo commented 5 years ago

Sorry, I did not get around to adding testable code for this. Indeed, the Conv1D layers used to implement the attention blocks do not support masking.

Here are some workarounds:

  1. Set mask_zero=False in the embedding layer, so zeros are not treated as special mask values. The hope is that the model still learns to treat the zero padding as a special case nonetheless (see the sketch after this list).

  2. Follow this discussion on Stack Overflow and see if someone has an implementation of masked 1D convolutions: https://stackoverflow.com/questions/43392693/how-to-input-mask-value-to-convolution1d-layer

  3. If you want a more powerful attention model for NLP tasks, look into the Transformer; there are several good open-source implementations available.
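
For reference, here is a minimal sketch of workaround 1 combined with a hand-rolled mask applied directly to the attention scores, assuming Keras 2 with the TensorFlow backend. The layer names, sizes, and the single-hop attention shown here are illustrative and are not copied from this repository's code.

```python
import keras.backend as K
from keras.layers import Input, Embedding, Bidirectional, LSTM, Dense, Lambda
from keras.models import Model

# Illustrative sizes (750 matches the sequence length in the error above).
max_len, vocab_size, embed_dim, lstm_units = 750, 20000, 100, 150

words = Input(shape=(max_len,), dtype='int32')

# Workaround 1: with mask_zero=False no mask tensor is propagated, so
# downstream layers no longer raise the "does not support masking" error.
# Padding tokens (index 0) are embedded like any other word.
embedded = Embedding(vocab_size, embed_dim, mask_zero=False)(words)
hidden = Bidirectional(LSTM(lstm_units, return_sequences=True))(embedded)

# Hand-rolled alternative to true masking: compute attention scores, then
# add a large negative value at padded positions before the softmax so they
# receive (near-)zero attention weight. This is single-hop attention for
# illustration, not the multi-hop attention matrix from the paper.
scores = Dense(1)(hidden)  # (batch, max_len, 1)

def masked_softmax(args):
    s, ids = args
    s = K.squeeze(s, axis=-1)                   # (batch, max_len)
    pad = K.cast(K.equal(ids, 0), K.floatx())   # 1.0 at padded positions
    return K.softmax(s - 1e9 * pad)

weights = Lambda(masked_softmax)([scores, words])   # (batch, max_len)

# Weighted sum of the LSTM states gives a fixed-size sentence embedding.
sentence = Lambda(
    lambda args: K.sum(K.expand_dims(args[0], axis=-1) * args[1], axis=1)
)([weights, hidden])

model = Model(inputs=words, outputs=sentence)
model.summary()
```

Adding a large negative value to the scores before the softmax is the usual trick for excluding padded positions from attention without needing the attention layers themselves to support Keras masks.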

LincLabUCCS commented 5 years ago

Thank you, chaitjo.