Closed: ghost closed this issue 7 years ago
The paper does not seem to mention training different RNNs for different sentences. By forcing the classifier at the end to learn a representation shared across the sequence of sentences and the sequence of words, the model generalises better to unseen data.
Nothing stops you from training separate GRUs, but I think it would be a futile effort.
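The parameter sharing described above can be sketched as follows. This is a minimal numpy illustration, not the repo's actual code: a plain tanh RNN stands in for the GRU/LSTM, and all names (`Wx`, `Wh`, `word_rnn`) are hypothetical. The point is that one weight set encodes every sentence, which is what wordAttnRNN does here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, d_hid = 8, 16

# ONE shared set of word-level RNN weights, reused for every sentence.
# (Illustrative stand-in for the model's wordAttnRNN parameters.)
Wx = rng.normal(scale=0.1, size=(d_word, d_hid))
Wh = rng.normal(scale=0.1, size=(d_hid, d_hid))

def word_rnn(sentence):
    """Run a simple tanh RNN over one sentence of shape (n_words, d_word)."""
    h = np.zeros(d_hid)
    for x in sentence:
        h = np.tanh(x @ Wx + h @ Wh)
    return h  # final hidden state serves as the sentence vector

# A toy document: 3 sentences of different lengths
doc = [rng.normal(size=(n, d_word)) for n in (5, 7, 4)]

# The SAME (Wx, Wh) encode every sentence, yielding a shared representation
sent_vecs = np.stack([word_rnn(s) for s in doc])
print(sent_vecs.shape)  # (3, 16)
```

Training per-sentence GRUs would instead allocate a separate `(Wx, Wh)` pair per sentence position, which fragments the training signal and gives no sentence the benefit of words seen in the others.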
My understanding was that the LSTM trained on different sentences should be different, but in your model every sentence shares the same LSTM parameters for wordAttnRNN?