I notice that you refer to the dimension of `encoder_hidden` in your code DecoderRNN.py as (num_layers * num_directions, batch_size, dim_hidden). However, if it's extracted from a GRU, I thought the dimension of the hidden state should be (num_layers * num_directions, seq_len, dim_hidden). This confuses me when trying to run your code.
It is obtained from this line of the Encoder. If you check the PyTorch GRU documentation, `hidden` actually represents the hidden state for the last time step only, not for all time steps; the per-time-step hidden states are returned in `output`. That is why its shape is (num_layers * num_directions, batch_size, dim_hidden) rather than involving seq_len.
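For reference, here is a minimal sketch (with made-up sizes, not taken from the repo) showing the two return values of a unidirectional, batch-first `nn.GRU` and their shapes:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only
seq_len, batch_size, dim_input, dim_hidden, num_layers = 10, 4, 32, 64, 1

gru = nn.GRU(dim_input, dim_hidden, num_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, dim_input)

output, hidden = gru(x)
print(output.shape)  # (batch_size, seq_len, dim_hidden): hidden states for ALL time steps
print(hidden.shape)  # (num_layers * num_directions, batch_size, dim_hidden): LAST time step only
```

So if you need the state at every time step, use `output`; `hidden` is what the decoder consumes as the summary of the sequence.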