HHTseng / video-classification

Tutorial for video classification/ action recognition using 3D CNN/ CNN+RNN on UCF101

Issue in CRNN/functions.py #3

Closed. MyJumperBroke23 closed this issue 5 years ago

MyJumperBroke23 commented 5 years ago

    RNN_out, (h_n, h_c) = self.LSTM(x_RNN, None)
    """ h_n shape (n_layers, batch, hidden_size), h_c shape (n_layers, batch, hidden_size) """
    """ None represents zero initial hidden state. RNN_out has shape=(batch, time_step, output_size) """

    # FC layers
    x = self.fc1(RNN_out[:, -1, :])   # choose RNN_out at the last time step

The PyTorch docs say that the output of an LSTM has shape (seq_len, batch, hidden_size).

Shouldn't it be x = self.fc1(RNN_out[-1, :, :]) if you want the output at the last timestep?

HHTseng commented 5 years ago

Yes, the output has shape (seq_len, batch, hidden_size) when batch_first=False. Since I used batch_first=True in the LSTM layer, RNN_out has shape (batch, time_step, hidden_size), as stated in the comment. With that layout, RNN_out[-1, :, :] would select every time step of the last sample in the batch rather than the last time step of every sample, which is not what we want. Please let me know if this answers your question, thanks!
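For anyone who wants to check this themselves, here is a minimal sketch comparing the two layouts. The tensor sizes are made up for illustration and are not the ones used in CRNN/functions.py; the only thing taken from the repo is the batch_first=True setting.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only so every dimension is distinguishable.
batch, time_steps, input_size, hidden_size = 4, 7, 16, 32
x = torch.randn(batch, time_steps, input_size)

# batch_first=True: input and output are (batch, time_step, feature).
lstm_bf = nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True)
out_bf, (h_n, c_n) = lstm_bf(x, None)   # None means zero initial state
last_bf = out_bf[:, -1, :]              # last time step, shape (batch, hidden_size)

# batch_first=False (the default): input and output are (seq_len, batch, feature).
lstm_tf = nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=False)
lstm_tf.load_state_dict(lstm_bf.state_dict())   # reuse the same weights
out_tf, _ = lstm_tf(x.transpose(0, 1).contiguous(), None)
last_tf = out_tf[-1, :, :]              # last time step, shape (batch, hidden_size)

# Both slices pick the same values; only the axis order differs.
same_last_step = torch.allclose(last_bf, last_tf, atol=1e-5)

# For a unidirectional LSTM, the last-step output equals the top layer's h_n.
matches_h_n = torch.allclose(last_bf, h_n[-1], atol=1e-5)

# Using the batch_first=False index on a batch_first=True output instead
# returns all time steps of the last sample: shape (time_steps, hidden_size).
wrong_slice_shape = tuple(out_bf[-1, :, :].shape)

print(tuple(out_bf.shape), tuple(out_tf.shape))
print(same_last_step, matches_h_n, wrong_slice_shape)
```

So the two indexing styles are equivalent as long as each one is paired with the matching batch_first setting.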