HHTseng / video-classification

Tutorial for video classification/ action recognition using 3D CNN/ CNN+RNN on UCF101

Possible bug in DecoderRNN class #42

Closed sohaib023 closed 3 years ago

sohaib023 commented 3 years ago

Hello @HHTseng,

In DecoderRNN, both the input and the output of the self.LSTM layer have dim=0 as the batch dimension and dim=1 as the timestep dimension. However, the PyTorch documentation (linked below) has those two dimensions reversed for both the input and the output. Is there a specific reason for this difference in implementation? If not, this is a significant bug that needs to be fixed as soon as possible. (I'm willing to fix it and create a pull request if you give the go-ahead.)

https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM
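
For reference, here is a minimal sketch (mine, not from the repo; the layer sizes are just illustrative) of the default layout the docs describe:

import torch
import torch.nn as nn

# With batch_first left at its default (False), nn.LSTM expects (seq_len, batch, input_size)
lstm = nn.LSTM(input_size=512, hidden_size=256)
x = torch.randn(28, 4, 512)      # 28 timesteps, batch of 4, 512 features per frame
out, (h_n, c_n) = lstm(x)
print(out.shape)                 # torch.Size([28, 4, 256]) -> (seq_len, batch, hidden_size)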

sprout195 commented 3 years ago

I found the same issue and failed to exploit the temporal characteristics of video when using ResnetCRNN. I changed the code as follows, though I don't know whether it is right:

RNN_out, (h_n, h_c) = self.LSTM(x_RNN.view(self.h_RNN, x_RNN.size(0), self.RNN_input_size), None)

sohaib023 commented 3 years ago

@sprout195

This will not work. For example, suppose I have the following array.

X = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

X.view(5, 2) will yield [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

X.transpose(0,1) will yield [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]

Both yield the same shape, but the first one is erroneous; what you need is the latter.
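
A quick check in PyTorch (the tensor here is just the toy example above) shows the difference:

import torch

X = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])   # shape (2, 5)
print(X.view(5, 2))        # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]] -- just reflows the underlying memory
print(X.transpose(0, 1))   # [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]] -- actually swaps the two axes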

sohaib023 commented 3 years ago

For example, this code should work:

x_RNN = x_RNN.transpose(0, 1)
RNN_out, (h_n, c_n) = self.LSTM(x_RNN, None)
RNN_out = RNN_out.transpose(0, 1)
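
A quick shape check of what those lines would do around a default (batch_first=False) LSTM; the sizes below are only illustrative, not the repo's actual hyperparameters:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=512, hidden_size=256)    # batch_first defaults to False here
x_RNN = torch.randn(4, 28, 512)                    # (batch, timestep, feature) as produced by the CNN encoder

x_RNN = x_RNN.transpose(0, 1)                      # -> (28, 4, 512): (seq_len, batch, input_size)
RNN_out, (h_n, c_n) = lstm(x_RNN, None)
RNN_out = RNN_out.transpose(0, 1)                  # -> (4, 28, 256): back to (batch, timestep, hidden_size)
print(RNN_out.shape)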

sprout195 commented 3 years ago

I tried the latter code (transpose), and it seems that the architecture loses its function. Maybe there is some other solution.

sohaib023 commented 3 years ago

@sprout195 Hey, could you please explain what exactly you mean by "loses its function"? Is it giving an error, or is it not learning?

sprout195 commented 3 years ago

Sorry, "loses its function" means that it doesn't work for classification, or it outputs an error.

sprout195 commented 3 years ago

We may have made a mistake. When the batch_first parameter of the LSTM is True, it takes the batch size as the first dimension: (batch, timestep, input_size).
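
A minimal sketch of that behavior (layer sizes are illustrative, not the repo's values):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=512, hidden_size=256, batch_first=True)
x = torch.randn(4, 28, 512)      # (batch, timestep, input_size) -- the layout the DecoderRNN already passes in
out, (h_n, c_n) = lstm(x)
print(out.shape)                 # torch.Size([4, 28, 256]) -> (batch, timestep, hidden_size)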

sohaib023 commented 3 years ago

@sprout195

Hey, you're right. The original code was correct; I had overlooked the "batch_first" parameter. Good find there :D I had given up and was looking to try another approach, such as 3D CNNs, instead. However, I must say it's not a very advisable feature to allow on PyTorch's end, since altering the input/output behavior leads to confusion (as it did here). The least they could do is add a note about the "batch_first" flag to the input and output descriptions.

Anyway, I'll close this issue as it has been resolved.

sprout195 commented 3 years ago

This confusion really cost me a lot of time.