Closed sohaib023 closed 3 years ago
I found the same issue and failed to exploit the temporal characteristics of video when using ResNetCRNN. I fixed the code as follows, though I don't know whether it is right:

```python
RNN_out, (h_n, h_c) = self.LSTM(x_RNN.view(self.h_RNN, x_RNN.size(0), self.RNN_input_size), None)
```
@sprout195
This will not work. For example, suppose I have the following array:

```python
X = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
```

`X.view(5, 2)` will yield `[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]`, while `X.transpose(0, 1)` will yield `[[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]`. Both have the same shape, but the first one scrambles the data. What you need is the latter.
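To make the difference concrete, here is a minimal, framework-free sketch (the helper names `flat_view` and `transpose` are illustrative, not PyTorch APIs): `view` reinterprets the flat row-major buffer under a new shape, while `transpose` actually swaps the axes.

```python
def flat_view(matrix, rows, cols):
    """Reinterpret the row-major flattening of `matrix` as rows x cols,
    mimicking what tensor.view(rows, cols) does."""
    flat = [v for row in matrix for v in row]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def transpose(matrix):
    """Swap the two axes, mimicking tensor.transpose(0, 1)."""
    return [list(col) for col in zip(*matrix)]

X = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

print(flat_view(X, 5, 2))  # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]] -- data scrambled
print(transpose(X))        # [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]] -- axes swapped
```

The `view`-style result keeps the same shape as the transpose but pairs up values that never belonged together, which is exactly why the model would silently mix frames across samples.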
For example, this code should work:
```python
x_RNN = x_RNN.transpose(0, 1)
RNN_out, (h_n, c_n) = self.LSTM(x_RNN, None)
RNN_out = RNN_out.transpose(0, 1)
```
I tried the latter code (transpose), and it seems the architecture loses its function. Maybe there is some other solution.
@sprout195 Hey, Could you please explain what exactly you mean by "lose its function"? Is it giving an error or is it not learning?
Sorry. "Loses its function" means that it doesn't work for classification, or outputs errors.
We may have made a mistake. When the LSTM's `batch_first` parameter is `True`, it takes batch size as the first dimension: (batch, timestep, input_size).
@sprout195
Hey, you're right. The original code was correct; I had overlooked the `batch_first` parameter. Good find there :D I had given up and was looking to try some other approach, like 3D CNNs, instead. However, I must say it's not a very advisable feature on PyTorch's end, as altering input/output behavior like this leads to confusion (as it did here). The least they could do is add a note about the `batch_first` flag to the input/output descriptions.
Anyway, I'll close this issue as it has been resolved.
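For anyone who lands here later, a minimal sketch of the `batch_first` behavior that resolved this thread (the sizes below are illustrative, not the repo's actual dimensions):

```python
import torch
import torch.nn as nn

batch, timesteps, input_size, hidden = 4, 10, 16, 32

# Batch-major input, as produced by the CNN encoder: (batch, timestep, input_size)
x = torch.randn(batch, timesteps, input_size)

# With batch_first=True the LSTM accepts and returns batch-major tensors directly.
lstm_bf = nn.LSTM(input_size, hidden, batch_first=True)
out_bf, _ = lstm_bf(x)
print(out_bf.shape)  # torch.Size([4, 10, 32])

# With the default batch_first=False you must transpose to (timestep, batch, input_size).
lstm = nn.LSTM(input_size, hidden)
out, _ = lstm(x.transpose(0, 1))
print(out.shape)  # torch.Size([10, 4, 32])
```

So the original DecoderRNN code is correct precisely because it constructs its LSTM with `batch_first=True`; the transpose workaround is only needed with the default setting.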
This confusion really cost me a lot of time.
Hello @HHTseng,
In DecoderRNN, both the input and output of the `self.LSTM` layer have dim=0 as the batch dimension and dim=1 as the timestep dimension. However, the PyTorch documentation (link provided below) has those two dimensions reversed for both the input and the output. Is there any specific reason for this difference in implementation? If not, this is a significant bug that needs to be fixed as soon as possible. (I'm willing to fix it and create a pull request if you give the go-ahead.)
https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM