agethen / ConvLSTM-for-Caffe

How to take 2 timesteps as input #9

Open Wei-TianHao opened 7 years ago

Wei-TianHao commented 7 years ago

Hi, agethen. Thanks for your work! I am trying to re-implement Polygon-RNN, a network that takes the two previous frames and the two previous states as input. The RNN part looks like the figure below.

[Figure: the RNN part of Polygon-RNN (image not available)]

I have no idea how to write the prototxt for this. Could you please give some advice? Any suggestions would be appreciated. Thanks a lot!

agethen commented 7 years ago

Hi, it seems I totally missed this issue. Sorry about that.

I have not read the paper, and only checked the figure showing the architecture, but it seems to me that Polygon-RNN is using two ConvLSTM layers, one for each.

If you really want to take two timesteps at once as input, you would probably need to reshape/transpose or concatenate the data in some way, for example by stacking consecutive timesteps along the channel axis.
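As a rough illustration of the concatenation idea, here is a minimal NumPy sketch. It assumes the input sequence is laid out as (T, N, C, H, W), with time as the first axis, and the helper name `pair_timesteps` is made up for this example. It only applies to data that is available before the forward pass.

```python
import numpy as np


def pair_timesteps(x):
    """Stack each timestep with its predecessor along the channel axis.

    x: array of shape (T, N, C, H, W) (assumed layout, time first).
    Returns an array of shape (T - 1, N, 2 * C, H, W), where entry i
    holds timestep i followed by timestep i + 1.
    """
    if x.shape[0] < 2:
        raise ValueError("need at least two timesteps")
    # x[:-1] is every timestep except the last, x[1:] every one except the first.
    return np.concatenate([x[:-1], x[1:]], axis=2)


if __name__ == "__main__":
    frames = np.zeros((5, 1, 3, 28, 28), dtype=np.float32)
    print(pair_timesteps(frames).shape)
```

The paired sequence is one step shorter than the original, so labels or other per-timestep blobs would need to be trimmed to match.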

ctensmeyer commented 7 years ago

I'm also interested in reproducing Polygon RNN.

@agethen, the main difficulty here is that the outputs of the previous two timesteps are fed in as input to the current timestep.

agethen commented 7 years ago

I see, thank you for the explanation.

I am sorry to say that this should not be possible, at least in the current implementation. Internally, I create a new network in order to unroll the ConvLSTM over time (just as with Caffe's default LSTM), and the input needs to be known in advance. I don't plan to work on that at the moment. To be honest, it might be easier to solve in another framework, such as TensorFlow.

I suppose you could just manually spawn convolutional layers with shared weights + elementwise layers for each timestep, but it is not flexible, and rather...convoluted ;)

ctensmeyer commented 7 years ago

Thanks for confirming my suspicion. I think I'll just write some Python code to manually unroll the LSTM in the prototxt and make a new layer to handle padding the output.
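A minimal sketch of what such a generator script could look like is below. It is not a ConvLSTM: to stay short it unrolls a plain convolutional recurrence (Concat, then Convolution, then TanH), but it shows the two ingredients discussed above, namely weight sharing through named `param` entries and feeding the outputs of the previous two timesteps into the current one. All blob and layer names (`x_t0`, `h_init_a`, `h_init_b`, `rnn_w`, and so on) are hypothetical. The per-timestep inputs `x_t*` and the two initial-state blobs are assumed to be produced elsewhere in the net, for instance with a Slice layer and dummy data.

```python
def conv_layer(name, bottom, top, num_output, kernel_size, weight_name):
    """Convolution layer whose parameters are shared through named params."""
    pad = kernel_size // 2  # keeps spatial size for odd kernel sizes
    return f"""layer {{
  name: "{name}"
  type: "Convolution"
  bottom: "{bottom}"
  top: "{top}"
  param {{ name: "{weight_name}_w" }}
  param {{ name: "{weight_name}_b" }}
  convolution_param {{ num_output: {num_output} kernel_size: {kernel_size} pad: {pad} }}
}}
"""


def concat_layer(name, bottoms, top):
    """Concat layer joining its bottoms along the channel axis."""
    bottom_lines = "\n".join(f'  bottom: "{b}"' for b in bottoms)
    return f"""layer {{
  name: "{name}"
  type: "Concat"
{bottom_lines}
  top: "{top}"
  concat_param {{ axis: 1 }}
}}
"""


def tanh_layer(name, bottom, top):
    """Elementwise TanH nonlinearity."""
    return f"""layer {{
  name: "{name}"
  type: "TanH"
  bottom: "{bottom}"
  top: "{top}"
}}
"""


def unroll(num_steps, num_output=16, kernel_size=3):
    """Emit prototxt text for num_steps unrolled timesteps.

    Each step sees its own input x_t plus the outputs of the two
    previous steps. The first two steps fall back on the hypothetical
    initial-state blobs h_init_a and h_init_b, which must have
    num_output channels so that the shared weights fit every step.
    """
    outputs = ["h_init_a", "h_init_b"]
    layers = []
    for t in range(num_steps):
        cat, pre, h = f"cat_t{t}", f"pre_t{t}", f"h_t{t}"
        layers.append(concat_layer(cat, [f"x_t{t}", outputs[-2], outputs[-1]], cat))
        layers.append(conv_layer(f"conv_t{t}", cat, pre, num_output, kernel_size, "rnn"))
        layers.append(tanh_layer(f"tanh_t{t}", pre, h))
        outputs.append(h)
    return "".join(layers)


if __name__ == "__main__":
    print(unroll(3))
```

Because every Convolution layer refers to the same `rnn_w` and `rnn_b` params, Caffe should treat them as one set of weights, which requires that each step's concatenated input has the same number of channels. A real ConvLSTM would need the gate convolutions plus Eltwise and Sigmoid layers per step, generated in the same way.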