AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Are RNN, LSTM, Conv-LSTM, GRU layers implemented correctly? #7107

Open blackwool opened 3 years ago

blackwool commented 3 years ago

Hi guys, I read pass.c, network_kernels.cu, lstm-layer.c, rnn.c, conv-lstm.c, ...

In forward_network, the output length passed to the next layer is only one step's output length, not the output length of all steps, so only the first step's output data is passed to the next layer.

parser.c:295    int output = option_find_int(options, "output", 1);
parser.c:298    layer l = make_lstm_layer(params.batch, params.inputs, output, params.time_steps, batch_normalize);
lstm_layer.c:91 l.outputs = outputs;
parser.c:1616   params.inputs = l.outputs;

In backward_network, only the first step's delta value is calculated and passed to the RNN, LSTM, GRU, and conv-LSTM layers to calculate weight_updates.

But in the RNN or LSTM layer backward operation, each step calculates weight_updates and bias updates, yet no step except the first gets a correct delta value. So how does each step calculate correct weight_updates and bias values?

I know that weights and biases are shared in the RNN, LSTM, conv-LSTM, and GRU layers, but when the weight updates and bias updates differ at each step, how are the shared weights and biases updated?
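
In standard backpropagation through time, the per-step gradients for shared weights are summed into one weight_updates buffer, and the shared weights are updated once with that sum. A minimal sketch of this accumulation pattern (made-up sizes and a placeholder gradient, not darknet's actual code):

```c
#include <stdio.h>

int main(void)
{
    enum { N = 3, STEPS = 4 };
    float weights[N]        = { 0.5f, -0.2f, 0.1f };
    float weight_updates[N] = { 0 };
    const float lr = 0.01f;

    /* Walk backwards through time: every step adds its gradient into the
       SAME weight_updates buffer, because the weights are shared. */
    for (int step = STEPS - 1; step >= 0; --step) {
        for (int i = 0; i < N; ++i) {
            float step_grad = 0.1f * (float)(step + 1); /* placeholder per-step gradient */
            weight_updates[i] += step_grad;             /* accumulate, do not overwrite  */
        }
    }

    /* One update of the shared weights with the accumulated sum, then reset. */
    for (int i = 0; i < N; ++i) {
        weights[i] += lr * weight_updates[i];
        weight_updates[i] = 0;
    }

    printf("weights[0] after the shared update: %f\n", weights[0]);
    return 0;
}
```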

Are the RNN, LSTM, CONV_LSTM, and GRU layers implemented correctly?

AlexeyAB commented 3 years ago

net->time_steps is related to net->batch (mini-batch), not to outputs. It should be processed as mini_batch*time_steps and outputs, instead of mini_batch and outputs*time_steps, for non-RNN/LSTM/GRU layers.

https://github.com/AlexeyAB/darknet/blob/b25c2c6cbdef3a849fd1f17eddfb5aa1387d868d/src/parser.c#L1153-L1156
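
A minimal sketch of this bookkeeping, with assumed example sizes (not the actual parser.c code): a non-recurrent layer placed after the LSTM should see mini_batch*time_steps samples of outputs values each, rather than mini_batch samples of outputs*time_steps values.

```c
#include <stdio.h>

int main(void)
{
    /* assumed example sizes, not taken from any particular cfg */
    int mini_batch = 2;    /* net->batch                       */
    int time_steps = 4;    /* net->time_steps                  */
    int outputs    = 256;  /* per-step output size of the LSTM */

    /* How a non-RNN/LSTM/GRU layer after the recurrent layer should
       interpret the data, per the comment above:                    */
    int effective_batch   = mini_batch * time_steps;  /* 8 samples        */
    int inputs_per_sample = outputs;                  /* 256, not 256 * 4 */

    printf("process as batch=%d, inputs=%d (not batch=%d, inputs=%d)\n",
           effective_batch, inputs_per_sample,
           mini_batch, outputs * time_steps);
    return 0;
}
```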


But currently there is another bug in conv-LSTM if you don't use bottleneck=1. I just don't have time to fix it.

blackwool commented 3 years ago

Hi AlexeyAB, thank you for the reply. This is a simple cfg file I created with subdivisions=1 and time_steps=8.

Do you mean the [1 connected] input size of 10816 is not right?
Should it be 86528 (10816 * 8)?

batch = 4, time_steps = 8, train = 1
   layer   filters  size/strd(dil)      input                output
   0 CONV_LSTM Layer: 26 x 26 x 512 image, 16 filters
       conv   16  3 x 3/ 1   26 x 26 x 512 -> 26 x 26 x 16  0.100 BF
       conv   16  3 x 3/ 1   26 x 26 x 512 -> 26 x 26 x 16  0.100 BF
       conv   16  3 x 3/ 1   26 x 26 x 512 -> 26 x 26 x 16  0.100 BF
       conv   16  3 x 3/ 1   26 x 26 x 512 -> 26 x 26 x 16  0.100 BF
       conv   16  3 x 3/ 1   26 x 26 x  16 -> 26 x 26 x 16  0.003 BF
       conv   16  3 x 3/ 1   26 x 26 x  16 -> 26 x 26 x 16  0.003 BF
       conv   16  3 x 3/ 1   26 x 26 x  16 -> 26 x 26 x 16  0.003 BF
       conv   16  3 x 3/ 1   26 x 26 x  16 -> 26 x 26 x 16  0.003 BF
   1 connected  10816 -> 512
   2 connected    512 -> 2
   3 softmax       2
   4 cost          2
Total BFLOPS 0.412
avg_outputs = 2266
Allocate additional workspace_size = 1.05 MB
Learning Rate: 0.01, Momentum: 0.9, Decay: 0.1
train=19 test=2
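
A quick arithmetic check of this printout, reading it the way AlexeyAB's comment above suggests (the sizes are taken from the table; whether the connected layer actually folds time_steps into the batch here is exactly the open question):

```c
#include <stdio.h>

int main(void)
{
    int w = 26, h = 26, c = 16;          /* per-step output of the CONV_LSTM layer */
    int mini_batch = 4, time_steps = 8;  /* from "batch = 4, time_steps = 8"       */

    int per_step_inputs = w * h * c;                    /* 10816, matches "1 connected 10816" */
    int effective_batch = mini_batch * time_steps;      /* 32 samples per forward pass        */
    int all_steps_flat  = per_step_inputs * time_steps; /* 86528, the alternative reading     */

    printf("per-step inputs=%d, effective batch=%d, flattened over time=%d\n",
           per_step_inputs, effective_batch, all_steps_flat);
    return 0;
}
```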

laiou commented 3 years ago

In src/rnn_layer.c, make_rnn_layer:
  43: l.state = (float*)xcalloc(batch*hidden*(steps + 1), sizeof(float));
In src/rnn_layer.c, forward_rnn_layer:
  113: if(state.train) l.state += l.hidden*l.batch;
In src/rnn_layer.c, backward_rnn_layer:
  146: l.state += l.hidden*l.batch*l.steps;

After forward, l.state has already been changed, so why does backward need to move l.state again here?

blackwool commented 3 years ago

Because during backward, each step's hidden layer needs to calculate its delta value according to the next step, so you will see this below in the for loop. 155: l.state -= l.hidden*l.batch;
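
A minimal sketch of that pointer walk (made-up sizes, not the actual rnn_layer.c code): the state buffer holds one hidden*batch block per step plus one for the initial state; backward first jumps to the last step's block and then moves back one block per loop iteration.

```c
#include <stdlib.h>

int main(void)
{
    int batch = 2, hidden = 4, steps = 3;   /* made-up sizes */

    /* one hidden*batch block per step, plus one block for the initial state */
    float *state = (float*)calloc((size_t)hidden * batch * (steps + 1), sizeof(float));
    float *p = state;

    p += hidden * batch * steps;            /* like line 146: jump to the last step's block */
    for (int i = steps - 1; i >= 0; --i) {
        p -= hidden * batch;                /* like line 155: step back to the previous block */
        /* ... the per-step delta / weight_updates work would use p here ... */
    }

    free(state);
    return (p == state) ? 0 : 1;            /* after the loop, p is back at the initial block */
}
```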

AlexeyAB commented 3 years ago

We only change a local copy of the float *l.state pointer, inside one function.
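
A minimal sketch of why that is enough (a made-up struct, not darknet's layer type): the layer struct is passed by value into the forward/backward call, so moving l.state inside the function never changes the pointer stored in net.layers[i].

```c
#include <stdlib.h>

/* toy_layer stands in for darknet's layer struct; only the fields
   needed for the illustration are included. */
typedef struct { float *state; int hidden, batch; } toy_layer;

/* The parameter is a COPY of the caller's struct, just as l.forward(l, state)
   receives a copy of net.layers[i] in network.c. */
static void forward_toy(toy_layer l)
{
    l.state += l.hidden * l.batch;   /* only the local copy is advanced */
}

int main(void)
{
    toy_layer l;
    l.hidden = 4;
    l.batch  = 2;
    l.state  = (float*)calloc((size_t)l.hidden * l.batch * 3, sizeof(float));

    float *before = l.state;
    forward_toy(l);                  /* pass by value */

    int unchanged = (l.state == before);   /* 1: the original pointer was not moved */
    free(before);
    return unchanged ? 0 : 1;
}
```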

laiou commented 3 years ago

We only change a local copy of the float *l.state pointer, inside one function.

Thanks for your reply. I checked src/network.c: line 268 "void forward_network(network net, network_state state)", line 274 "layer l = net.layers[i];", and line 279 "l.forward(l, state);". Indeed, forward gets a copy of the current layer. Thanks again for your reply.

There is another question, about src/lstm_layer.c. At line 29, layer make_lstm_layer(int batch, int inputs, int outputs, int steps, int batch_normalize) does not allocate memory for l.delta, but l.delta is used in backward_lstm_layer, e.g. line 290: l.delta += l.outputs*l.batch*(l.steps - 1);. Should l.delta be allocated memory in make_lstm_layer?

laiou commented 3 years ago

Because during backward, each step's hidden layer needs to calculate its delta value according to the next step, so you will see this below in the for loop. 155: l.state -= l.hidden*l.batch;

Thanks for your reply. What I mean is that after forward, l.state has already been moved to the start of the last step, so why does backward need to move l.state again? AlexeyAB is right: forward and backward each get a copy of the current layer, so the original pointer is not changed.

wangj688-lab commented 3 years ago

How can I use LSTM to detect sequences of images, not videos?