alexanderrichard / NeuralNetwork-Viterbi

MIT License

output of GRU #5

Closed wlin-at closed 5 years ago

wlin-at commented 5 years ago

Hi, in utils/network.py line 108, shouldn't it be output, dummy = self.gru(x) instead of dummy, output = self.gru(x)? The first return value of the GRU is the output sequence and the second is the hidden state (doc: https://pytorch.org/docs/0.4.1/nn.html#torch.nn.GRU). Regards
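
For reference, a minimal standalone sketch (not from the repository; the layer sizes are just the ones discussed in this thread) of the two values nn.GRU returns when batch_first=True:

import torch
import torch.nn as nn

gru = nn.GRU(input_size=64, hidden_size=64, batch_first=True)
x = torch.randn(512, 21, 64)   # (batch_size, window_size, features)

output, hidden = gru(x)        # first value: per-frame outputs, second value: final hidden state
print(output.shape)            # torch.Size([512, 21, 64])
print(hidden.shape)            # torch.Size([1, 512, 64]) -> (num_layers, batch_size, hidden_size)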

alexanderrichard commented 5 years ago

Ooops, how could that go unnoticed? Thanks for spotting! Corrected it.

wlin-at commented 5 years ago

Thanks for the quick reaction. However, simply changing the code in this way results in an error due to a dimension mismatch in line 132. From line 108:

output, dummy = self.gru(x)  #  output (batch_size 512, window_size 21, feature 64)
output = self.fc(output)  # (512, 21, 64)  ->  ( 512, 21, 48)
output = nn.functional.log_softmax(output, dim=2) # (512, 21, 48) -> (512, 21, 48)
# TODO: aggregate the log-softmax scores for this batch of subsequences so the size matches line 132

from line 132:

log_probs[offset : offset + output.shape[1], :] = output.data.cpu() # log_probs has size (n_frames, n_classes)

If I understand correctly, you sampled a batch (512) of chunks (each 21 frames long) for training the RNN (you didn't mention it in this paper but in "Weakly Supervised Action Learning with RNN Based Fine-to-Coarse Modeling"), and here you want to aggregate the log-softmax scores of one chunk into the score for one frame. In this case, you end up with 512 scores per batch (representing 512 sampled frames) and accumulate the scores from batch to batch. I'd appreciate it if you could correct my misunderstanding. Regards.

wlin-at commented 5 years ago

In order to be compatible with line 132, I think the output of the RNN should be of shape (1, 512, 48).
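
To make the shape argument concrete, here is a small numpy sketch (hypothetical sizes, only mirroring the dimensions discussed here) of why line 132 needs the batch in dimension 1:

import numpy as np

n_frames, n_classes, batch_size = 2048, 48, 512
log_probs = np.zeros((n_frames, n_classes))

output = np.random.randn(1, batch_size, n_classes)   # shape (1, 512, 48) after the transpose
offset = 0
# output.shape[1] == 512, so the slice below has shape (512, 48) and the leading
# singleton dimension of output broadcasts away; with an output of shape (512, 1, 48),
# output.shape[1] == 1 and the same assignment fails.
log_probs[offset : offset + output.shape[1], :] = output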

alexanderrichard commented 5 years ago

You are right. I was a bit fast with this change. Hopefully it works now.

The idea is to make training more efficient by feeding windowed subsequences through the network. We want the GRU output at the last frame of this 21-frame window. This is now in line 109 (the only line that changed).

I don't have access to the data at the moment, so I'd appreciate it if you could test the change and confirm that it works. Thanks a lot for paying so much attention to the code! I wrote it rather quickly since I used a more efficient C++ implementation for the paper... sorry for the inconvenience.

wlin-at commented 5 years ago

Thanks for the update. Now I fully understand your intention. As I mentioned above, the output of the RNN should be of shape (1, 512, 48) in order to be compatible with line 132. Only a swap of dimensions is needed:

output = output[:, -1:, :] # (512, 21, 64) -> (512, 1, 64)
output = torch.transpose(output, 0, 1) # (512, 1, 64) -> (1, 512,  64)

wlin-at commented 5 years ago

Maybe also remove the comment "# tensor is of shape (batch_size, 1, features)" in line 111 to avoid misunderstanding.

wlin-at commented 5 years ago

And according to the DataWrapper at around line 72, for each frame t you create a chunk over x[t-10 : t+11] (instead of x[t-20 : t] as mentioned in "Weakly Supervised Action Learning with RNN Based Fine-to-Coarse Modeling") and feed it to the RNN. In this case, does it still make sense to take the last frame of the 21-frame chunk as the GRU output?

alexanderrichard commented 5 years ago

Thanks for your comments. I have some time at the beginning of next week and might have a look into this to provide a fixed implementation. Regarding the chunks: t+11 provides a 10-frame look-ahead, which is beneficial from a performance point of view. If you need to build a strictly streaming-based system (i.e. at time t no future frames are accessible), you might go with [t-20:t].
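
A small sketch (hypothetical helper names, boundary padding omitted) of the two windowing variants mentioned here:

def chunk_with_lookahead(x, t):
    # the DataWrapper variant described above: 10 past frames, frame t, 10 future frames
    return x[max(0, t - 10) : t + 11]

def chunk_streaming(x, t):
    # strictly streaming variant: only frames up to time t are accessible
    return x[max(0, t - 20) : t]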

giulio93 commented 5 years ago

Hi all, thanks a lot for maintaining and updating the code. I was stuck at the same point as wlin-at, and had solved it the wrong way... until now!

I still have a bug, though, after inserting the correction at line 110 suggested by wlin-at:

output = torch.transpose(output, 0, 1)

I get this error from line 214 in the train method in network.py:

Traceback (most recent call last):
File "/home/1/2016/gpilotto/Desktop/originale/NeuralNetwork-Viterbi/utils/network.py", line 214, in train
    sequence_loss += loss.data[0] * input.shape[0] / len(data_wrapper)
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

I resolved it by changing line 214 from this:

            sequence_loss += loss.data[0] * input.shape[0] / len(data_wrapper)

To this:

            sequence_loss += loss.data * input.shape[0] / len(data_wrapper)

Please tell me if this makes sense to you.

wlin-at commented 5 years ago

> I still have a bug, though, after inserting the correction at line 110 suggested by wlin-at: output = torch.transpose(output, 0, 1) [...] I resolved it by changing line 214 to sequence_loss += loss.data * input.shape[0] / len(data_wrapper). Please tell me if this makes sense to you.

Makes sense to me. What I did is sequence_loss += loss.item() * input.shape[0] / len(data_wrapper). Could you also tell me what your accuracy on the 4 splits of Breakfast is?
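
For context on the loss.data[0] error: since PyTorch 0.4 a loss is a 0-dimensional tensor, so indexing it raises the IndexError above, while .item() extracts the Python number. A minimal sketch with hypothetical toy inputs:

import torch
import torch.nn as nn

log_probs = nn.functional.log_softmax(torch.randn(4, 48), dim=1)
target = torch.randint(0, 48, (4,))
loss = nn.NLLLoss()(log_probs, target)

print(loss.dim())    # 0 -- the loss is a 0-dim tensor
print(loss.item())   # the Python float; use this instead of loss.data[0]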

giulio93 commented 5 years ago

> Could you also tell me what your accuracy on the 4 splits of Breakfast is?

I downloaded the Breakfast data linked in this repo, and inside I find only split1; I don't have split4. On split1 I got an accuracy of 0.425395 over the 252 evaluated video files.

Did you try the 50 Salads dataset linked here?

luuckiest commented 5 years ago

With the most recent updates, I get the following error simply by running train.py. Can anyone help?

ValueError: could not broadcast input array from shape (512,1,48) into shape (1,48)

giulio93 commented 5 years ago

> ValueError: could not broadcast input array from shape (512,1,48) into shape (1,48)

Did you update the code at line 110 as suggested by wlin-at:

output = torch.transpose(output, 0, 1)

What version of Python are you using? Which dataset are you using?

Try downloading and running the code as-is on the provided Breakfast benchmark!

alexanderrichard commented 5 years ago

Hi Luuckiest,

yes, there is an error that I still need to fix. Sorry for the delay, I was planning to have it done already but I am a bit behind schedule :( The solution Giulio proposed will work.

luuckiest commented 5 years ago

> ValueError: could not broadcast input array from shape (512,1,48) into shape (1,48)
>
> Did you update the code at line 110 as suggested by wlin-at:
>
> output = torch.transpose(output, 0, 1)
>
> What version of Python are you using? Which dataset are you using?
>
> Try downloading and running the code as-is on the provided Breakfast benchmark!

I am using Python 3.7 and PyTorch 0.4.1 (I also tried the most recent PyTorch, 1.2.0). I have tried cloning the whole branch again and still have the same problem.

Thanks for the help!

  1. Insert at utils/network.py line 110: output = torch.transpose(output, 0, 1) # (512, 1, 64) -> (1, 512, 64)
  2. Change line 214 from sequence_loss += loss.data[0] * input.shape[0] / len(data_wrapper) to sequence_loss += loss.data * input.shape[0] / len(data_wrapper)

These two are the changes I made from wlin-at and Giulio!

giulio93 commented 5 years ago

So, this is what your code looks like:

def forward(self, x):
    output, dummy = self.gru(x)
    output = output[:, -1:, :]
    output = torch.transpose(output, 0, 1)
    output = self.fc(output)
    output = nn.functional.log_softmax(output, dim=2)  # tensor is of shape (batch_size, 1, features)
    return output

right? In line 214, please use sequence_loss += loss.item() * input.shape[0] / len(data_wrapper) as @wlin-at suggested!
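
If it helps, a quick standalone shape check of that forward pass (a minimal stand-in module; the layer sizes are just the ones from this thread):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, input_dim=64, hidden_dim=64, n_classes=48):
        super(Net, self).__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):
        output, dummy = self.gru(x)
        output = output[:, -1:, :]               # keep only the last frame of each window
        output = torch.transpose(output, 0, 1)   # (batch, 1, hidden) -> (1, batch, hidden)
        output = self.fc(output)
        return nn.functional.log_softmax(output, dim=2)

x = torch.randn(512, 21, 64)   # 512 windows of 21 frames with 64-dim features
print(Net()(x).shape)          # torch.Size([1, 512, 48])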

If you still have problems, I will send you my code by tomorrow!

wlin-at commented 5 years ago

> Could you also tell me what your accuracy on the 4 splits of Breakfast is?
>
> I downloaded the Breakfast data linked in this repo, and inside I find only split1; I don't have split4. On split1 I got an accuracy of 0.425395 over the 252 evaluated video files.
>
> Did you try the 50 Salads dataset linked here?

Hi, thanks for the answer. I haven't tried the 50 Salads features. However, I noticed that my results vary quite a lot (from 34% to 42%) with different random seeds. Is that also the case for you?

giulio93 commented 5 years ago

> Hi, thanks for the answer. I haven't tried the 50 Salads features. However, I noticed that my results vary quite a lot (from 34% to 42%) with different random seeds. Is that also the case for you?

I've tried the code on the provided Breakfast benchmark, so I did not permute the training or test set. That said, I'm aware of the variance you are experiencing and have an idea about it.

The Breakfast dataset is well balanced, but the varying video lengths, the different people performing the actions, and the order in which videos are presented in the decoding queue can still lead to accuracy variance. Using more videos and cross-validation could help stabilize the accuracy.
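
If the seed sensitivity is a concern, a common mitigation (not specific to this repository) is to fix all random seeds and report the mean over several runs; a minimal sketch with a hypothetical train_and_evaluate call:

import random
import numpy as np
import torch

def set_seed(seed):
    # fix all sources of randomness used in a typical PyTorch training run
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

for seed in (0, 1, 2):
    set_seed(seed)
    # accuracy = train_and_evaluate(seed)   # hypothetical call into the training/eval scripts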

alexanderrichard commented 5 years ago

Again, thanks for spotting the errors. I finally pushed the fix, so this issue no longer requires manual code changes.