eric-xw / Video-guided-Machine-Translation

Starter code for the VMT task and challenge
Evaluation Error : RuntimeError: rnn: hx is not contiguous #4

Open LinuxBeginner opened 4 years ago

LinuxBeginner commented 4 years ago

Training was successful. Data: vatex_training_v1.0.json vatex_validation_v1.0.json vatex_public_test_english_v1.1.json

System: Google Colab GPU

When I tried to run the Python script, it showed the following error:

    Vocab size src/tgt:10523/2907 train/val/test size: 254/30/59
    **** Start eval... ****
    Use epoch 34 as the best model for testing
    Traceback (most recent call last):
      File "", line 123, in <module>
        main(args)
      File "", line 63, in main
        eval(test_loader, encoder, decoder, cp_file, tok_tgt, result_path)
      File "", line 90, in eval
        preds, pred_lengths = decoder.beam_decoding(srccap, init_hidden, src_out, vid_out, args.MAX_INPUT_LENGTH, beam_size=5)
      File "/content/drive/My Drive/MMT/MMTvatex/Video-guided-Machine-Translation/", line 208, in beam_decoding
        output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)
      File "/content/drive/My Drive/MMT/MMTvatex/Video-guided-Machine-Translation/", line 110, in onestep
        output, hidden = self.decoder(rnn_input, last_hidden)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/", line 570, in forward
        self.dropout,, self.bidirectional, self.batch_first)
    RuntimeError: rnn: hx is not contiguous

Could you please tell me why this is happening? Thank you.

eric-xw commented 4 years ago

Hi, can you try calling contiguous() for the inputs before feeding them into the decoder LSTM? The code is working on our end, so we cannot debug it.

LinuxBeginner commented 4 years ago

Hi eric, contiguous() is already called at lines 169-173:

    src_out_i = src_out[i].unsqueeze(0).expand(beam_size, src_out.size(1), src_out.size(2)).contiguous()  # (bs, seq_len, N)
    vid_out_i = vid_out[i].unsqueeze(0).expand(beam_size, vid_out.size(1), vid_out.size(2)).contiguous()
    src_mask_i = src_mask[i].unsqueeze(0).expand(beam_size, src_mask.size(1)).contiguous()
    hidden_i = [_[:, i, :].unsqueeze(1).expand(_.size(0), beam_size, _.size(2)).contiguous()
                for _ in hidden]  # (n_layers, bs, 1024)

But it is still not working. There was no issue during training; the error appears only at evaluation time. Please advise.

eric-xw commented 4 years ago

Reading the error log, the issue occurs when calling the LSTM at line 110. So try calling contiguous() on rnn_input and last_hidden.
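For anyone landing here later, a minimal sketch of that suggestion, not the repository's actual code. The helper name `to_contiguous`, the layer sizes, and the CPU-only setup are assumptions made for illustration. It shows that a hidden state built with `expand()` is not contiguous until `.contiguous()` is called, and then passes contiguous tensors to an `nn.LSTM`.

```python
import torch
import torch.nn as nn


def to_contiguous(state):
    """Hypothetical helper: make a tensor, or each tensor in a tuple/list, contiguous."""
    if isinstance(state, torch.Tensor):
        return state.contiguous()
    return tuple(s.contiguous() for s in state)


# Illustrative sizes only; the real model uses different dimensions.
n_layers, batch, beam_size, hidden_size = 2, 3, 5, 8
lstm = nn.LSTM(input_size=hidden_size, hidden_size=hidden_size,
               num_layers=n_layers, batch_first=True)

# Full-batch LSTM state (h, c), each of shape (n_layers, batch, hidden_size).
hidden = (torch.randn(n_layers, batch, hidden_size),
          torch.randn(n_layers, batch, hidden_size))

# Pick example 0 and expand it to the beam width. expand() returns a view
# with stride 0 along the beam dimension, so the result is not contiguous.
hidden_i = tuple(h[:, 0, :].unsqueeze(1).expand(n_layers, beam_size, hidden_size)
                 for h in hidden)
was_contiguous = [h.is_contiguous() for h in hidden_i]

# Make both the input and the state contiguous right before the LSTM call.
hidden_i = to_contiguous(hidden_i)
now_contiguous = [h.is_contiguous() for h in hidden_i]

rnn_input = torch.randn(beam_size, 1, hidden_size).contiguous()
output, new_hidden = lstm(rnn_input, hidden_i)
```

Calling `.contiguous()` on a tensor that is already contiguous returns the same tensor without copying, so applying it defensively just before the LSTM call should be cheap.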

bozhenhhu commented 4 years ago

Before the line

    output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i)

I added .contiguous() after output and hidden_i as follows:

    output = torch.from_numpy(outputs).cuda().contiguous()

    def from_numpy(self, states):
        return [torch.from_numpy(state).cuda().contiguous() for state in states]

With this change it works. Apart from this, I find the code in beam_decoding very hard to follow. It is hugely different from the code in inference, which I had expected to be similar.
The second output, hidden_i, attn_weights = self.onestep(output, hidden_i, src_out_i, vid_out_i, src_mask_i) call can probably be deleted.
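One plausible explanation for why this helps, offered as an assumption rather than a confirmed diagnosis: `torch.from_numpy` shares memory with the NumPy array and keeps its strides, so an array that was transposed or sliced on the NumPy side becomes a non-contiguous tensor. The sketch below uses made-up shapes and stays on the CPU (no `.cuda()`) so that it runs anywhere.

```python
import numpy as np
import torch

# Illustrative state array of shape (beam_size, n_layers, hidden_size).
state = np.zeros((5, 2, 8), dtype=np.float32)

# Transposing in NumPy only changes strides; no data is moved.
state_t = state.transpose(1, 0, 2)  # shape (n_layers, beam_size, hidden_size)

# from_numpy keeps the NumPy strides, so the tensor is non-contiguous.
tensor_view = torch.from_numpy(state_t)
view_is_contiguous = tensor_view.is_contiguous()

# .contiguous() copies the data into a dense, row-major layout.
tensor_fixed = tensor_view.contiguous()
fixed_is_contiguous = tensor_fixed.is_contiguous()
```

Whether the arrays in beam_decoding are actually transposed or sliced this way is not confirmed in this thread; the sketch only shows that `from_numpy` by itself does not guarantee a contiguous tensor.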

hynbjn commented 1 year ago

bozhenhhu commented 1 year ago

Is your environment the same as the one this repository requires (see the prerequisites)? It has been a long time since this model was published, and many packages have been updated since then, which may cause incompatibilities. Why not try more recent methods?