dibschat / tempAgg

[ECCV 2020] Temporal Aggregate Representations for Long-Range Video Understanding
MIT License

epic-kitchens_55 error #1

Closed richwardle closed 2 years ago

richwardle commented 2 years ago

Hi,

I'm getting this error when running Epic-Kitchens-55. Have you encountered this before? Thanks

Save file name anti_mod_rgb_span_6_s1_5_s2_3_s3_2_recent_2_r1_1.6_r2_1.2_r3_0.8_r4_0.4_bs_10_drop_0.3_lr_0.0001_dimLa_512_dimLi_512_epoc_15_vb_nn
Printing Arguments
Namespace(add_noun_loss=True, add_verb_loss=True, alpha=1, batch_size=10, best_model='best', debug_on=False, display_every=10, dropout_rate=0.3, ek100=False, epochs=15, imgtmpl='frame{:010d}.jpg', json_directory='tempAgg_ant_rec//models_anticipation/', latent_dim=512, linear_dim=512, lr=0.0001, modality='rgb', mode='train', noun_class=352, noun_loss_weight=1.0, num_class=2513, num_workers=0, past_attention=True, path_to_data='/content/drive/MyDrive/Individual_Project/Models/RULSTM/rulstm-master/RULSTM/data/ek55', path_to_models='models_anticipation/ek55', recent_dim=2, recent_sec1=1.6, recent_sec2=1.2, recent_sec3=0.8, recent_sec4=0.4, resume=False, scale=True, scale_factor=-0.5, schedule_epoch=10, schedule_on=1, span_dim1=5, span_dim2=3, span_dim3=2, spanning_sec=6, task='action_anticipation', topK=1, trainval=False, verb_class=125, verb_loss_weight=1.0, verb_noun_scores=True, video_feat_dim=1024, weight_flow=0.1, weight_obj=0.25, weight_rgb=0.4, weight_roi=0.25)
Populating Dataset: 100% 23493/23493 [00:33<00:00, 694.22it/s]
Populating Dataset: 100% 4979/4979 [00:07<00:00, 689.38it/s]
Add verb losses
Add noun losses
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "main_anticipation.py", line 674, in <module>
    main()
  File "main_anticipation.py", line 531, in main
    start_epoch, start_best_perf, schedule_on)
  File "main_anticipation.py", line 400, in train_validation
    loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered

dibschat commented 2 years ago

Hi,

This error occurs when a label falls outside the range of the model's output classes, which is what the failed assertion t >= 0 && t < n_classes in your log indicates. I ran the code again and didn't encounter the problem. Could you check whether you are mistakenly using the EPIC-100 annotations? You should download the EPIC-55 (only) annotations from here: https://github.com/fpv-iplab/rulstm/tree/master/RULSTM/data/ek55

You can print the max label number for verb/noun/action and compare them with the classifier head output sizes to see where the mismatch is.
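
The check suggested above could be sketched roughly as follows. This is a minimal sketch, not code from the repository: `find_label_mismatches` and the toy label lists are hypothetical, and the head sizes (125 verbs, 352 nouns, 2513 actions) are the `verb_class`, `noun_class` and `num_class` values from the Namespace printed in the log. In practice the label lists would be read from the verb/noun/action columns of whatever annotation files you are loading.

```python
def find_label_mismatches(labels, head_sizes):
    """Report every head whose labels fall outside [0, n_classes).

    labels:     dict mapping a head name to an iterable of integer labels
    head_sizes: dict mapping a head name to the classifier output size
    Returns a dict {name: (min_label, max_label, n_classes)} for bad heads only.
    """
    mismatches = {}
    for name, n_classes in head_sizes.items():
        values = list(labels[name])
        lo, hi = min(values), max(values)
        # A valid target t must satisfy 0 <= t < n_classes.
        if lo < 0 or hi >= n_classes:
            mismatches[name] = (lo, hi, n_classes)
    return mismatches


# Classifier head sizes, taken from the Namespace in the log.
head_sizes = {"verb": 125, "noun": 352, "action": 2513}

# Toy labels standing in for the annotation columns (hypothetical values).
# The noun list deliberately contains 352, one past the last valid index.
toy_labels = {
    "verb": [0, 17, 124],
    "noun": [3, 351, 352],
    "action": [0, 1200, 2512],
}

bad = find_label_mismatches(toy_labels, head_sizes)
for name, (lo, hi, n_classes) in bad.items():
    print(f"{name}: labels span [{lo}, {hi}] but the head has {n_classes} outputs")
```

An empty result means all labels fit their heads; any entry that is reported points at the annotation column (or the class-count argument) that needs fixing.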