epic-kitchens / epic-kitchens-slowfast


Issues in reproducing the results with our own training run #7

Closed bpiyush closed 3 years ago

bpiyush commented 3 years ago

Hi authors,

Thanks for making this code public.

So I ran the following two evaluations:

  1. Running evaluation on the val set directly using your shared pre-trained model. For this, I get exactly the same results as reported in the paper! (🥇)
    {"noun_top1_acc": "50.02", "noun_top5_acc": "75.62", "split": "test_final", "verb_top1_acc": "65.56", "verb_top5_acc": "90.00"}
  2. Next, I trained the SlowFast model for 30 epochs following the instructions here, and then evaluated the 30th checkpoint on the validation set. The results I get are vastly different from the previous case, and I am wondering what the potential causes of this discrepancy could be. As far as I can tell, only the following things have changed:
    {"noun_top1_acc": "11.00", "noun_top5_acc": "30.93", "split": "test_final", "verb_top1_acc": "32.93", "verb_top5_acc": "74.95"}

    a. I am using batch_size: 8 due to a GPU memory constraint.
    b. PyTorch version: I am using 1.9.1+cu102.
    c. I had to make the two code changes below to get the code to run.

1] This may be due to the torch version: newer PyTorch refuses .view(-1) on non-contiguous tensors, so I replaced it with .reshape(-1).

index decf12b..2ece3dc 100644
--- a/slowfast/utils/metrics.py
+++ b/slowfast/utils/metrics.py
@@ -38,7 +38,7 @@ def topks_correct(preds, labels, ks):
     top_max_k_correct = top_max_k_inds.eq(rep_max_k_labels)
     # Compute the number of topk correct predictions for each k.
     topks_correct = [
-        top_max_k_correct[:k, :].view(-1).float().sum() for k in ks
+        top_max_k_correct[:k, :].reshape(-1).float().sum() for k in ks
     ]
     return topks_correct
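
For context, a minimal standalone repro of why this change is needed (the tensor here is illustrative, not the actual predictions): transposing makes a tensor non-contiguous in memory, and newer PyTorch versions raise a RuntimeError when .view() is called on it, whereas .reshape() silently falls back to a copy.

import torch

# Illustrative only: transposing produces a non-contiguous tensor.
x = torch.arange(6).view(2, 3).t()  # shape (3, 2), non-contiguous

try:
    x.view(-1)  # RuntimeError on non-contiguous input
except RuntimeError as err:
    print("view failed:", err)

print(x.reshape(-1))  # works: reshape copies when a zero-copy view is impossible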

2] This was needed because the metadata contains strings, and the code was trying to call .cuda() on them.

--- a/tools/train_net.py
+++ b/tools/train_net.py
@@ -58,9 +58,11 @@ def train_epoch(train_loader, model, optimizer, train_meter, cur_epoch, cfg):
         for key, val in meta.items():
             if isinstance(val, (list,)):
                 for i in range(len(val)):
-                    val[i] = val[i].cuda(non_blocking=True)
+                    if not isinstance(val[i], str):
+                        val[i] = val[i].cuda(non_blocking=True)
             else:
-                meta[key] = val.cuda(non_blocking=True)
+                if not isinstance(val, str):
+                    meta[key] = val.cuda(non_blocking=True)
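
As an aside, a guard based on torch.is_tensor() might be slightly more robust than excluding str specifically, in case the metadata ever carries other non-tensor types. A sketch of that alternative (not what the repo does):

import torch

# Move only tensors to the GPU; pass strings and other types through unchanged.
def to_cuda_if_tensor(val):
    return val.cuda(non_blocking=True) if torch.is_tensor(val) else val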

Could you kindly provide some input that could help me figure out what I may be missing here? For instance, would the batch size matter this much? And which precise torch version did you use?

Thanks in advance!

ekazakos commented 3 years ago

Hi @bpiyush ,

Thanks for using our code, and thank you for spotting this discrepancy in the results! It turns out that I had forgotten to include instructions for loading the pretrained model: we use the Kinetics-400 pretrained model provided by the authors of SlowFast. I have updated the README with a link to the K400 pretrained model as well as instructions on how to use it in training; essentially, you just have to add TRAIN.CHECKPOINT_FILE_PATH /path/to/SLOWFAST_8x8_R50.pkl to the training command.
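
For reference, a sketch of what the training command looks like with the pretrained checkpoint added (the variables are placeholders, mirroring the command style used elsewhere in this thread):

python tools/run_net.py \
    --cfg $cfg \
    NUM_GPUS $num_gpus \
    OUTPUT_DIR $output_dir \
    EPICKITCHENS.VISUAL_DATA_DIR $dataset_dir \
    EPICKITCHENS.ANNOTATIONS_DIR $annotations_dir \
    TRAIN.CHECKPOINT_FILE_PATH /path/to/SLOWFAST_8x8_R50.pkl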

Also, since you are using a smaller batch size, you should expect a difference in results. A common rule you could use is the linear scaling rule, where you change your learning rate linearly with the batch size. So in your case, you can try smaller values for your learning rate than the ones we use.
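
As a concrete example of the linear scaling rule (the reference numbers below are made up for illustration; check the repo's config for the actual BASE_LR and batch size):

# Hypothetical reference values, not the repo's actual config.
base_lr = 0.01       # learning rate tuned for the reference batch size
base_batch = 32      # reference batch size (assumed)
new_batch = 8        # the batch size that fits in GPU memory
scaled_lr = base_lr * new_batch / base_batch
print(scaled_lr)     # 0.0025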

I hope this helps; let me know if you have any other issues.

bpiyush commented 3 years ago

Hi @ekazakos ,

Thanks for the fairly quick response :) I have re-run training with the given checkpoint and a smaller learning rate, and will post an update on this thread when I have results!

bpiyush commented 3 years ago

On running the training again with K400 pretrained model, I get the following results:

{"noun_top1_acc": "50.66", "noun_top5_acc": "75.74", "split": "test_final", "verb_top1_acc": "65.95", "verb_top5_acc": "90.15"}

This is not exact but almost the same (the difference is probably due to the different batch size and learning rate). So, thanks a lot. I think I can close the issue for now :)

bpiyush commented 3 years ago

Hey @ekazakos

So I am facing a subsequent issue while testing on the test set. I am using the instructions in the README to evaluate a checkpoint (the provided SlowFast.pyth file) on the EPIC test set. No training is involved; I simply want to take the given checkpoint and evaluate it on the test set. The command is as follows:

python tools/run_net.py \
    --cfg $cfg \
    NUM_GPUS $num_gpus \
    OUTPUT_DIR $output_dir \
    EPICKITCHENS.VISUAL_DATA_DIR $dataset_dir \
    EPICKITCHENS.ANNOTATIONS_DIR $annotations_dir \
    TRAIN.ENABLE $train \
    TEST.ENABLE $test \
    TEST.CHECKPOINT_FILE_PATH $ckpt_path \
    EPICKITCHENS.TEST_LIST $test_filename \
    EPICKITCHENS.TEST_SPLIT $test_split

The exact same checkpoint works on the val set as expected, but not on the test set: there, I basically get 0 for all the metrics. Could you please check whether I am missing something obvious here?

Thanks in advance.

ekazakos commented 3 years ago

Hi,

That is right; you are not missing anything. We provide the labels for the validation set for development, while we keep the test set labels private, as we run an action recognition competition. To obtain performance on the test set, you have to submit your method's scores to the Action Recognition challenge on CodaLab at this link. When you run the evaluation script on the test set, the scores are stored in test.pkl in the location specified by OUTPUT_DIR, but accuracy cannot be computed since the labels are not provided, hence the zeros you are getting. To create the files needed for submission to the competition, please follow the instructions in this link.
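
If you want to sanity-check what was saved, a minimal sketch for loading the dumped scores follows (the path is a placeholder, and the object's internal structure is whatever the evaluation script writes, so inspect it rather than assuming a schema):

import pickle

# Load the scores dumped by the test-set evaluation run.
with open("/path/to/output_dir/test.pkl", "rb") as f:
    scores = pickle.load(f)

print(type(scores))  # inspect the object interactively from here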

bpiyush commented 3 years ago

Ah, I see. I was worried that I was doing something wrong, and I had completely overlooked the challenge you are running. Thanks for the clarification. Appreciate it!