TengdaHan / DPC

Video Representation Learning by Dense Predictive Coding. Tengda Han, Weidi Xie, Andrew Zisserman.
MIT License

Experimental details of Table 1? #11

Closed · jiujing23333 closed this issue 4 years ago

jiujing23333 commented 4 years ago

Hi, Tengda. I'm trying to reproduce your promising result on the smaller UCF101 dataset. Can you provide the hyperparameter settings for Table 1 in your paper, such as input size, training epochs, etc.? Thanks very much.

TengdaHan commented 4 years ago

Hi. A note up front: this experiment is just a proof of concept, and pretraining and finetuning on the same dataset is not the right way to do self-supervised learning. IMO, the right way is always to utilize more unlabelled data to show the quality of the method.

We tried slightly different settings multiple times and got similar performance. Here is just one example that reproduces the Table 1 DPC result.

For DPC training on UCF101 (dpc/):
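A minimal sketch of what such a run looks like, using the CLI flags from the repo's README (`--net`, `--dataset`, `--batch_size`, `--img_dim`, `--epochs`); the specific values below are illustrative assumptions, not the exact settings from this comment:

```bash
# Hypothetical DPC self-supervised training run on UCF101, launched from dpc/.
# Flags follow the repo's main.py CLI; values are placeholders, not the
# confirmed Table 1 hyperparameters.
cd dpc/
python main.py --gpu 0,1 --net resnet18 --dataset ucf101 \
  --batch_size 128 --img_dim 128 --epochs 300
```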

For UCF101 finetuning (eval/):
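Again a sketch under the same assumptions, this time with the finetuning flags from the repo's README (`--train_what ft` and `--pretrain`); the checkpoint path and values are placeholders:

```bash
# Hypothetical supervised finetuning run on UCF101, launched from eval/.
# --pretrain points at the DPC checkpoint produced by the step above;
# the path and values are illustrative, not the confirmed settings.
cd eval/
python test.py --gpu 0,1 --net resnet18 --dataset ucf101 \
  --batch_size 128 --img_dim 128 --epochs 300 \
  --train_what ft --pretrain /path/to/dpc_checkpoint.pth.tar
```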

Good luck & have fun!

jiujing23333 commented 4 years ago

@TengdaHan Thanks for your kind reply. I just want to do a quick, simple experiment, because training on the Kinetics400 dataset is so time-consuming (from 1+ week up to 6 weeks). What do you think is the bottleneck in the training time?

TengdaHan commented 4 years ago

Kinetics400 at 128x128 resolution with a 3D-ResNet18 can give a good feature in less than 1 week (maybe 2-3 days). The bottleneck in training time is always GPU runtime (GPU utilization is 100% in my training), as long as you read the video frames from an SSD.
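One quick way to check this on your own machine, assuming an NVIDIA setup with `nvidia-smi` available: watch GPU utilization while training runs in another shell. If it sits well below 100%, data loading (disk reads or frame decoding) is the likely bottleneck rather than the GPU itself.

```bash
# Poll GPU utilization and memory once per second during training.
# Sustained utilization well below 100% usually points to a data-loading
# bottleneck (slow disk or frame decoding) rather than GPU compute.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```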

n-behrmann commented 4 years ago

Hi Tengda, thanks for sharing your code and for the detailed reply to this issue, I found it very helpful! However, there is one thing I'm wondering about. I could reproduce the 60.2% of the DPC method using the hyperparameters mentioned here, but when I ran the same finetuning code (same hyperparameters) from random initialization, I got 54.4% instead of the 46.5% reported in the paper. Is there something I might be missing? Thanks in advance!

TengdaHan commented 4 years ago

Hi, thanks for mentioning this. Your experimental setting and results are correct. At the time of submission, we hadn't trained the random-initialization baseline for long enough, partially because the 46.5% we obtained already matched previous random-initialization baselines (e.g. Hara et al. 2018). We have now realized this issue, and we will release a public benchmark very soon. Stay tuned :)

n-behrmann commented 4 years ago

Hi Tengda, thank you for the quick reply! That's very helpful :)