IBM / bLVNet-TAM

The official code for the NeurIPS 2019 paper: Quanfu Fan, Richard Chen, Hilde Kuehne, Marco Pistoia, David Cox, "More Is Less: Learning Efficient Video Representations by Temporal Aggregation Modules"
Apache License 2.0

Cannot replicate Kinetics-400 Results #5

Open ilkarman opened 4 years ago

ilkarman commented 4 years ago

Thank you very much for posting the codebase! However, I'm having difficulty replicating the 71.0% accuracy reported in the paper for bLVNet-TAM-8×2: I only get around 55% when validating each epoch, and training accuracy is 58%. With 10 crops per video at validation, this rises to around 57%.

Would it be possible to share your training log file, or at least your final training accuracy, so I can see whether something is wrong with my validation/eval script?

I also wanted to check whether any of my assumptions are wrong:

Should this perhaps be trained for 100 epochs, like TSM, rather than 50?

chunfuchen commented 4 years ago

As you pointed out in #4, the pretrained ImageNet model might not be loaded correctly. Do you still get these results after fixing that issue?

ilkarman commented 4 years ago

Thanks very much for your reply! Unfortunately this is still the case for me, even with the pretrained weights loaded correctly. Would you be able to share your training/validation accuracy for a few epochs (single-crop, single-clip, not the multi-crop/multi-clip validation) to help debug?

ilkarman commented 4 years ago

I thought it would be more helpful to leave my log below (r50_a2_b4_f8x2). Initializing from pretrained weights, then training for 50 epochs with cosine annealing from an LR of 0.01 and a batch size of 64:

Epoch 0 Validation Acc: 0.09
Epoch 10 Validation Acc: 0.29
Epoch 20 Validation Acc: 0.35
Epoch 30 Validation Acc: 0.41
Epoch 40 Validation Acc: 0.47
Epoch 49 Validation Acc: 0.53

Using LR=0.001 I get:

Epoch 0 Validation Acc: 0.04
Epoch 10 Validation Acc: 0.36
Epoch 20 Validation Acc: 0.43
Epoch 30 Validation Acc: 0.46
Epoch 40 Validation Acc: 0.52
Epoch 49 Validation Acc: 0.54

Using LR=0.005 I get:

Epoch 0 Validation Acc: 0.11
Epoch 10 Validation Acc: 0.29
Epoch 20 Validation Acc: 0.35
Epoch 30 Validation Acc: 0.41
Epoch 40 Validation Acc: 0.47
Epoch 49 Validation Acc: 0.53
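For reference, a cosine-annealing schedule like the one described above can be set up in a few lines of PyTorch (a minimal sketch: the `torch.nn.Linear` stand-in model is a placeholder for the actual bLVNet-TAM network, and the 50-epoch horizon matches my runs above):

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder model; the real run trains bLVNet-TAM (r50_a2_b4_f8x2).
model = torch.nn.Linear(10, 400)

EPOCHS = 50  # one of the runs above; the base LR is 0.01
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

lrs = []
for epoch in range(EPOCHS):
    # ... train one epoch here, then step the scheduler ...
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()

# LR decays from 0.01 at epoch 0 to nearly 0 by the final epoch.
print(f"epoch 0 lr={lrs[0]:.4f}, epoch {EPOCHS - 1} lr={lrs[-1]:.6f}")
```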

Training - GroupMultiScaleCrop(224), RandomHorizontalFlip(0.5) Validation - ResizeShortestSide(256), CentreCrop(224)

So validation is always around 53% and training accuracy around 52%. I guess that to reach 71.0% with multi-clip, multi-crop testing, regular validation should be 65%+, so I'm definitely a long way behind.

chunfuchen commented 4 years ago

Hi, sorry for the late reply.

  1. We trained on Kinetics-400 for 100 epochs instead of 50; sorry for the confusion.

  2. You might want to check the number of videos in your Kinetics-400 training and validation sets (https://github.com/facebookresearch/video-nonlocal-net/issues/67).

  3. Here is the log from retraining the model (using the code in this repo) with a total batch size of 72 on 6 GPUs:

     Namespace(alpha=2, batch_size=72, beta=4, blending_frames=3, dataset='kinetics400', dense_sampling=False, depth=50, disable_scaleup=True, dropout=0.5, epochs=100, evaluate=False, frames_per_group=1, gpu=None, groups=16, imagenet_blnet_pretrained=True, input_channels=3, input_shape=224, logdir='./', lr=0.01, lr_scheduler='cosine', lr_steps=[15, 30, 45], modality='rgb', momentum=0.9, num_classes=400, num_clips=1, num_crops=1, pretrained=False, print_freq=500, random_sampling=False, resume=None, show_model=False, start_epoch=0, weight_decay=0.0005, workers=64)
100 epochs, single-clip, single-crop accuracy:

Val : [001/100] Loss: 3.7331 Top@1: 19.4875 Top@5: 45.4624 Speed: 967.55 ms/batch
Val : [010/100] Loss: 2.3200 Top@1: 45.5692 Top@5: 73.5218 Speed: 1037.19 ms/batch
Val : [020/100] Loss: 2.1634 Top@1: 49.6924 Top@5: 75.4639 Speed: 951.47 ms/batch
Val : [030/100] Loss: 2.0265 Top@1: 51.9752 Top@5: 78.5500 Speed: 950.88 ms/batch
Val : [040/100] Loss: 1.8245 Top@1: 55.8544 Top@5: 81.2802 Speed: 953.56 ms/batch
Val : [050/100] Loss: 1.8113 Top@1: 57.8118 Top@5: 81.9360 Speed: 1041.54 ms/batch
Val : [060/100] Loss: 1.7302 Top@1: 59.8759 Top@5: 82.9783 Speed: 1095.85 ms/batch
Val : [070/100] Loss: 1.5482 Top@1: 63.2874 Top@5: 85.2509 Speed: 1160.07 ms/batch
Val : [080/100] Loss: 1.4294 Top@1: 66.3582 Top@5: 87.0304 Speed: 993.56 ms/batch
Val : [090/100] Loss: 1.3322 Top@1: 69.1952 Top@5: 88.1946 Speed: 893.33 ms/batch
Val : [100/100] Loss: 1.3105 Top@1: 69.8968 Top@5: 88.4641 Speed: 968.87 ms/batch

The above model with 3-crop, 3-clip testing:

Val@224(224) (# crops = 3, # clips = 3): Top@1: 71.2543 Top@5: 89.3386
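Multi-crop, multi-clip testing typically averages the softmax scores over all crop × clip views of a video before taking the argmax; a minimal sketch of that aggregation step (the view layout and score averaging here are assumptions, not necessarily the repo's exact evaluation code):

```python
import torch

def multiview_predict(logits_per_view: torch.Tensor) -> torch.Tensor:
    """Average softmax scores over views and return predicted class ids.

    logits_per_view: (batch, num_views, num_classes); with 3 crops and
    3 clips as above, num_views = 9.
    """
    probs = torch.softmax(logits_per_view, dim=-1)
    return probs.mean(dim=1).argmax(dim=-1)

# Toy example: 2 videos, 9 views (3 crops x 3 clips), 400 classes.
logits = torch.randn(2, 9, 400)
preds = multiview_predict(logits)
print(preds.shape)  # torch.Size([2])
```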



I also retrained a model for 50 epochs; it comes out about 2% lower than the one trained for 100 epochs.
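For the dataset check in point 2 above, a quick way to count clips per split, assuming a hypothetical `<split>/<class>/<video>` directory layout (the layout and file extensions are assumptions about how the dataset was downloaded):

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mkv", ".webm"}  # assumed download formats

def count_videos(split_dir: str) -> int:
    """Count video files one level below split_dir (class subfolders)."""
    root = Path(split_dir)
    return sum(1 for f in root.glob("*/*") if f.suffix in VIDEO_EXTS)

# e.g. compare count_videos("kinetics400/train") and
# count_videos("kinetics400/val") against the counts in the linked issue.
```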