Video classification distributed training takes too long time

OBVIOUSDAWN commented 2 years ago

HI， Thank you for your excellent work. I conducted the experiments on python=3.8, torch=1.9.0, and torchvision=0.10.0. The dataset is a self dataset, labeled with reference to the kinetics format, and the dataset was trained properly on networks of mmaction. Initially, I modified run.sh to make the code work on 1080ti*1 with the following parameters work_path=$(dirname $0) PYTHONPATH=$PYTHONPATH:./slowfast \ python tools/run_net.py \ --cfg $work_path/config.yaml \ DATA.PATH_TO_DATA_DIR ……/mydataset \ DATA.PATH_PREFIX mydataset \ DATA.PATH_LABEL_SEPARATOR "," \ TRAIN.EVAL_PERIOD 5 \ TRAIN.CHECKPOINT_PERIOD 1 \ TRAIN.BATCH_SIZE 6 \ NUM_GPUS 1 \ UNIFORMER.DROP_DEPTH_RATE 0.1 \ SOLVER.MAX_EPOCH 100 \ SOLVER.BASE_LR 2.5e-5 \ SOLVER.WARMUP_EPOCHS 10.0 \ DATA.TEST_CROP_SIZE 224 \ TEST.NUM_ENSEMBLE_VIEWS 1 \ TEST.NUM_SPATIAL_CROPS 1 \ RNG_SEED 6666 \ OUTPUT_DIR $work_path

The results show that the training takes about 5d to complete. When I tried to get the code to train on 1080ti*4, I modified run.sh as follows work_path=$(dirname $0) PYTHONPATH=$PYTHONPATH:./slowfast \ python tools/run_net.py \ --cfg $work_path/config.yaml \ DATA.PATH_TO_DATA_DIR ……/mydataset \ DATA.PATH_PREFIX mydataset \ DATA.PATH_LABEL_SEPARATOR "," \ TRAIN.EVAL_PERIOD 5 \ TRAIN.CHECKPOINT_PERIOD 1 \ TRAIN.BATCH_SIZE 24 \ NUM_GPUS 4 \ UNIFORMER.DROP_DEPTH_RATE 0.1 \ SOLVER.MAX_EPOCH 100 \ SOLVER.BASE_LR 1e-4 \ SOLVER.WARMUP_EPOCHS 10.0 \ DATA.TEST_CROP_SIZE 224 \ TEST.NUM_ENSEMBLE_VIEWS 1 \ TEST.NUM_SPATIAL_CROPS 1 \ RNG_SEED 6666 \ OUTPUT_DIR $work_path

The expected training time became 18d or even longer and showed no signs of decreasing during the training.

Both my attempts were done on the same machine and I have no idea why multicard training would cause a significant increase in training time. In terms of adjusting the learning rate I tried to adjust according to the batchsize as mentioned by mmaction, is the current adjustment correct, if the learning rate is adjusted proportionally it will be in a very small situation. I would appreciate if you could tell me what is causing the significant increase in training time and what I need to do to modify it. I look forward to hearing from you.

Andy1621 commented 2 years ago

Thanks for your good try! Unfortunately, I never meet the similar bug. Since the code is forked from PySlowFast and I never change the launch code, I recommend you to have a try on the original repo.

By the way, I remember that the default code does not suppot training on single GPU (reports some bug). Have you change the lauch code for multiprocessing? https://github.com/Sense-X/UniFormer/blob/1ce70bcccbe72962813aedf5eb1209f318f859b6/video_classification/slowfast/utils/misc.py#L283-L311

OBVIOUSDAWN commented 2 years ago

Thanks for your good try! Unfortunately, I never meet the similar bug. Since the code is forked from PySlowFast and I never change the launch code, I recommend you to have a try on the original repo.

By the way, I remember that the default code does not suppot training on single GPU (reports some bug). Have you change the lauch code for multiprocessing?

https://github.com/Sense-X/UniFormer/blob/1ce70bcccbe72962813aedf5eb1209f318f859b6/video_classification/slowfast/utils/misc.py#L283-L311

Thank you for your help. It looks like the problem is coming from the hardware side, this server I sharing with someone else and it may be caused by his training takes up a lot of cpu. So far everything seems to be working fine and the total training time is down to 1day.

Can I rebuild the uniformer according to the slowfast in mmaction2, and will this have an impact on accuracy?

Also, in terms of learning rate should I batch adjust because of the reduction of total batchsize. I have trained DETR for transformer structure before and found that when the training bachsize decreases, if the lr is reduced proportionally, it will lead to serious performance degradation, should I follow the preset learning rate for training?

Finally, I found that the training process will be performed with 100epoch, but the paper mentions 50epoch, and the timesformer using vit for video classification only needs to be trained with 15epoch, I would like to know if I need to train so many epochs to converge properly, or if only 50epoch is sufficient. Thank you for your help. I am looking forward to your reply.

Andy1621 commented 2 years ago

Reproduce in mmaction2: In my opinion, it seems that you will run UniFormer on your own dataset, thus you can copy the model code to mmaction2. If you want to reproduce the results on Kinetics or SthSth, you may need to modify the config on mmaction2 and adopt strong data augmentation as in UniFormer (Repeated Augmentation, Random Augmentation, Mixup...).
Learning rate: For training, I suggest you adjust the learning rate based on the batch. You can double the learning rate if you double the batch size. I forget whether mmaction2 will adjust the learning rate automatically based on the batch size.
Epoch: For Kinetics, we train UniFormer for 100 epoch. For SthSth, we train it for 50 epoch. If you train it on your own dataset, you should adjust the learning rate and droppath, which may lead to overfitting. More importantly, I suggest you use the Kinetics pre-trained models for your own dataset.

OBVIOUSDAWN commented 2 years ago

Reproduce in mmaction2: In my opinion, it seems that you will run UniFormer on your own dataset, thus you can copy the model code to mmaction2. If you want to reproduce the results on Kinetics or SthSth, you may need to modify the config on mmaction2 and adopt strong data augmentation as in UniFormer (Repeated Augmentation, Random Augmentation, Mixup...).

Learning rate: For training, I suggest you adjust the learning rate based on the batch. You can double the learning rate if you double the batch size. I forget whether mmaction2 will adjust the learning rate automatically based on the batch size.

Epoch: For Kinetics, we train UniFormer for 100 epoch. For SthSth, we train it for 50 epoch. If you train it on your own dataset, you should adjust the learning rate and droppath, which may lead to overfitting. More importantly, I suggest you use the Kinetics pre-trained models for your own dataset.

Thank you for your reply. I am trying to copy the model code to mmaction2. The learning rate I have adjusted manually. Since my dataset is too small, 800 videos/10s in total for 5 categories, I will try to adjust the learning rate and epoch in my subsequent work. thanks again for your help.

Andy1621 commented 2 years ago

For such a small dataset, you may need to increase droppath_rate and add dropout before final FC for classification.

Sense-X / UniFormer

Video classification distributed training takes too long time #64