AbrarKhan009 opened this issue 3 months ago
My training config is given below:
```yaml
TRAIN:
  ENABLE: True
  DATASET: mydata
  BATCH_SIZE: 1  # because of limited resources I am using this batch size
  EVAL_PERIOD: 5
  CHECKPOINT_PERIOD: 5
  AUTO_RESUME: True
  CHECKPOINT_EPOCH_RESET: True
  CHECKPOINT_FILE_PATH: "/home/mukhan/project/slowfast/MViTv2_B_32x3_k400_f304025456.pyth"
  CHECKPOINT_IN_INIT: True
DATA:
  USE_OFFSET_SAMPLING: True
  DECODING_BACKEND: torchvision
  NUM_FRAMES: 32
  SAMPLING_RATE: 1
  TRAIN_JITTER_SCALES: [256, 320]
  TRAIN_CROP_SIZE: 224
  TEST_CROP_SIZE: 224
  INPUT_CHANNEL_NUM: [3]
  PATH_TO_DATA_DIR: "/home/mukhan/project/slowfast/data/Mydata/"  # csv file locations for train and val
  TRAIN_JITTER_SCALES_RELATIVE: [0.08, 1.0]
  TRAIN_JITTER_ASPECT_RELATIVE: [0.75, 1.3333]
MVIT:
  ZERO_DECAY_POS_CLS: False
  USE_ABS_POS: False
  REL_POS_SPATIAL: True
  REL_POS_TEMPORAL: True
  DEPTH: 24
  NUM_HEADS: 1
  EMBED_DIM: 96
  PATCH_KERNEL: (3, 7, 7)
  PATCH_STRIDE: (2, 4, 4)
  PATCH_PADDING: (1, 3, 3)
  MLP_RATIO: 4.0
  QKV_BIAS: True
  DROPPATH_RATE: 0.3
  NORM: "layernorm"
  MODE: "conv"
  CLS_EMBED_ON: True
  DIM_MUL: [[2, 2.0], [5, 2.0], [21, 2.0]]
  HEAD_MUL: [[2, 2.0], [5, 2.0], [21, 2.0]]
  POOL_KVQ_KERNEL: [3, 3, 3]
  POOL_KV_STRIDE_ADAPTIVE: [1, 8, 8]
  POOL_Q_STRIDE: [[0, 1, 1, 1], [1, 1, 1, 1], [2, 1, 2, 2], [3, 1, 1, 1], [4, 1, 1, 1], [5, 1, 2, 2], [6, 1, 1, 1], [7, 1, 1, 1], [8, 1, 1, 1], [9, 1, 1, 1], [10, 1, 1, 1], [11, 1, 1, 1], [12, 1, 1, 1], [13, 1, 1, 1], [14, 1, 1, 1], [15, 1, 1, 1], [16, 1, 1, 1], [17, 1, 1, 1], [18, 1, 1, 1], [19, 1, 1, 1], [20, 1, 1, 1], [21, 1, 2, 2], [22, 1, 1, 1], [23, 1, 1, 1]]
  DROPOUT_RATE: 0.0
  DIM_MUL_IN_ATT: True
  RESIDUAL_POOLING: True
AUG:
  NUM_SAMPLE: 2
  ENABLE: True
  COLOR_JITTER: 0.4
  AA_TYPE: rand-m7-n4-mstd0.5-inc1
  INTERPOLATION: bicubic
  RE_PROB: 0.25
  RE_MODE: pixel
  RE_COUNT: 1
  RE_SPLIT: False
MIXUP:
  ENABLE: True
  ALPHA: 0.8
  CUTMIX_ALPHA: 1.0
  PROB: 1.0
  SWITCH_PROB: 0.5
  LABEL_SMOOTH_VALUE: 0.1
SOLVER:
  ZERO_WD_1D_PARAM: True
  BASE_LR_SCALE_NUM_SHARDS: True
  CLIP_GRAD_L2NORM: 1.0
  BASE_LR: 0.00001
  COSINE_AFTER_WARMUP: True
  COSINE_END_LR: 1e-6
  WARMUP_START_LR: 1e-6
  WARMUP_EPOCHS: 30.0
  LR_POLICY: cosine
  MAX_EPOCH: 50
  MOMENTUM: 0.9
  WEIGHT_DECAY: 0.05
  OPTIMIZING_METHOD: adamw
MODEL:
  NUM_CLASSES: 15
  ARCH: mvit
  MODEL_NAME: MViT
  LOSS_FUNC: soft_cross_entropy
  DROPOUT_RATE: 0.5
TEST:
  ENABLE: False
  DATASET: mydata
  BATCH_SIZE: 64
  NUM_SPATIAL_CROPS: 1
  NUM_ENSEMBLE_VIEWS: 5
DATA_LOADER:
  NUM_WORKERS: 8
  PIN_MEMORY: True
NUM_GPUS: 1
NUM_SHARDS: 1
RNG_SEED: 0
OUTPUT_DIR: "/home/mukhan/project/slowfast/output"
TENSORBOARD:
  ENABLE: True
  LOG_DIR: "/home/mukhan/project/slowfast/output/runs"  # Leave empty to use cfg.OUTPUT_DIR/runs-{cfg.TRAIN.DATASET} as path.
  CLASS_NAMES_PATH: "/home/mukhan/project/slowfast/data/Mydata/classnames.json"  # Path to json file providing class_name - id mapping.
  CONFUSION_MATRIX:
    ENABLE: True
    SUBSET_PATH: "/home/mukhan/project/slowfast/data/Mydata/classnames.txt"  # Path to txt file containing class names separated by newline characters.
```
I have some questions regarding the outputs. At the end of training I get:

`train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 _a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698`

Can somebody explain this final message to me?
Also, why is the LR always shown as 0.000 during training?
These are the graphs and the confusion matrix I got after running 50 epochs:
I understand that these results are not satisfactory. Could anyone please advise on how I can improve them? Specifically, which parameters or aspects of the training process should I consider adjusting to achieve better performance? Any suggestions or recommendations would be greatly appreciated.
> I have some questions regarding the outputs: `train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 _a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698`. Can somebody explain this final message to me?
> Also, why is the LR always shown as 0.000 during training?
answered here: https://github.com/facebookresearch/SlowFast/issues/664#issuecomment-2270738940
Also, regarding your dissatisfaction with the results, I would immediately say that a batch size of 1 can be one of the limitations for a vision task.
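One common workaround when the GPU only fits one or two clips is gradient accumulation, which emulates a larger effective batch. As far as I know, SlowFast does not expose this as a config option, so below is a minimal generic PyTorch sketch of the pattern; `model`, `loader`, and `loss_fn` are hypothetical stand-ins for the SlowFast training objects.

```python
import torch

ACCUM_STEPS = 8  # effective batch = DataLoader batch size * ACCUM_STEPS

def train_one_epoch(model, optimizer, loader, loss_fn, device="cuda"):
    model.train()
    optimizer.zero_grad()
    for step, (clips, labels) in enumerate(loader):
        clips, labels = clips.to(device), labels.to(device)
        loss = loss_fn(model(clips), labels)
        # divide so the accumulated gradient averages over ACCUM_STEPS mini-batches
        (loss / ACCUM_STEPS).backward()
        if (step + 1) % ACCUM_STEPS == 0:
            # mirrors SOLVER.CLIP_GRAD_L2NORM: 1.0 from the config above
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            optimizer.zero_grad()
```

Because MViT normalizes with LayerNorm rather than BatchNorm, accumulated gradients approximate a genuinely larger batch quite closely; with BatchNorm models the running statistics would still only see the small per-step batch.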
Thanks for the advice. I have now increased the batch size from 1 to 2 and started training again for 100 epochs; I will post an update here after training is done.
How big is your custom dataset? If your dataset is also limited, a relatively small ConvNet could achieve the task as well. I understand that training a transformer model can be demanding resource-wise; with a ConvNet you could try a batch size of 16, 32, or even 64 to achieve better performance, because a batch size of 2 is still on the very low side.
> I have some questions regarding the outputs: `train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 _a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698`. Can somebody explain this final message to me?
> Also, why is the LR always shown as 0.000 during training?
Can you also show the output where it says your LR is 0.000? Looking at your config, your LR is set to 0.00001, and this value might simply have been clipped by the display precision in the terminal output. It does not look concerning: your accuracy is already improving over time, and you cannot actually have a zero LR, which would yield zero weight updates.
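To illustrate the display-precision point, here is a minimal sketch (assuming the schedule implied by the SOLVER section above, not SlowFast's actual lr_policy code): linear warmup from WARMUP_START_LR=1e-6 to BASE_LR=1e-5 over 30 epochs, then cosine decay to COSINE_END_LR=1e-6. Every value in that range prints as 0.000 at three decimal places.

```python
import math

BASE_LR, WARMUP_START_LR, COSINE_END_LR = 1e-5, 1e-6, 1e-6
WARMUP_EPOCHS, MAX_EPOCH = 30.0, 50

def lr_at(epoch: float) -> float:
    if epoch < WARMUP_EPOCHS:
        # linear warmup toward BASE_LR
        return WARMUP_START_LR + (epoch / WARMUP_EPOCHS) * (BASE_LR - WARMUP_START_LR)
    # cosine decay from BASE_LR down to COSINE_END_LR after warmup
    progress = (epoch - WARMUP_EPOCHS) / (MAX_EPOCH - WARMUP_EPOCHS)
    return COSINE_END_LR + 0.5 * (BASE_LR - COSINE_END_LR) * (1.0 + math.cos(math.pi * progress))

for e in [0, 10, 30, 49]:
    # fixed notation truncates every value in [1e-6, 1e-5] to 0.000
    print(f"epoch {e:2d}: lr = {lr_at(e):.3f} (fixed) = {lr_at(e):.2e} (scientific)")
```

So an LR log that reads 0.000 for the whole run is expected here; printing the value in scientific notation would reveal the true numbers.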
I have a synthetic dataset consisting of 15 classes of human activities. For each class, I have around 40 videos. My task is to train a vision transformer model for human activity recognition. After training, I will test it on a real-world dataset where I have 5 videos for each class.
While I understand that using a ConvNet might be more resource-efficient, my task is domain-specific and requires the use of a vision transformer model, regardless of the initial results. Therefore, I need to focus on improving the performance of the vision transformer rather than switching to a CNN.
Any suggestions for optimizing the vision transformer to achieve better results would be greatly appreciated.
The simplest things you can start with are:
Update: after 105 epochs of training on Kinetics/MVITv2_B_32x3, these are the results I got:

`train_net.py: 759: training done: _p50.93_f225.17 _t11.09_m20.68 _a19.17 Top5 Acc: 60.00 MEM: 20.68 f: 225.1698`
@alpargun Due to resource limitations I can't increase my batch size beyond 2. Should I continue this training for more epochs, or should I try other models like Kinetics/MVITv2_S_16x4 or Kinetics/MVITv2_L_40x3_test?
Also, if I want to try the SSv2 model (https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md#ssv2), which is pretrained on K400, what changes do I need to make in my current dataset settings? Any suggestions to achieve better results would be greatly appreciated.
@AbrarKhan009 You can start with the smallest version, MVITv2_S_16x4, as a baseline, with a batch size as high as your hardware allows, and continue the training until either the training error converges or the validation error starts increasing (overtraining). So please do not set 50 as the max epochs; keep it at 200 as in the original config. Do not worry, MVITv2_S can still handle your task of classifying 15 actions, since it reached 81% top-1 accuracy on K400 with 400 different actions. This model also uses only 16 input frames, compared to your old model's 32 frames, so you should be able to use a higher batch size and get faster training.
After that training finishes, feel free to restore your last MVITv2_B_32x3 checkpoint and continue your old training directly instead of starting from scratch, to save time. That way you can compare the results of both transformer models.
Trying SSv2 with the model pretrained on K400 can be a good way to test whether the problem is due to your custom dataset. However, this requires downloading SSv2 and preparing the folder structure according to the implementation in slowfast/datasets/ssv2.py. Furthermore, you need to modify the config file so that the number of output classes (NUM_CLASSES) matches SSv2's number of classes (just like what you did for your custom dataset).
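As a sketch of those config changes, assuming SlowFast's `get_cfg()` helper in `slowfast/config/defaults.py` (the checkpoint path below is a hypothetical placeholder, and the same key/value overrides can also be passed to `tools/run_net.py` on the command line instead):

```python
from slowfast.config.defaults import get_cfg

cfg = get_cfg()
cfg.merge_from_file("configs/Kinetics/MVITv2_S_16x4.yaml")
# adapt the stock K400 config to the 15-class custom dataset,
# mirroring the MVITv2_B config shown earlier in this thread
cfg.merge_from_list([
    "TRAIN.DATASET", "mydata",
    "TEST.DATASET", "mydata",
    "MODEL.NUM_CLASSES", 15,
    "TRAIN.CHECKPOINT_FILE_PATH", "/path/to/MViTv2_S_16x4_k400.pyth",  # hypothetical
    "TRAIN.CHECKPOINT_EPOCH_RESET", True,
    "SOLVER.MAX_EPOCH", 200,  # keep the original schedule length
])
```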
Thanks for the suggestion, I will follow your advice. I tried training MVITv2_S_16x4 with batch sizes of 16 and 8, but both gave me a CUDA out-of-memory error. Now I'm using a batch size of 4, and it's running smoothly.
@alpargun Hi, good morning. My training of MVITv2_S_16x4 with a batch size of 4 for 200 epochs is done, and these are the results I got:

`train_net.py: 759: training done: _p34.24_f64.46 _t2.96_m11.96 _a37.50 Top5 Acc: 69.17 MEM: 11.96 f: 64.4566`

Should I continue this training for more epochs? These results are good compared to the Base model. Thanks for your advice.
Hello everyone, I hope you are all doing well. I have successfully run SlowFast training with MViTv2 on my custom dataset; details of my training are given below.
I used MVITv2_B_32x3.yaml and followed the structure below.
```
SlowFast/
├── configs/
│   └── MyData/
│       └── MVITv2_B_32x3.yaml
├── data/
│   └── MyData/
│       ├── ClassA/
│       │   └── ins.mp4
│       ├── ClassB/
│       │   └── kep.mp4
│       ├── ClassC/
│       │   └── tak.mp4
│       ├── train.csv
│       ├── test.csv
│       ├── val.csv
│       └── classids.json
├── slowfast/
│   └── datasets/
│       ├── __init__.py
│       ├── mydata.py
│       └── ...
└── ...
```

All this fine-tuning guidance for a custom dataset is already explained by @AlexanderMelde [here](https://github.com/facebookresearch/SlowFast/issues/149); thanks to him for his guidance.
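For anyone replicating this layout, here is a small sketch of how `train.csv`/`val.csv`/`test.csv` and `classids.json` could be generated from the class folders above, assuming the space-separated `<video_path> <label_id>` line format described in issue #149:

```python
import json
import random
from pathlib import Path

DATA_ROOT = Path("data/MyData")

# map each class folder (ClassA, ClassB, ...) to an integer id
classes = sorted(d.name for d in DATA_ROOT.iterdir() if d.is_dir())
class_ids = {name: i for i, name in enumerate(classes)}
(DATA_ROOT / "classids.json").write_text(json.dumps(class_ids, indent=2))

# collect (video_path, label_id) pairs and shuffle them reproducibly
samples = [(str(p), class_ids[p.parent.name]) for p in DATA_ROOT.glob("*/*.mp4")]
random.seed(0)
random.shuffle(samples)

# 80/10/10 split into the three csv files referenced by the config
n = len(samples)
splits = {
    "train.csv": samples[: int(0.8 * n)],
    "val.csv": samples[int(0.8 * n) : int(0.9 * n)],
    "test.csv": samples[int(0.9 * n) :],
}
for fname, rows in splits.items():
    with open(DATA_ROOT / fname, "w") as f:
        f.writelines(f"{path} {label}\n" for path, label in rows)
```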
My question is about the output I am getting at the end:

`train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 _a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698`

Can somebody explain this output to me, in particular `_p50.93_f225.17 _t12.31_m10.69`, `MEM: 10.69`, and `f: 225.1698`?