Open mars747 opened 4 years ago
Hi @mars747,
To obtain the multigrid SS-v2 models we released in MODEL_ZOO, we indeed used the released SLOWFAST_8x8_R50.pkl (77.0 top-1) model as the pre-trained model for simplicity. It works because 3D CNNs are fully convolutional, and empirically I find them not very sensitive to the number of frames and temporal strides, especially when fine-tuning is performed.
I'd guess that one possible source of the issue could be related to data. Were you able to run the data processing steps described in slowfast/datasets/DATASET.md successfully? It might also be worth double-checking that the frames you used are consistent with the "frame list".
Thank you for the reply @chaoyuaw. I did follow the instructions and there were no errors (some warnings though). I think the list is consistent, as the dataloader outputs warnings if files cannot be read, right?
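To illustrate the "fully convolutional" point above, here is a toy sketch in plain Python. It is not SlowFast code: the kernel values, clip contents, and function names are all made up for illustration. It only shows that a temporal convolution followed by global average pooling can reuse the same weights on clips of different lengths, which is why an 8-frame checkpoint can at least be loaded and fine-tuned in a 16-frame setting.

```python
# Toy illustration (hypothetical names and values, not SlowFast code).

def temporal_conv(frames, kernel, stride=1):
    """Valid 1D convolution over the time axis of per-frame scalar features."""
    k = len(kernel)
    return [
        sum(w * frames[t + i] for i, w in enumerate(kernel))
        for t in range(0, len(frames) - k + 1, stride)
    ]


def global_avg_pool(features):
    """Collapse a variable-length feature sequence into a single value."""
    return sum(features) / len(features)


def clip_score(frames, kernel):
    """Same weights, any clip length: convolve, then pool."""
    return global_avg_pool(temporal_conv(frames, kernel))


kernel = [0.25, 0.5, 0.25]  # stands in for "pre-trained" weights
clip_8 = [float(t % 4) for t in range(8)]    # 8-frame clip
clip_16 = [float(t % 4) for t in range(16)]  # 16-frame clip

score_8 = clip_score(clip_8, kernel)
score_16 = clip_score(clip_16, kernel)
print(len(temporal_conv(clip_8, kernel)), len(temporal_conv(clip_16, kernel)))
```

The intermediate feature length changes with the number of input frames, but the pooled output always has the same shape, so no weight needs to be resized.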
I have the following training curve of training epoch top_1 error, does this look the same as yours?
If the training result is different, then it's more likely in the data. Otherwise, there's something wrong with my validation?
Thanks again!
Hi @mars747 , thanks for providing more context. I don't have access to the learning curves I had at this moment, so I can't really tell whether it's the same as what we had before. But qualitatively it looks reasonable to me. Have you ever tried to run only evaluation (without training) using the final, trained model from MODEL ZOO? If that gives a different top-1, then it might suggest data issues.
While indeed if a frame is missing, we should see warning messages, there could be other reasons that might cause data-related issues, e.g. different frame rates, different frame extraction qualities, etc. Maybe comparing the number of frames you have and the number of frames in the frame list, or eyeballing the frames themselves might help?
Also, another possibility to debug might be to turn off multigrid training to see if you are able to get the expected results.
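The frame-count comparison suggested above could be sketched roughly as follows. This is only a sketch under assumptions: it assumes the frame list is a space-delimited file with a header row and a `path` column holding paths relative to the frames root, and that frames are `.jpg` files. The function names are hypothetical, so adjust them to your actual files.

```python
import csv
import os
from collections import Counter


def count_listed_frames(frame_list_path, path_column="path", delimiter=" "):
    """Return {video_dir: number of frames listed} from a frame list file."""
    counts = Counter()
    with open(frame_list_path, newline="") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            counts[os.path.dirname(row[path_column])] += 1
    return counts


def count_frames_on_disk(root, video_dir, ext=".jpg"):
    """Count extracted frames for one video; 0 if the directory is missing."""
    full = os.path.join(root, video_dir)
    if not os.path.isdir(full):
        return 0
    return sum(1 for name in os.listdir(full) if name.endswith(ext))


def find_mismatches(frame_list_path, root, **reader_kwargs):
    """Return {video_dir: (listed, on_disk)} for videos whose counts differ."""
    listed = count_listed_frames(frame_list_path, **reader_kwargs)
    mismatches = {}
    for video_dir, n_listed in listed.items():
        n_disk = count_frames_on_disk(root, video_dir)
        if n_disk != n_listed:
            mismatches[video_dir] = (n_listed, n_disk)
    return mismatches
```

An empty result from `find_mismatches` only rules out missing or extra frames. It says nothing about frame rate or extraction quality, which would still need to be checked by eye.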
@chaoyuaw @mars747 Hi,
I also tested SLOWFAST_16x8_R50_multigrid (not self-trained, just downloaded from the MODEL ZOO) and got the result below, which is lower than the reported result (63.5 | 88.7).
[INFO] slowfast.utils.logging: 96: json_stats: {"split": "test_final", "top1_acc": "58.51", "top5_acc": "85.13"}
I just followed the dataset preparation instructions, using the command ffmpeg -i "${video}" -r 30 -q:v 1 "${out_name}".
The total number of extracted frames is 25,209,271.
Is that the same as the total number of frames you have?
I used 8 GPUs and didn't modify the config.yaml and it ran 1,549 iterations.
Since I don't see any warning messages, does that mean there is no problem with the data?
[INFO: ssv2.py: 70]: Constructing Something-Something V2 test...
[07/29 21:45:21][INFO] slowfast.datasets.ssv2: 70: Constructing Something-Something V2 test...
[INFO: ssv2.py: 155]: Something-Something V2 dataloader constructed (size: 24777) from /Data_ssd/sth-sth-v2/20bn-something-something-v2-SF-frames/val.csv
[07/29 21:45:30][INFO] slowfast.datasets.ssv2: 155: Something-Something V2 dataloader constructed (size: 24777) from /Data_ssd/sth-sth-v2/20bn-something-something-v2-SF-frames/val.csv
[INFO: test_net.py: 147]: Testing model for 1549 iterations
[07/29 21:45:30][INFO] test_net: 147: Testing model for 1549 iterations
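As a rough sanity check on the log above, the iteration count can be related to the number of views per video. The sketch below assumes a total test batch size of 16 and that the last partial batch is kept. Both are assumptions, not values confirmed in this thread, and the helper name is hypothetical.

```python
import math


def num_test_iterations(num_videos, num_temporal_views, num_spatial_crops, batch_size):
    """Iterations needed to see every (video, view, crop) once, keeping the last partial batch."""
    total_clips = num_videos * num_temporal_views * num_spatial_crops
    return math.ceil(total_clips / batch_size)


# 24777 is the dataloader size from the log; batch_size=16 is an assumption.
one_crop = num_test_iterations(24777, 1, 1, batch_size=16)
three_crops = num_test_iterations(24777, 1, 3, batch_size=16)
print(one_crop, three_crops)  # 1549 4646
```

If the batch-size assumption holds, the logged 1549 iterations would correspond to a single view per video rather than three spatial crops.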
My Environment setting :
Ubuntu 16.04, python3.7, pytorch 1.5.0, torchvision 0.6.0a0+82fd1c8
My command line :
python tools/run_net.py --cfg configs/SSv2/SLOWFAST_16x8_R50_multigrid.yaml DATA.PATH_TO_DATA_DIR /Data_ssd/sth-sth-v2/20bn-something-something-v2-SF-frames DATA.PATH_PREFIX /Data_ssd/sth-sth-v2/20bn-something-something-v2-SF-frames/ TEST.CHECKPOINT_FILE_PATH ./SLOWFAST_16x8_R50_multigrid.pkl TRAIN.ENABLE False TRAIN.CHECKPOINT_TYPE pytorch
Hi @youngwanLEE, @chaoyuaw,
My test result using 4 GPUs is the following:
[INFO: logging.py: 96]: json_stats: {"split": "test_final", "top1_acc": "58.44", "top5_acc": "85.14"}
[07/24 18:08:42][INFO] slowfast.utils.logging: 96: json_stats: {"split": "test_final", "top1_acc": "58.44", "top5_acc": "85.14"}
I'm using the same ffmpeg command as well. I believe the number of images I have is 25209274.
@mars747
Did you solve this problem?
@chaoyuaw
I'm sorry to bother you,
Would you mind checking the SlowFast_16x8_R50_on_SSv2_standard model's accuracy?
I tried to reproduce the reported result (38.9) several times.
I created SSv2 frames several times but got .
I also followed your data instructions.
Thanks for your help in advance.
@mars747
Did you solve this problem?
@youngwanLEE , I did not investigate further and I have since started exploring other directions.
Hi @youngwanLEE,
I'll take a look and let you know when I have an update. Thanks for sharing the info. @chaoyuaw I'll take it from here. Thanks,
@mars747 @chaoyuaw @takatosp1
By using TEST.NUM_SPATIAL_CROPS 3 instead of 1,
I got 63.93/88.16.
Didn't the reported result (63.0/88.5) come from using NUM_SPATIAL_CROPS 3?
@youngwanLEE I got results similar to the ones you reported. For NUM_SPATIAL_CROPS=1:
json_stats: {"split": "test_final", "top1_acc": "57.88", "top5_acc": "84.13"}
For NUM_SPATIAL_CROPS=3:
json_stats: {"split": "test_final", "top1_acc": "63.92", "top5_acc": "88.15"}
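For readers unfamiliar with why the number of spatial crops changes the accuracy, here is a minimal sketch of multi-crop ensembling. The per-class scores are invented for illustration, the function names are hypothetical, and simple averaging is an assumption. The actual SlowFast test code may aggregate scores differently.

```python
def average_crop_scores(crop_scores):
    """Average per-class scores over crops (a list of equal-length score lists)."""
    n = len(crop_scores)
    return [sum(col) / n for col in zip(*crop_scores)]


def predict(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-class scores for three spatial crops of one video.
crops = [
    [0.6, 0.3, 0.1],  # crop 1 alone would predict class 0
    [0.2, 0.7, 0.1],  # crop 2 predicts class 1
    [0.3, 0.6, 0.1],  # crop 3 predicts class 1
]

single_crop = predict(crops[0])
ensembled = predict(average_crop_scores(crops))
print(single_crop, ensembled)  # 0 1
```

A single crop can miss the part of the frame where the action happens, so combining several crops often changes the prediction. That would be consistent with the gap of several points between the 1-crop and 3-crop numbers above.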
Hi,
I'm trying to reproduce the multigrid result on ssv2. The config I'm using is: configs/SSv2/SLOWFAST_16x8_R50_multigrid.yaml
The pretrained checkpoint I'm using is: SLOWFAST_8x8_R50.pkl per: https://github.com/facebookresearch/SlowFast/blob/master/MODEL_ZOO.md
Are the config and checkpoint matched correctly? Per my understanding of the paper, if the number of frames processed by the slow pathways differs, then the config SLOWFAST_16x8_R50_multigrid cannot possibly use an 8x8 pretrained model?
With this mismatched configuration, I'm only able to achieve "min_top1_err": 43.042776, "min_top5_err": 15.907990
Thank you for your explanation!