Mrbishuai opened this issue 2 years ago
Hi, @Mrbishuai
The 81.1% for R3D is averaged over the 3 splits of the UCF101 dataset. If you check the manuscript of our paper, "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning", you can find that the corresponding performance on UCF101 split 1 is 79.9%.
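For clarity, the reported number is simply the arithmetic mean of the per-split top-1 accuracies over the three official UCF101 splits. A minimal sketch (split 1 uses the 79.9% mentioned above; the split 2 and 3 values are hypothetical placeholders, not reported results):

```python
# Per-split top-1 accuracies on UCF101 (%). Split 1 matches the paper's
# 79.9%; splits 2 and 3 below are placeholder values for illustration only.
split_accs = [79.9, 81.5, 81.9]

# The headline number is the arithmetic mean over the three official splits.
mean_acc = sum(split_accs) / len(split_accs)
print(f"3-split average: {mean_acc:.1f}%")
```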
I am not sure what experimental settings you used.
But the gap in your case is already very small. If the experimental environments are the same, you can try modifying some hyper-parameters. In my case, I actually did not explore the training hyper-parameters much, and the settings for my three baselines are almost identical, without special tuning for each, so there should be some room for improvement.
Hi, @BestJuly
Thank you for your reply. Before running the code, I did not modify it, and I carefully checked the details of the paper and the code parameters, so I believe the setup is correct. My accuracy is 77.3% on split 1 of UCF101 and 79.5% averaged over the three splits.
I want to know whether there are some improvements from "Self-supervised Video Representation Learning Using Pretext-Contrastive Learning" to "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning", or whether, within the allowed margin of error, my result is right.
Hi, @Mrbishuai
Thank you for your interest and your information.
Considering the performance, I actually used different folders for different baselines. To make the code open-sourced and clear, I plan to combine them into one repo. There might be some differences, and I may need to run the experiments again to check. I think a 2% gap is neither big nor small.
The differences between these two papers mainly come from some training strategies and tricks.
As for the performance part, I may check and rerun the experiments. However, I am busy with my graduation these days, so I am afraid my reply on that part will be late.
Hi, @BestJuly You are very kind. Thank you for replying to me despite your busy schedule. I did find a further improvement in accuracy from "Self-supervised Video Representation Learning Using Pretext-Contrastive Learning" to "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning", but I don't know which techniques were used. Closing that 2% gap may take me a lot of effort, so I hope you will tell me when you are free. Finally, I wish you a smooth graduation in the coming days. Best wishes to you.
Thank you.