BestJuly / Pretext-Contrastive-Learning

Official codes for the paper "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning".

The recognition accuracy cannot reach the reported best #2

Open Mrbishuai opened 2 years ago

Mrbishuai commented 2 years ago
I was glad to read your paper, which inspired me a lot. I would like to know whether it has been accepted, because I am eager to study your code. Recently, when I ran your code, the results were unsatisfactory: my best result is 79.5% (R3D), and the gap to your 81.1% is not small.
I know that in your paper "Self-Supervised Video Representation Using Pretext-Contrastive Learning" the accuracy is 79.5%, while your latest paper "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning" reaches 81.1%. I would like to know where the discrepancy comes from. I hope you have time to tell me what might be wrong.

Thank you.

BestJuly commented 2 years ago

Hi, @Mrbishuai

The 81.1% for R3D is averaged over the 3 splits of the UCF101 dataset. If you check the manuscript of our paper, "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning", you will find that the corresponding performance on UCF101 split 1 is 79.9%.
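
Just to make the numbers concrete, the 81.1% and the 79.9% are related by a simple mean over the three splits. A minimal sketch of that arithmetic; the implied splits 2/3 value below is derived only from those two reported numbers, nothing else:

```python
# Minimal sketch of how the reported UCF101 accuracy is aggregated.
# Only split 1 (79.9%) and the 3-split mean (81.1%) are reported numbers;
# the implied mean of splits 2 and 3 is derived from them.

split1_acc = 79.9   # reported accuracy on UCF101 split 1 (%)
mean_acc = 81.1     # reported average over the 3 splits (%)

# Average over splits 2 and 3 implied by the two reported values.
implied_mean_split2_3 = (mean_acc * 3 - split1_acc) / 2
print(f"Implied mean of splits 2 and 3: {implied_mean_split2_3:.1f}%")  # ~81.7%
```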

I am not sure what experimental settings you used in your case.

That said, the gap in your case is already quite small. If the experimental environments are the same, you can try modifying some hyper-parameters. In my case, I did not explore the training hyper-parameters much, and the settings for my three baselines are almost the same, without special tuning for each, so there should still be some room for improvement.
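
As a purely illustrative sketch of what I mean by trying different hyper-parameters, assuming a standard PyTorch fine-tuning setup; the dummy model, parameter names, and values below are placeholders, not the actual settings used in the paper or this repository:

```python
# Hypothetical hyper-parameter sweep for the fine-tuning stage; the values
# below are illustrative placeholders, not the settings used in the paper.
from itertools import product

import torch

# Stand-in for a pretrained backbone (e.g. R3D) plus a linear classifier.
dummy_model = torch.nn.Linear(512, 101)  # 101 classes for UCF101

search_space = {
    "lr": [1e-3, 5e-3, 1e-2],       # fine-tuning learning rate
    "weight_decay": [1e-4, 5e-4],   # L2 regularization strength
}

for lr, wd in product(search_space["lr"], search_space["weight_decay"]):
    # SGD with momentum is a common choice for video fine-tuning.
    optimizer = torch.optim.SGD(dummy_model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=wd)
    print(f"would fine-tune with lr={lr}, weight_decay={wd}")
```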

Mrbishuai commented 2 years ago

Hi, @BestJuly

Thank you for your reply. I did not modify the code before running it, and I carefully checked the details of the paper and the code parameters, so I believe the setup is correct. The accuracy I obtained is 77.3% on split 1 of UCF101 and 79.5% averaged over the three splits.

I would like to know whether there are improvements from "Self-Supervised Video Representation Using Pretext-Contrastive Learning" to "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning", or whether my result is already within the expected error range.

BestJuly commented 2 years ago

Hi, @Mrbishuai

Thank you for your interest and your information.

Regarding the performance, I actually used different folders for the different baselines. To make the code open-sourced and clear, I plan to combine them into one repo. There might be some differences, and I may need to run the experiments again to check. I think a 2% gap is neither big nor small.

The differences between these two papers mainly come from some training strategies and tricks.

As for the performance part, I may check and run the experiments again. However, I am busy with graduation these days, so I am afraid my reply on that part will be delayed.

Mrbishuai commented 2 years ago

Hi, @BestJuly

You are very kind. Thank you for replying to me despite your busy schedule. I did notice a further improvement in accuracy from "Self-Supervised Video Representation Using Pretext-Contrastive Learning" to "Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning", but I do not know which techniques were used. I think closing that 2% may take me a lot of effort. I hope you can tell me when you are free. Finally, I wish you a smooth graduation in the coming days. Best wishes to you.