Question about TSM accuracy in ablation study

jinhaoduan commented 2 years ago

Hi, thanks for your awesome work in video recognition and also the release.

Recently, I tested the pretrained ssthv1 811 checkpoint with the provided test command and got only 49.9% top-1 accuracy, which is 2.4 points below the reported 52.3%. The dataset on my drive seems fine, so I can't tell where the gap comes from. Could you please help me figure it out? By the way, I am using PyTorch 1.9.

Besides, the ablation studies in your paper show that TSM with 8 RGB frames (center crop, 1 clip) achieves 47.1% top-1 accuracy, which is higher than the number in the original TSM paper (and also higher than their released pretrained model). Did you train this model yourselves? Could you please share the training details? Many thanks!

jinhaoduan commented 2 years ago

It turns out there were some issues with my dataset, which I have now fixed. The pretrained model achieves the reported score. So the only open question is the reported TSM accuracy. Many thanks.

yztongzhan commented 2 years ago

We use flip augmentation for SSV1 training, which MMAction2 also supports for models such as TSM, GST and GSM. Note that our label generation script is different from MMAction2's, so you need to check your labels carefully. I hope this is useful for you.
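
For readers unfamiliar with flip augmentation on direction-sensitive datasets, here is a minimal sketch in plain Python. It is not the TDN or MMAction2 implementation: the function `flip_clip`, its parameters, and the label indices are hypothetical and chosen only for illustration. The thread does not say whether labels are remapped on flip; the optional `flip_label_map` argument below is an assumption, modelled loosely on the idea that classes such as "left to right" and "right to left" swap when a clip is mirrored.

```python
import random


def flip_clip(frames, label, flip_label_map=None, flip_ratio=0.5, rng=random):
    """Horizontally flip every frame of a clip with probability flip_ratio.

    frames: list of frames, each frame a list of pixel rows (hypothetical layout).
    label: integer class index of the clip.
    flip_label_map: optional dict mapping a label to its mirrored counterpart
        (an assumption for this sketch, not confirmed by the thread).
    """
    # Decide once per clip so that all frames stay temporally consistent.
    if rng.random() >= flip_ratio:
        return frames, label

    # Mirror each row of each frame left-to-right.
    flipped = [[row[::-1] for row in frame] for frame in frames]

    # Swap direction-sensitive labels if a mapping is given.
    if flip_label_map is not None:
        label = flip_label_map.get(label, label)
    return flipped, label


# Hypothetical label indices: 0 = "left to right", 1 = "right to left".
clip = [[[1, 2, 3], [4, 5, 6]]]
flipped, new_label = flip_clip(clip, 0, flip_label_map={0: 1, 1: 0}, flip_ratio=1.0)
```

If your label indices come from a different generation script than the one used for training, any such mapping (and the evaluation itself) will point at the wrong classes, which is why checking the labels matters.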

jinhaoduan commented 2 years ago

@yztongzhan Thanks for your reply. It helps a lot.

MCG-NJU / TDN
