OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Apache License 2.0

The issue of Temporal-Action-Localization #2

Open typ1012 opened 1 year ago

typ1012 commented 1 year ago

Hello, thanks for your good work! I encountered some problems when reproducing the performance on the Temporal-Action-Localization task. Thumos14: 69.11 average mAP (lower than 71.58). The input_dim of the feature is also not right: the channel dimension of the provided MAE feature is 1280, but thumos.yaml specifies 1408. Anet1.3: 38.56 average mAP.

Richard-61 commented 1 year ago

Hi, the features we released are from VideoMAE, and the results you reproduced are correct. Concatenating them with UniformerV2 features gives the better performance shown in the paper. The UniformerV2 features will be released soon.
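For readers unfamiliar with feature concatenation, the idea can be sketched as follows. This is only an illustration, not the authors' code: the function name `concat_features` is hypothetical, the two feature sequences are assumed to be temporally aligned (same number of snippets), and the second feature width used in the example is a placeholder rather than the real UniformerV2 dimension. The point is that the channel dimensions add up, so input_dim in the config has to match the sum.

```python
def concat_features(feats_a, feats_b):
    """Concatenate two per-snippet feature sequences along the channel axis.

    Each argument is a list of per-snippet feature vectors (lists of floats).
    Both sequences are assumed to cover the same snippets in the same order.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError(
            f"temporal lengths differ: {len(feats_a)} vs {len(feats_b)}"
        )
    # Per snippet, append the second vector after the first one.
    return [list(a) + list(b) for a, b in zip(feats_a, feats_b)]


# 1280 is the VideoMAE channel dimension mentioned in this thread.
# EXTRA_DIM is a made-up placeholder, not the real UniformerV2 width.
VIDEOMAE_DIM = 1280
EXTRA_DIM = 4

videomae = [[0.0] * VIDEOMAE_DIM for _ in range(3)]
other = [[1.0] * EXTRA_DIM for _ in range(3)]

merged = concat_features(videomae, other)
# The merged channel dimension is the sum of the two inputs.
input_dim = len(merged[0])
```

With only the released VideoMAE features, the same reasoning gives an input_dim of 1280.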

tensorboy commented 1 year ago

Hi @typ1012 @Richard-61, I tried to run Thumos14, but the mAP is only 47 (the only change I made was setting input_dim from 1408 to 1280). Did you make any other modifications to reproduce the 69.11 mAP?

Value-Jack commented 1 year ago

> Hello, thanks for your good work! I encountered some problems when reproducing the performance on the Temporal-Action-Localization task. Thumos14: 69.11 average mAP (lower than 71.58). The input_dim of the feature is also not right: the channel dimension of the provided MAE feature is 1280, but thumos.yaml specifies 1408. Anet1.3: 38.56 average mAP.

Hi, I would also like to know how you reproduced the 69.11 mAP. I can only get poor results.

Value-Jack commented 1 year ago

@tensorboy @typ1012 have you solved your problem?

Richard-61 commented 1 year ago

> @tensorboy @typ1012 have you solved your problem?

I fixed the batch_nms bug in the code.

Value-Jack commented 1 year ago

So, did you reproduce the 71.58 result?

Value-Jack commented 1 year ago

from petrel_client.client import Client
ModuleNotFoundError: No module named 'petrel_client'

Could you please tell me how to import this module?

shepnerd commented 1 year ago

That is a module to load videos on our servers. It may not be applicable in your case. You can remove it and update the corresponding video loading functions. We will fix it soon. @Richard-61
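One way to follow this advice is to make the import optional and fall back to reading from local disk. The snippet below is a minimal sketch under assumptions, not the repository's actual fix: the helper name `read_bytes` is hypothetical, and it assumes the client object exposes a `get(path)` method that returns the file contents as bytes.

```python
try:
    # Internal storage client used on the authors' servers.
    from petrel_client.client import Client
except ImportError:
    # Not available elsewhere; fall back to plain file reads.
    Client = None


def read_bytes(path, client=None):
    """Return the contents of `path` as bytes.

    If a storage client is passed in, it is assumed to provide
    get(path) -> bytes. Otherwise the file is read from local disk.
    """
    if client is not None:
        return client.get(path)
    with open(path, "rb") as f:
        return f.read()
```

Outside the authors' servers, calling `read_bytes("some/local/video.mp4")` with no client reads the file directly. The existing video loading functions would then need to decode from these bytes or be pointed at local paths.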

Value-Jack commented 1 year ago

Could you please explain the code in /InternVideo-main/Downstream/Temporal-Action-Localization/configs/thumos.yaml?

I don't understand the meaning of the 1408 in this setting:

input_dim: 1408,

2048+768, 1024+768, 1280+1024

christian-matroid commented 1 year ago

> Hello, thanks for your good work! I encountered some problems when reproducing the performance on the Temporal-Action-Localization task. Thumos14: 69.11 average mAP (lower than 71.58). The input_dim of the feature is also not right: the channel dimension of the provided MAE feature is 1280, but thumos.yaml specifies 1408. Anet1.3: 38.56 average mAP.

Hello @typ1012, I have been trying to reproduce the Anet1.3 score you mentioned, but have not been able to get better than 32.23. I had to use my own clone of the ActionFormer repository for this, as InternVideo's downstream copy of the repository has many issues. Could you share the steps you used to produce the reported Anet1.3 score? Thank you!