OpenGVLab / VideoMAEv2

[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
https://arxiv.org/abs/2303.16727
MIT License
445 stars 45 forks

Where can I find the finetuning script for the 'Temporal action detection' task? #51

Closed Leo-Yuyang closed 6 months ago

Leo-Yuyang commented 6 months ago

Thank you for your amazing work. I was trying to use VideoMAE V2 for a temporal action detection task, but I can't find the detailed implementation for this task. Could you please give me some advice?

congee524 commented 6 months ago

https://github.com/OpenGVLab/VideoMAEv2/blob/master/docs/TAD.md

You could refer to the doc above. We use the ActionFormer codebase with offline-extracted VideoMAE V2 features.
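
For reference, here is a minimal sketch of what the offline feature-extraction loop looks like: a sliding temporal window over the frames, one forward pass per clip, and one .npy feature file per video. The helper names (`load_video_frames`, `build_backbone`) and the checkpoint name are hypothetical placeholders, not the repo's actual API, so adapt them to the extraction script referenced in the doc.

```python
# Minimal sketch of sliding-window feature extraction for TAD.
# Assumptions (not the repo's actual API): `build_backbone` and
# `load_video_frames` are hypothetical helpers; the backbone maps a
# (B, C, T, H, W) clip tensor to a (B, feat_dim) feature tensor.
import numpy as np
import torch

CLIP_LEN = 16   # frames per clip fed to the backbone
STRIDE = 4      # temporal stride between consecutive clips

@torch.no_grad()
def extract_features(video_path, backbone, device="cuda"):
    frames = load_video_frames(video_path)   # (T, H, W, C) uint8 array, hypothetical loader
    feats = []
    for start in range(0, len(frames) - CLIP_LEN + 1, STRIDE):
        clip = frames[start:start + CLIP_LEN]                      # (CLIP_LEN, H, W, C)
        clip = torch.from_numpy(clip).permute(3, 0, 1, 2).float()  # -> (C, T, H, W)
        clip = clip.unsqueeze(0).to(device) / 255.0                # add batch dim, scale to [0, 1]
        feats.append(backbone(clip).squeeze(0).cpu().numpy())      # (feat_dim,)
    return np.stack(feats)                                         # (num_clips, feat_dim)

backbone = build_backbone("vit_g_hybrid_pt_1200e_k710_ft").eval().cuda()  # hypothetical builder/checkpoint name
np.save("video_0001.npy", extract_features("video_0001.mp4", backbone))   # one .npy per video
```

ActionFormer then reads these per-video feature files according to its dataset config, the same way it consumes pre-extracted I3D/TSP features in its own examples.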

Leo-Yuyang commented 6 months ago

Thank you for your prompt response! I was just about to email you about this. Anyway, I have learned how to extract features using VideoMAE V2. However, the doc linked above is quite general and not specific enough. Could you please give me more guidance? For example, after obtaining the extracted features, which script in ActionFormer should I refer to or use for the next inference step? Thank you for your support!

congee524 commented 6 months ago

That part of the experiment was not done by me, so I'm not sure of the exact details. But I suggest you look into the ActionFormer framework code (it is an important work in TAD), or ask the people who mentioned in other issues that they are working on TAD tasks.
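
If it helps, one concrete thing worth checking before digging into the ActionFormer code is that the extracted features have the shape its dataset config expects. A hedged sanity-check sketch follows; the 1408 ViT-g feature width and the `features/` path are assumptions, so confirm the exact values against your checkpoint and the config file you actually use.

```python
# Hedged sanity check before plugging the features into ActionFormer:
# every per-video .npy file should be (num_clips, feat_dim), and feat_dim
# should match the feature-dimension field of your ActionFormer config
# (the 1408 ViT-g width below is an assumption to verify).
import glob
import numpy as np

EXPECTED_FEAT_DIM = 1408  # e.g. ViT-g backbone width; adjust to your checkpoint

for path in sorted(glob.glob("features/*.npy")):
    feats = np.load(path)
    assert feats.ndim == 2, f"{path}: expected (num_clips, feat_dim), got {feats.shape}"
    assert feats.shape[1] == EXPECTED_FEAT_DIM, \
        f"{path}: feat_dim {feats.shape[1]} does not match the configured value {EXPECTED_FEAT_DIM}"
    print(f"{path}: {feats.shape[0]} clips x {feats.shape[1]} dims")
```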

Leo-Yuyang commented 6 months ago

Thank you for your explanation and work!

congee524 commented 6 months ago

No problem~

myccver commented 1 month ago

Hi, were you able to successfully extract the features with the code?