assembly-101 / assembly101-action-anticipation

Code and models for the Action Anticipation benchmark of Assembly101

How to extract TSM features (8 frames as input)? #1

Open tokisakikaguya opened 9 months ago

tokisakikaguya commented 9 months ago

Hello, I noticed that the Feature section says "TempAgg requires per-frame features as input. TSM (8-frame input) has been used for extracting 2048-D per-frame features". I would like to know how the 8 frames are fed to the network during feature extraction, and how the feature for a single frame is then obtained. Could you please explain? Thanks!

dibschat commented 9 months ago

Hi Tokisaki,

Please consider referring to issues #1 and #6 of the action recognition repo.

TL;DR: We use the 2D backbone of a pretrained TSM (trained on video with temporal shift operations) to extract the per-frame features. We repurpose the RU-LSTM feature extraction script by replacing TSN with TSM and using the TSM image transforms.

Please let us know if you face any issues.

Best, Dibyadip

tokisakikaguya commented 9 months ago

Hi! Thank you very much for your reply! It answered my question. Best, Tokisaki