farewellthree / STAN

Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"
Apache License 2.0

STAN Action Recognition config file example (HMDB51) #15

Open FransHk opened 7 months ago

FransHk commented 7 months ago

Added an mmaction2 configuration for Action Recognition using the STAN model. No such file exists in the repository yet; the configs directory only contains retrieval configs. The config was constructed from information provided in the paper and should help others reproduce action-recognition results (Kinetics, HMDB51, etc.).

The config is completely analogous to the retrieval configs except for:

- `task = "recognition"`
- `loss = dict(type="CrossEntropyLoss")`
- `dataset_type = "VideoDataset"`
- `val_evaluator = dict(type="ZeroShotAccMetric")`

Optionally (commented out), a wandb hook is included for those interested in logging results:

`visualizer = dict(type="Visualizer", vis_backends=[dict(type="WandbVisBackend", init_kwargs=dict(project="STAN"))])`
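For reference, the overrides listed above could be assembled into a config fragment along the following lines. This is a hedged sketch in mmaction2 config style, not the actual file from the PR; the `_base_` path is a placeholder for whichever retrieval config the recognition config inherits from.

```python
# Sketch of the recognition-specific overrides described above
# (mmaction2 config style). The base path below is hypothetical.
_base_ = ['./stan_retrieval_base.py']  # placeholder, not the real filename

task = "recognition"

# Standard classification loss instead of the retrieval objective.
loss = dict(type="CrossEntropyLoss")

# Plain video dataset for the recognition task.
dataset_type = "VideoDataset"

# Evaluator named in this PR for zero-shot accuracy.
val_evaluator = dict(type="ZeroShotAccMetric")

# Optional: log metrics to Weights & Biases.
# visualizer = dict(
#     type="Visualizer",
#     vis_backends=[dict(type="WandbVisBackend",
#                        init_kwargs=dict(project="STAN"))])
```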

The model expects a 'text' entry in each dataset sample, even though it is not used for the recognition task. Therefore, the default MMAction2 'VideoDataset' class is given a 'dummy_text' parameter that appends a blank string to each sample. Ideally this requirement would be removed for the recognition task, but that is outside the scope of this PR.
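To illustrate the workaround, here is a toy sketch (not the actual mmaction2 implementation) of how a `dummy_text` flag on a `VideoDataset`-like class could append a blank caption so the model's expected 'text' entry is present during recognition:

```python
# Hypothetical sketch: a VideoDataset-like class with a 'dummy_text'
# option that appends a blank string to each sample, mimicking the
# behaviour described in this PR. Class name and fields are illustrative.
class VideoDatasetSketch:
    """Toy stand-in for a video dataset with a dummy_text option."""

    def __init__(self, samples, dummy_text=False):
        # samples: list of dicts, e.g. {'filename': ..., 'label': ...}
        self.samples = samples
        self.dummy_text = dummy_text

    def __getitem__(self, idx):
        sample = dict(self.samples[idx])
        if self.dummy_text:
            # Blank caption satisfies the model's 'text' requirement.
            sample['text'] = ''
        return sample


ds = VideoDatasetSketch([{'filename': 'clip0.mp4', 'label': 3}],
                        dummy_text=True)
print(ds[0])  # {'filename': 'clip0.mp4', 'label': 3, 'text': ''}
```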

HMDB51 STAN train results:

[training results screenshot]