amazon-science / long-short-term-transformer

[NeurIPS 2021 Spotlight] Official implementation of Long Short-Term Transformer for Online Action Detection
Apache License 2.0

About the activitynet features. #14

Closed sqiangcao99 closed 2 years ago

sqiangcao99 commented 2 years ago

Hi,

[screenshot of the configuration files]

For ActivityNet pre-trained model, which two configuration files are used for rgb and flow?

xumingze0308 commented 2 years ago

We used the "clip_rgb" ones. You can also directly use the feature from TeSTra.

sqiangcao99 commented 2 years ago

@xumingze0308 Hi, thanks for your help. Currently, I have some problems reproducing the results (6% lower) on TVSeries using the ActivityNet-pretrained features. Specifically:

  1. I extract the RGB frames at a short-side resolution of 320 and the flows at a resolution of 340×256;
  2. The RGB frames and flow frames are preprocessed according to the mmaction2 config files, which are:

    # RGB
    data_pipeline = [
        dict(type='RawFrameDecode'), 
        dict(type='CenterCrop', crop_size=256),
        dict(type='Normalize', **args.img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs']), 
    ]
    
    # Flow
    data_pipeline = [
        dict(type='RawFrameDecode'),
        dict(type='Resize', scale=(-1, 256)),
        dict(type='TenCrop', crop_size=224),
        dict(type='Normalize', **args.img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW_Flow'),
        dict(type='Collect', keys=['imgs'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs'])
    ]
  3. The extraction code is modified from https://github.com/open-mmlab/mmaction2/blob/master/tools/data/activitynet/tsn_feature_extraction.py.

Is the whole process correct?

xumingze0308 commented 2 years ago

For both RGB and flow, we don't apply CenterCrop; we directly use the frames at their default resolution.
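Based on that reply, a minimal sketch of adjusted pipelines with the crop steps removed might look like the following. This is an assumption, not the authors' exact config: the `img_norm_cfg` values are illustrative placeholders, and whether the flow `Resize` step should also be dropped isn't stated in the reply, so the sketch keeps it as in the original config.

```python
# Sketch: feature-extraction pipelines without CenterCrop/TenCrop,
# following the maintainer's reply. Normalization values below are
# illustrative placeholders, not the values used by the authors.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    to_bgr=False,
)

# RGB: no crop; frames are used at their decoded (default) resolution.
rgb_pipeline = [
    dict(type='RawFrameDecode'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs']),
]

# Flow: TenCrop dropped; Resize to short side 256 kept (an assumption —
# the reply only rules out cropping).
flow_pipeline = [
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW_Flow'),
    dict(type='Collect', keys=['imgs'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs']),
]

# Sanity check: neither pipeline contains a crop step.
assert not any('Crop' in step['type'] for step in rgb_pipeline + flow_pipeline)
```

If the 6% gap on TVSeries comes from the cropping, removing these steps and re-extracting the features should close most of it.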