SRA2 / SPELL

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection (ECCV 2022)
MIT License

Regarding Feature Extraction #7

Open hars-singh opened 1 year ago

hars-singh commented 1 year ago

Could you please give more detailed, step-by-step information on how you extracted the features?

Thanks

kylemin commented 1 year ago

We followed Forward STE to extract the audio-visual features. I think the ASC repository has detailed information. For information about TSM, please refer to the Note section.