alibaba-mmai-research / TAdaConv

[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
https://tadaconv-iclr2022.github.io
Apache License 2.0

Some questions on using _decode_image rather than _decode_video #3

Closed. jwfanDL closed this issue 2 years ago

jwfanDL commented 2 years ago

Hi @huang-ziyuan! We would like to follow your excellent work and reproduce some of the results. Your code is well structured, clean, and neat. Thank you so much.

You rewrote the dataset object, so it differs from the TSM codebase. Our dataset has already been split and saved in another format, so we may need to use your _decode_image method rather than _decode_video.

We have two questions that may be important for us: i) Can _decode_image obtain results similar to _decode_video? ii) What are the important operations in _decode_video that we should pay attention to when implementing our own version?

huang-ziyuan commented 2 years ago

Thanks for using the code.

Yes, our dataset class is designed for reading directly from videos. Although it is also capable of reading data from images, that function was designed for MoSI to read ImageNet data. We have not tried reading video data with _decode_image.

So, (i) It depends, since different encoding methods (image vs. video) can result in different decoded data. But you are welcome to try. (ii) Our _decode_video is simply based on Decord. It outputs a video object, and the frames are obtained by feeding frame indices to it. So the important operation is the sampling strategy, which is _segment_based_sampling for datasets like Something-Something and _interval_based_sampling for Kinetics.
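For readers porting this to pre-extracted frames, the two sampling strategies mentioned above can be sketched roughly as follows. This is a generic illustration of segment-based (TSN-style) and interval-based index sampling, not the repository's actual _segment_based_sampling or _interval_based_sampling code; the function names, parameters, and the center-frame choice at test time are all assumptions.

```python
import random


def segment_based_indices(num_frames, num_segments, train=False, rng=None):
    """Hypothetical sketch: split the video into equal segments and
    pick one frame per segment (random in training, center otherwise)."""
    rng = rng or random
    seg_len = num_frames / num_segments
    indices = []
    for s in range(num_segments):
        start, end = s * seg_len, (s + 1) * seg_len
        idx = int(rng.uniform(start, end)) if train else int((start + end) / 2)
        # Clamp so the index never runs past the last frame.
        indices.append(min(idx, num_frames - 1))
    return indices


def interval_based_indices(num_frames, clip_len, interval, train=False, rng=None):
    """Hypothetical sketch: take clip_len frames spaced `interval` apart,
    starting at a random offset (training) or a centered offset (otherwise)."""
    rng = rng or random
    span = (clip_len - 1) * interval + 1  # frames covered by one clip
    max_start = max(num_frames - span, 0)
    start = rng.randint(0, max_start) if train else max_start // 2
    # Clamp so short videos repeat their last frame instead of overflowing.
    return [min(start + i * interval, num_frames - 1) for i in range(clip_len)]


if __name__ == "__main__":
    print(segment_based_indices(80, 8))      # one center frame per segment
    print(interval_based_indices(100, 8, 4)) # centered clip, stride 4
```

With Decord, indices like these would typically be passed to the video reader (for example via `VideoReader.get_batch`); with pre-extracted frames, the same indices could instead select image files, which is one way to keep the sampling behavior aligned between the two decoding paths.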

jwfanDL commented 2 years ago

Thank you so much for your reply~