cmhungsteve / TA3N

[ICCV 2019 (Oral)] Temporal Attentive Alignment for Large-Scale Video Domain Adaptation (PyTorch)
https://arxiv.org/abs/1907.12743
MIT License

Is it possible to make the feature encoders learnable? #33

Closed · avijit9 closed this issue 3 years ago

avijit9 commented 3 years ago

Hi,

Thanks for this awesome codebase!

I would like to also train the feature encoders while reading the data directly from *.jpg frames. Is this possible with this codebase?

cmhungsteve commented 3 years ago

Yes, it should be doable. My codebase is built by modifying TRN-pytorch (https://github.com/zhoubolei/TRN-pytorch), where the input data are raw videos; you can check their code for more details. I mainly modified models.py and dataset.py so that the codebase loads pre-extracted features instead. I believe you can invert those changes to train from raw frames again; a rough sketch of what that involves is below.
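
To make the idea concrete, here is a minimal sketch of the two pieces you would swap in: a dataset that decodes *.jpg frames instead of reading feature vectors, and a backbone whose weights stay trainable. This is not the actual TA3N/TRN code; the class names, sampling scheme, and ResNet backbone are illustrative assumptions only.

```python
# Sketch only: a frame-level dataset plus a trainable encoder, standing in for
# the feature-loading dataset.py / models.py in TA3N. Names are hypothetical.
import os
from PIL import Image
import torch
import torch.nn as nn
from torch.utils.data import Dataset
import torchvision.transforms as T
import torchvision.models as models

class RawFrameDataset(Dataset):
    def __init__(self, video_dirs, labels, num_segments=5):
        self.video_dirs = video_dirs      # each directory holds the frames of one video
        self.labels = labels
        self.num_segments = num_segments
        self.transform = T.Compose([
            T.Resize(256), T.CenterCrop(224), T.ToTensor(),
            T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
        ])

    def __len__(self):
        return len(self.video_dirs)

    def __getitem__(self, idx):
        frame_files = sorted(os.listdir(self.video_dirs[idx]))
        # uniformly sample num_segments frames across the video
        # (a real implementation would pad/repeat so every sample has exactly num_segments)
        step = max(len(frame_files) // self.num_segments, 1)
        picks = frame_files[::step][:self.num_segments]
        frames = [self.transform(Image.open(os.path.join(self.video_dirs[idx], f)).convert('RGB'))
                  for f in picks]
        return torch.stack(frames), self.labels[idx]   # (T, 3, 224, 224), label

class TrainableEncoder(nn.Module):
    """ResNet-50 backbone whose weights remain trainable (no freezing, no torch.no_grad)."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the final fc layer

    def forward(self, x):                              # x: (B, T, 3, H, W)
        b, t = x.shape[:2]
        feat = self.backbone(x.flatten(0, 1)).flatten(1)  # (B*T, 2048)
        return feat.view(b, t, -1)                     # frame-level features for the temporal modules
```

The key change is that the backbone's parameters stay in the optimizer and the dataset returns decoded frames rather than feature vectors; the downstream temporal modules would then consume the (B, T, feature) tensor just as they currently consume the loaded features.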

avijit9 commented 3 years ago

Thanks a lot for your prompt reply! I really appreciate it.

In your ICCV'19 paper, you don't train your feature encoders. Is my understanding correct?

I have one more silly question. If I want to plug in I3D or C3D models, where should I begin? Does the C3D code in this repo work?

cmhungsteve commented 3 years ago

Your understanding is correct: I directly use pre-extracted features as input, and the backbone architectures are not part of my codebase. If you don't want to modify the codebase too much and just want features from other backbones such as I3D or C3D, you need separate code to extract frame-level features (e.g. https://github.com/ahsaniqbal/Kinetics-FeatureExtractor). If you want to make this codebase end-to-end trainable from raw videos, you need to modify it as mentioned previously (e.g. check TRN-pytorch). A sketch of an offline extraction loop is below.
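
For reference, here is a hedged sketch of such an offline extraction step. A torchvision ResNet is used purely as a stand-in backbone (an I3D/C3D checkpoint would be slotted in the same way), and the file layout and output format are assumptions that should be checked against what dataset.py actually expects.

```python
# Illustrative offline frame-level feature extraction; paths and the on-disk
# format are assumptions, not the format TA3N ships with.
import os, glob
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

device = 'cuda' if torch.cuda.is_available() else 'cpu'
resnet = models.resnet101(pretrained=True)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1]).to(device).eval()

transform = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_video_features(frame_dir, out_path):
    feats = []
    for frame_path in sorted(glob.glob(os.path.join(frame_dir, '*.jpg'))):
        img = transform(Image.open(frame_path).convert('RGB')).unsqueeze(0).to(device)
        feats.append(encoder(img).flatten(1).cpu())    # (1, 2048) per frame
    torch.save(torch.cat(feats), out_path)             # (num_frames, 2048) per video

# e.g. one output file per video, which the feature-loading dataset then reads back
extract_video_features('frames/video_0001', 'features/video_0001.pt')
```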