cmhungsteve / TA3N

[ICCV 2019 (Oral)] Temporal Attentive Alignment for Large-Scale Video Domain Adaptation (PyTorch)
https://arxiv.org/abs/1907.12743
MIT License

question about num_segments #18

Closed jlim13 closed 4 years ago

jlim13 commented 4 years ago

hi,

For any given iteration, does the network use num_segments frames to classify a video?

cmhungsteve commented 4 years ago

Yes.

jlim13 commented 4 years ago

Thanks for the prompt response. I have not worked in action classification before, but is it really enough to use 5 frames (or some number considerably smaller than the full video sequence) to classify the video?

cmhungsteve commented 4 years ago

You can tune that number as you wish, depending on your application or task. The best num_segments differs across types of videos. However, to have a fair comparison with other methods, people usually fix num_segments.
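
To illustrate how a fixed num_segments can cover a whole video, here is a minimal sketch of segment-based sampling: the video is split into num_segments equal chunks and one frame is taken from the middle of each. The function name `sample_segment_indices` and the center-of-chunk rule are assumptions made for illustration, not necessarily what the TA3N code does.

```python
def sample_segment_indices(num_frames, num_segments):
    """Return one frame index per segment (hypothetical helper, not from TA3N).

    The video is divided into num_segments equal-length chunks and the
    frame nearest the center of each chunk is selected.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    seg_len = num_frames / num_segments  # chunk length, may be fractional
    return [
        min(int(seg_len * (i + 0.5)), num_frames - 1)  # clamp to last frame
        for i in range(num_segments)
    ]


# A 300-frame video represented by 5 frames spread over its full length.
indices = sample_segment_indices(num_frames=300, num_segments=5)
print(indices)
```

With this kind of sampling, a small num_segments still sees the beginning, middle, and end of the video, which is why a handful of frames can be enough for many action classes.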

jlim13 commented 4 years ago

I see. When I set num_segments to something really high, like 100-300, my script just crashes. Can you give some insight into what is going on? Maybe TA3N isn't suitable for my problem if I need the entire input sequence to classify my input properly.

Thanks!

cmhungsteve commented 4 years ago

I guess the cause is the computational cost. TA3N computes relations between frames, and the number of relations depends on num_segments. If num_segments is really high (e.g. the entire input sequence), I think you need lots of GPU resources. Alternatively, you may develop some tricks to reduce the computation, and then you can still apply the main concept of TA3N to your case.
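
As a rough illustration of why large values get expensive, the sketch below counts how many distinct frame subsets exist for a given num_segments. It assumes relations are formed over unordered groups of 2 or more frames; the helper name `count_all_relations` is hypothetical, and whether the repository subsamples or caps these groups is not covered in this thread, so treat the numbers as an upper bound rather than the actual workload.

```python
from math import comb


def count_all_relations(num_segments, min_size=2):
    """Count unordered frame groups of size min_size..num_segments.

    Hypothetical helper for illustration; it is not part of the TA3N code.
    """
    return sum(comb(num_segments, k) for k in range(min_size, num_segments + 1))


for n in (5, 8, 16, 100):
    pairwise = comb(n, 2)            # 2-frame relations only
    total = count_all_relations(n)   # all group sizes from 2 up to n
    print(f"num_segments={n}: pairwise={pairwise}, all sizes={total:.3e}")
```

Even pairwise relations grow quadratically (10 pairs for 5 segments, 4950 for 100), and counting every group size grows exponentially, so some form of subsampling or a cap on num_segments is likely needed for long sequences.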