escorciav / deep-action-proposals

Action Proposals generated by deep models
MIT License

DAPs paper understanding problem #13

Open zycironic opened 6 years ago

zycironic commented 6 years ago

I don't understand how many proposals are generated in total for one video. During training, only one stream is processed per video, so the overall number of proposals is K. Is that true?

escorciav commented 6 years ago

Thanks for your interest in our work.

K is the number of proposals produced after reasoning about a segment of length T. In DAPs, we slide the model and apply it to multiple chunks of length T. If you need more proposals, you can reduce the stride. Effectively, you are still processing the video stream only once.
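To make the counting concrete, here is a minimal sketch of how the total number of proposals grows as the stride shrinks. It assumes windows are placed every `stride` frames and that a trailing chunk shorter than T is dropped; the function name `count_proposals` and the example values are hypothetical and are not taken from the repository, which may handle the last chunk differently.

```python
def count_proposals(num_frames, T, stride, K):
    """Total proposals when a model emitting K proposals per window of
    length T is slid over a video with the given stride.

    Assumption: incomplete trailing windows are discarded.
    """
    if num_frames < T:
        return 0
    num_windows = (num_frames - T) // stride + 1
    return num_windows * K


# Halving the stride roughly doubles the number of proposals.
print(count_proposals(num_frames=1000, T=256, stride=128, K=64))
print(count_proposals(num_frames=1000, T=256, stride=64, K=64))
```

Under these assumptions, the answer to the original question is that a video yields K proposals per window position, not K in total.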

zycironic commented 6 years ago

If I slide the model n times, will there be n*K anchor segments matching the prior K anchor segments?

escorciav commented 6 years ago

I'm sorry for the confusion.

The anchors spanned by the model belong to the time interval T. If you slide the model by delta, the next K predictions will lie in the interval [delta, T+delta]. In other words, the anchors are parametrized in terms of T.

Please take a look at the inference code. It is simple to understand there.
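As a rough illustration of the point about [delta, T+delta], the sketch below shifts window-relative predictions into absolute video time. This is not the repository's inference code: `predict_fn`, `to_absolute`, and `slide` are hypothetical names, and each prediction is assumed to be a (start, end) pair in frames measured from the start of its window.

```python
def to_absolute(window_preds, delta):
    """Shift (start, end) pairs defined inside [0, T] by the window offset."""
    return [(start + delta, end + delta) for start, end in window_preds]


def slide(predict_fn, num_frames, T, stride):
    """Apply predict_fn to every full window and collect absolute proposals.

    predict_fn(delta, T) is a stand-in for the model: it returns K
    window-relative (start, end) pairs for the chunk starting at delta.
    """
    proposals = []
    for delta in range(0, num_frames - T + 1, stride):
        proposals.extend(to_absolute(predict_fn(delta, T), delta))
    return proposals


# Dummy model with K = 2 fixed segments per window, for illustration only.
def dummy_predict(delta, T):
    return [(0, T // 2), (T // 4, T)]


print(slide(dummy_predict, num_frames=10, T=4, stride=3))
```

Every window reuses the same K window-relative slots; only the offset delta changes, so all proposals from a given window fall inside [delta, T+delta].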

TianweiXing commented 6 years ago

Hello @escorciav, I'm curious about how you prepare your training data. In the network, the structure is pre-defined (the output dimension is always K, giving K proposals). However, during training, for an audio clip of length T, what if the number of activities is less than K? How do you assign the ground-truth labels? Thanks a lot. This work is really interesting.