Closed wangxiao5791509 closed 3 years ago
Hi @wangxiao5791509, many thanks for your message and your interest in our work!
In our work, rather than considering a single window, we consider multiple windows of varying size. Eq. 3 refers to those multiple windows: each window either contains a fixed number of events (left term) or spans a fixed time length (right term).
Each of those windows is then separately processed (reconstructing the frame-based images and then running NetVLAD to obtain a feature descriptor).
Finally, the ensemble fusion happens in Eqs. 5+6.
Let me know if you have any further questions.
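For readers following along, the two windowing modes described above (fixed event count vs. fixed time length) can be sketched roughly as follows. This is only an illustrative sketch, not the code from the paper's repository: the function names `windows_by_count` and `windows_by_time`, the half-open index convention, and the example window sizes are all assumptions. Image reconstruction, NetVLAD, and the fusion of Eqs. 5+6 are not shown.

```python
import numpy as np

def windows_by_count(timestamps, n_events):
    """Consecutive windows holding exactly n_events events each.

    Returns half-open (start, end) index pairs; a trailing partial
    window is dropped (an assumption made for this sketch).
    """
    n_full = len(timestamps) // n_events
    return [(i * n_events, (i + 1) * n_events) for i in range(n_full)]

def windows_by_time(timestamps, duration):
    """Consecutive windows spanning a fixed time length.

    timestamps must be sorted in ascending order. Returns half-open
    (start, end) index pairs; the last window may be only partly filled.
    """
    t0, t1 = timestamps[0], timestamps[-1]
    n_windows = int(np.floor((t1 - t0) / duration)) + 1
    edges = t0 + duration * np.arange(n_windows + 1)
    idx = np.searchsorted(timestamps, edges, side="left").tolist()
    return list(zip(idx[:-1], idx[1:]))

# Hypothetical event timestamps in seconds (sorted).
ts = np.array([0.0, 0.1, 0.2, 0.5, 0.9, 1.0, 1.4, 2.1])

# An "ensemble" here is simply several window sizes applied to the same stream.
count_ensemble = {n: windows_by_count(ts, n) for n in (2, 3)}
time_ensemble = {d: windows_by_time(ts, d) for d in (0.5, 1.0)}
```

Each `(start, end)` pair selects the events `ts[start:end]` that would then be passed on to the per-window processing.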
@Tobias-Fischer Thanks for your explanation. I understand this procedure now.
Hi, thanks for your wonderful work on event representation. I still can't understand how to obtain the fused event representation. That is to say, how is Eq. (3) computed?
I can't find the relevant context for this part in the algorithm described in Section III. Thanks for your attention; I look forward to your reply.