TL;DR
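They propose SF-Net, a temporal action localization model trained with single-frame supervision: each action instance is annotated with only a single frame (image) rather than its start and end times. From this, the model predicts the beginning and end of each action. It predicts the action class for each frame and for the whole video, and it also predicts a per-frame "action" vs. "no action" score. It can be used as an annotation tool.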
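To make the idea of per-frame outputs concrete, here is a minimal sketch of how per-frame class scores and an "action"/"no action" score could be turned into start/end segments. This is not the authors' inference procedure: the function name `frames_to_segments`, the fixed `threshold`, and the simple run-grouping rule are all assumptions chosen for illustration.

```python
import numpy as np


def frames_to_segments(class_scores, actionness, threshold=0.5):
    """Group consecutive "action" frames into (start, end, class_id) segments.

    Hypothetical helper, not taken from the paper.
    class_scores: (T, C) array of per-frame class probabilities.
    actionness:   (T,) array of per-frame "action" probabilities.
    `end` is exclusive, so a segment covers frames start .. end - 1.
    """
    class_scores = np.asarray(class_scores, dtype=float)
    actionness = np.asarray(actionness, dtype=float)

    labels = class_scores.argmax(axis=1)   # most likely class per frame
    active = actionness >= threshold       # assumed fixed threshold

    segments = []
    start = None
    for t in range(len(labels)):
        # A run continues only while frames stay active with the same class.
        same_run = (
            start is not None and active[t] and labels[t] == labels[start]
        )
        if start is not None and not same_run:
            segments.append((start, t, int(labels[start])))
            start = None
        if active[t] and start is None:
            start = t
    if start is not None:
        segments.append((start, len(labels), int(labels[start])))
    return segments


# Toy input: 6 frames, 2 classes.
toy_class_scores = [
    [0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
    [0.2, 0.8], [0.1, 0.9], [0.6, 0.4],
]
toy_actionness = [0.1, 0.9, 0.8, 0.7, 0.9, 0.2]
print(frames_to_segments(toy_class_scores, toy_actionness))
```

In this sketch the "action"/"no action" score decides where segments start and end, while the class scores decide which label each segment gets. A real system would likely smooth the scores and rank the resulting proposals.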
Why it matters:
Paper URL
https://arxiv.org/abs/2003.06845
Submission Date (yyyy/mm/dd)
2020/03/15
Authors and institutions
Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
Methods
Results
Comments
ECCV 2020