Closed: zhulishun closed this issue 4 years ago
Hi @zhulishun, could you give a more detailed description of "action of multiple people"? Do you want to predict a single label for the whole frame, or separate labels for each individual?
There are multiple people in a frame, and each person has their own action. These actions may or may not be the same across people.
That seems to be the same setting as action detection, e.g. on AVA, which provides labels for one frame per second, with every person annotated with a bounding box and (possibly multiple) actions. In our own experiments, TPN actually obtained some gains on such tasks. We plan to release this part in future work. Stay tuned.
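To make the per-person setting concrete, here is a minimal sketch of how such annotations could be grouped so that each person in a keyframe gets one box and a set of action labels. It is not part of TPN. The column order (video id, timestamp, normalized box corners, action id, person id) is assumed to follow the AVA-style CSV layout, and the sample rows, ids, and the `group_person_actions` helper are made up for illustration.

```python
import csv
import io

# Hypothetical rows in an assumed AVA-style column order:
# video_id, timestamp_sec, x1, y1, x2, y2, action_id, person_id
SAMPLE_CSV = """\
vid_a,902,0.10,0.20,0.40,0.90,12,0
vid_a,902,0.10,0.20,0.40,0.90,80,0
vid_a,902,0.55,0.15,0.85,0.95,11,1
"""


def group_person_actions(csv_text):
    """Group rows into {(video_id, timestamp, person_id): {"box", "actions"}}."""
    people = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        video_id, ts, x1, y1, x2, y2, action_id, person_id = row
        key = (video_id, int(ts), int(person_id))
        # One entry per person per keyframe; repeated rows add more labels.
        entry = people.setdefault(
            key,
            {"box": (float(x1), float(y1), float(x2), float(y2)), "actions": set()},
        )
        entry["actions"].add(int(action_id))
    return people


annotations = group_person_actions(SAMPLE_CSV)
for key, entry in sorted(annotations.items()):
    print(key, entry["box"], sorted(entry["actions"]))
```

In this toy example the same frame contains two people: person 0 carries two action labels and person 1 carries one, which is the "consistent or inconsistent" case described above.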
Can TPN recognize the actions of multiple people in a frame?