Closed 4lparslan closed 2 years ago
For now, we only support action classification, not action detection. So the output will just be a string (class label). If you need bounding box on video, an (spatiotemporal) action detection model is the way to go.
I get predicted class as an string but i want to see it with bounding box on video. Any idea?