facebookresearch / pytorchvideo

A deep learning library for video understanding research.
https://pytorchvideo.org/
Apache License 2.0

Unexpected probabilities with tutorial adapted for X3D model #124

Open hendu25 opened 2 years ago

hendu25 commented 2 years ago

While experimenting with pytorchvideo, I tried to adapt the current inference tutorial to use the X3D model, using the code on Torch Hub.

I also adapted the notebook to display the top-5 labels along with their probabilities, but all the probabilities were close to zero for the same video used in the original tutorial:

probs = preds.topk(k=5).values.squeeze().tolist()
predText = [f"{c} ({p:.2f})" for c, p in zip(pred_class_names, probs)]
print("Top 5 predicted labels: %s" % predText)

Top 5 predicted labels: ['archery (0.01)', 'air drumming (0.00)', 'applauding (0.00)', 'applying cream (0.00)', 'abseiling (0.00)']

The original tutorial with the slowfast_r50 model reported the following results:

Top 5 predicted labels: ['archery (1.00)', 'throwing axe (0.00)', 'playing paintball (0.00)', 'disc golfing (0.00)', 'riding or walking with horse (0.00)']

The edited version of the notebook using x3d is attached below: facebookresearch_pytorchvideo_x3d_prob.zip

wojiaohumaocheng commented 2 years ago

I have the same problem.

kalyanvasudev commented 2 years ago

@lyttonhao , could you please take a look at this? Thanks!

hendu25 commented 2 years ago

After digging further into this, I think the difference is that the X3D model includes a softmax at the end while the slowfast model does not. Getting the probabilities with X3D therefore does not require the step preds = post_act(preds); applying it anyway runs softmax a second time on values that are already probabilities.
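To illustrate why a second softmax would flatten the scores, here is a small standalone sketch in plain Python (no pytorchvideo needed). The logits are made up for illustration: 400 classes with a single confident class at a hypothetical index. It is not taken from the actual model outputs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 400-class problem: one confident class, the rest flat.
NUM_CLASSES = 400
logits = [0.0] * NUM_CLASSES
logits[5] = 20.0

once = softmax(logits)   # what a head with a built-in softmax would return
twice = softmax(once)    # what an extra post_act(preds) would then produce

print(f"softmax once:  top = {max(once):.2f}")
print(f"softmax twice: top = {max(twice):.2f}, runner-up = {sorted(twice)[-2]:.2f}")
```

After the second pass every input lies in [0, 1], so the exponentials differ by at most a factor of e and the result is close to uniform over 400 classes. The top class keeps its rank but its score drops to roughly 0.01, which is consistent with the output reported above.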

Evidence

import torch

# Print the last few modules of each model to compare their heads.
model_x3d = torch.hub.load('facebookresearch/pytorchvideo', "x3d_s", pretrained=True)
print(list(model_x3d.modules())[-5:])
model_sf = torch.hub.load('facebookresearch/pytorchvideo', "slowfast_r50", pretrained=True)
print(list(model_sf.modules())[-5:])

I'm not sure whether this requires a documentation update, or whether it makes sense to extend the tutorial to also show the prediction probabilities.
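If the tutorial were extended this way, one option would be a post-processing step that works whether or not the model head already applies a softmax. The sketch below is only an assumption about how that could look: `is_distribution`, `to_probabilities`, and `top_k` are hypothetical helper names, the label list is a made-up subset, and it uses plain Python lists rather than tensors to stay self-contained.

```python
import math

def is_distribution(scores, tol=1e-4):
    """Heuristic: non-negative values that sum to roughly 1."""
    return all(s >= 0.0 for s in scores) and abs(sum(scores) - 1.0) <= tol

def to_probabilities(scores):
    """Apply softmax only when the scores do not already look like probabilities."""
    if is_distribution(scores):
        return list(scores)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, names, k=5):
    """Format the k highest-probability labels as 'name (p)' strings."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [f"{names[i]} ({probs[i]:.2f})" for i in ranked]

names = ["abseiling", "air drumming", "archery"]  # hypothetical label subset
raw_logits = [1.0, 2.0, 9.0]                      # hypothetical model without softmax
already_probs = to_probabilities(raw_logits)      # stands in for a model with softmax

print("Top labels:", top_k(to_probabilities(raw_logits), names, k=3))
print("Top labels:", top_k(to_probabilities(already_probs), names, k=3))
```

Both calls print the same ranking and scores, since the second input is passed through unchanged. The check is only a heuristic: raw logits that happen to be non-negative and sum to about 1 would be misread as probabilities, so inspecting the model head directly remains the more reliable test.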