The function `CLIPModelOutput.get_out_to_loss_grad` returns a `[batch_size]`-shaped tensor, but for consistency with the other model output classes and compatibility with `saver.current_store["out_to_loss"]` it should be `[batch_size, 1]`-shaped. All other models apply an `unsqueeze(-1)` at the end; for CLIP it appears to be missing.
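For illustration, a minimal sketch of the suggested fix. The helper name and the stand-in tensor below are hypothetical; only the trailing `unsqueeze(-1)` mirrors what the other model output classes do, so the result lines up with the `[batch_size, 1]` shape expected by `saver.current_store["out_to_loss"]`:

```python
import torch

def get_out_to_loss_grad_fixed(loss_grad: torch.Tensor) -> torch.Tensor:
    # loss_grad has shape [batch_size]; unsqueeze(-1) turns it into
    # [batch_size, 1], matching the shape returned by the other models.
    return loss_grad.clone().detach().unsqueeze(-1)

if __name__ == "__main__":
    g = torch.rand(8)  # stand-in for the [batch_size] out-to-loss gradient
    print(get_out_to_loss_grad_fixed(g).shape)  # torch.Size([8, 1])
```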