How would you evaluate the performance of the models on each video?
I want to look at some relatively good and bad video–caption matches, but I don't understand how the video IDs and caption IDs relate to the label matrix in evaluation.py.
In evaluation.py:i2t_varied(error_matrix), the error matrix is converted to a label matrix of size #caption_embs * #video_embs. I assumed the rows of the label matrix correspond to caption IDs in order, but the caption IDs are of the format videos_xxxx#captions_nn.
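To illustrate my current understanding: I would expect the label matrix to be built by stripping the `#captions_nn` suffix from each caption ID and matching the remaining video ID against the video embedding order, roughly like this sketch (the ID strings and helper name here are my own illustration, not the project's actual code):

```python
import numpy as np

def build_label_matrix(caption_ids, video_ids):
    """Return a #captions x #videos 0/1 matrix where entry (i, j) is 1
    when caption i belongs to video j, matched by the prefix before '#'."""
    video_index = {vid: j for j, vid in enumerate(video_ids)}
    labels = np.zeros((len(caption_ids), len(video_ids)), dtype=int)
    for i, cap_id in enumerate(caption_ids):
        vid = cap_id.split('#')[0]  # e.g. "videos_0001#captions_00" -> "videos_0001"
        labels[i, video_index[vid]] = 1
    return labels

# Illustrative IDs only, following the videos_xxxx#captions_nn pattern:
caption_ids = ["videos_0001#captions_00", "videos_0001#captions_01", "videos_0002#captions_00"]
video_ids = ["videos_0001", "videos_0002"]
print(build_label_matrix(caption_ids, video_ids))
```

Is this how the rows and columns of the label matrix are ordered, or is there a separate index mapping somewhere?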