Hello, I'm new to this field and a bit confused about how the metrics are calculated on the MSRVTT test set, where each video has 20 corresponding captions. The test set has only 2990 videos but 2990 x 20 = 59800 captions, so how is the correlation (similarity) matrix between captions and videos computed? I have read your code, but I still don't understand the core idea here. I hope you can explain this to me.
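To make the question concrete, here is a minimal sketch of what I imagine the evaluation might look like. It is only my assumption, not taken from your code: the similarity matrix is rectangular (one row per caption, one column per video), and each caption is ranked against all videos. The function name `text_to_video_recall`, the random features, and the tiny sizes are all hypothetical; on the real test set the matrix would be 59800 x 2990.

```python
import numpy as np

def text_to_video_recall(sim, caption_to_video, ks=(1, 5, 10)):
    """Hypothetical helper: Recall@K for caption-to-video retrieval.

    sim: (num_captions, num_videos) similarity matrix.
    caption_to_video: (num_captions,) index of the ground-truth video per caption.
    """
    # Similarity between each caption and its own ground-truth video.
    gt_scores = sim[np.arange(sim.shape[0]), caption_to_video]
    # Rank = number of videos scored strictly higher than the ground truth (0 is best).
    ranks = (sim > gt_scores[:, None]).sum(axis=1)
    return {f"R@{k}": float((ranks < k).mean()) for k in ks}

# Toy sizes standing in for 2990 videos x 20 captions (assumption for illustration).
num_videos, caps_per_video, dim = 6, 20, 8
rng = np.random.default_rng(0)

# Random stand-ins for the video and text embeddings a model would produce.
video_feats = rng.normal(size=(num_videos, dim))
text_feats = rng.normal(size=(num_videos * caps_per_video, dim))

# Caption i belongs to video i // caps_per_video: [0, 0, ..., 1, 1, ...].
caption_to_video = np.repeat(np.arange(num_videos), caps_per_video)

# Rectangular similarity matrix: one row per caption, one column per video.
sim = text_feats @ video_feats.T
print(sim.shape)
print(text_to_video_recall(sim, caption_to_video))
```

If this is roughly right, the matrix never needs to be square: every caption row has exactly one correct video column, and the 20 captions of a video simply share the same target index. Is that what your code does, or do you evaluate with only one caption per video?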