Hi,
Thank you for your code! When I run it with different seeds, the results on MSVD vary a lot, especially CIDEr (±5%), which I think is not negligible. MSRVTT is fine. Have you ever experienced this? My guess was that it happens because MSVD samples 17 captions out of nearly 40 per video, so different seeds introduce a lot of randomness. But when I use a fixed set of 17 captions for each video, things are nearly the same (7% in CIDEr). What are the possible reasons for that?
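To make the fixed-caption setting concrete, a minimal sketch of it could look like the following. This is not the repository's data loader: `sample_fixed_captions`, `captions_by_video`, and `subset_seed` are hypothetical names, and it assumes the captions are available as a list of strings per video ID.

```python
import random


def sample_fixed_captions(captions_by_video, k=17, subset_seed=0):
    """Pick up to k captions per video, independent of the training seed.

    Hypothetical helper: each video gets its own RNG seeded from
    (subset_seed, video_id), so the chosen subset does not depend on the
    global random state or on the order in which videos are visited.
    """
    fixed = {}
    for video_id in sorted(captions_by_video):
        captions = captions_by_video[video_id]
        # A string seed is hashed deterministically by random.Random.
        rng = random.Random(f"{subset_seed}-{video_id}")
        n = min(k, len(captions))
        # Sample without replacement; keep all captions if fewer than k exist.
        fixed[video_id] = rng.sample(captions, n)
    return fixed
```

With the subset held fixed like this, any variation that remains across training seeds would have to come from other sources, such as weight initialization or batch order.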
Thank you!