Hi @thechargedneutron, this happens because in VALOR's code, when loading a data sample fails, another sample is automatically chosen at random as a replacement. This should only happen during training, but the code does not restrict it. So when a bad sample is hit during testing, another sample is randomly chosen from the test set, which causes the chosen samples to be validated more than once and results in the captioning bug.
Solution:
https://github.com/TXH-mercury/VALOR/blob/7a047df9ab4f2607dcc4d4b861c32b39a959a803/data/data.py#L369
Change 'if video_pixels is None:' to 'if video_pixels is None and self.training:'
https://github.com/TXH-mercury/VALOR/blob/7a047df9ab4f2607dcc4d4b861c32b39a959a803/data/data.py#L377
Change 'if audio_spectrograms is None:' to 'if audio_spectrograms is None and self.training:'
With this change, bad samples at test time will raise an error instead of being swapped for a replacement, and you can fix the (video/audio) data according to the error message. To see the real error information, you can comment out the 'try except' at https://github.com/TXH-mercury/VALOR/blob/d616e97687f1c2f402a80a945c6cbab4f008297d/data/data.py#L179
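To illustrate the idea, here is a minimal sketch of a dataset that only falls back to a random replacement during training. It is not the actual VALOR code: the class `SketchDataset`, its `load` method, and the explicit `RuntimeError` are hypothetical names and behavior chosen for illustration, and a `None` entry stands in for a video or audio file that fails to decode.

```python
import random


class SketchDataset:
    """Hypothetical stand-in for a dataset class; not the VALOR implementation."""

    def __init__(self, samples, training):
        # A None entry marks a sample that fails to load (assumption for this sketch).
        self.samples = samples
        self.training = training

    def __len__(self):
        return len(self.samples)

    def load(self, i):
        # Stand-in for the video/audio decoding step; returns None on failure.
        return self.samples[i]

    def __getitem__(self, i):
        data = self.load(i)
        if data is None and self.training:
            # Training only: replace the bad sample with a random loadable one.
            valid = [j for j in range(len(self)) if self.samples[j] is not None]
            return self.load(random.choice(valid))
        if data is None:
            # Evaluation: fail loudly so no sample is silently scored twice.
            raise RuntimeError(f"sample {i} failed to load")
        return data
```

With `training=False`, indexing a bad sample raises instead of returning a duplicate, so each test item is evaluated at most once.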
I will fix this in the latest code. Thanks for pointing it out.
Thanks for the code and documentation. I am running the captioning finetuning experiment on MSRVTT. During the evaluation stage, the code stops with an AssertionError here. It seems the `hypo` variable contains the same sentence repeated multiple times. Could you please tell me whether I missed a step, or if not, why this error occurs and how to solve it?

Here is the generated `hypo` variable and the `ref` variable output for video9894:

Thanks!