TXH-mercury / VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
https://arxiv.org/abs/2304.08345
MIT License

AssertionError when calculating BLEU score #15

Closed: thechargedneutron closed this issue 5 months ago

thechargedneutron commented 1 year ago

Thanks for the code and documentation. I am running the captioning fine-tuning experiment on MSRVTT. During the evaluation stage, the code stops with an AssertionError while computing the BLEU score. It seems the hypo variable contains the same sentence repeated several times. Did I miss a step? If not, why does this error occur and how can I fix it?

Here are the generated hypo variable and the ref variable for video9894:

['in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera', 'in the room a man in red was talking to the camera']
['a boy is talking to his roomnates who are in different room', 'a man asks another man to help him with chores', 'a man avoids helping his roommates', 'a man doesn t help his friends with anything', 'a man drinking some beer', 'a man walking in an apartment', 'a person communicating with other person', 'he was in the kitchen', 'man asking another man to do the dishes', 'man refusing to help his roommate', 'roommate continues to say no each time he is asked to help with something', 'the boys meet courier boy', 'the man asks for help', 'the youtube nigahiga doesn t want to help anyone', 'two friends are having fun', 'two men are talking in a kitchen', 'two young men talking to each other about doing dishes', 'a man walking in an apartment', 'a man drinking some beer', 'roommate continues to say no each time he is asked to help with something']

Thanks!

TXH-mercury commented 1 year ago

Hi @thechargedneutron, this happens because VALOR's data loader, when it fails to load a sample, automatically picks another sample at random as a replacement. This should only happen during training, but the code does not restrict it. So when a broken sample is met during testing, another test sample is chosen at random in its place; that sample is then evaluated more than once, which produces the duplicated hypotheses and the captioning bug.
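The failure mode can be reproduced with a toy model. This is not VALOR's code: `ToyCaptionDataset`, its `bad_ids` set, and `one_hypothesis_per_id` are hypothetical names. The check assumes the BLEU scorer is the pycocoevalcap-style one, which asserts that each id maps to exactly one hypothesis. When a broken sample is silently swapped for a random one at test time, some id is scored twice.

```python
import random
from collections import defaultdict


class ToyCaptionDataset:
    """Hypothetical dataset whose loader falls back to a random sample on failure."""

    def __init__(self, ids, bad_ids):
        self.ids = list(ids)
        self.bad_ids = set(bad_ids)  # ids whose video/audio fails to load

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        vid = self.ids[i]
        if vid in self.bad_ids:
            # Unconditional fallback (the behaviour described above):
            # replace the broken sample with a random one, even at test time.
            return self[random.randrange(len(self))]
        return vid


def one_hypothesis_per_id(hypotheses):
    """Group (id, caption) pairs by id and assert exactly one caption per id."""
    res = defaultdict(list)
    for vid, caption in hypotheses:
        res[vid].append(caption)
    for vid, hypo in res.items():
        assert len(hypo) == 1, f"{vid}: {len(hypo)} hypotheses"
    return res


if __name__ == "__main__":
    random.seed(0)
    ds = ToyCaptionDataset(["video1", "video2", "video3"], bad_ids=["video2"])
    outputs = [(ds[i], "a caption") for i in range(len(ds))]
    try:
        one_hypothesis_per_id(outputs)
    except AssertionError as err:
        print("AssertionError:", err)
```

The test split yields three outputs but only two distinct ids, so one id always ends up with two hypotheses. This is the same pattern, on a smaller scale, as the nine identical hypotheses reported for video9894.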

Solution:

https://github.com/TXH-mercury/VALOR/blob/7a047df9ab4f2607dcc4d4b861c32b39a959a803/data/data.py#L369

Change 'if video_pixels is None:' to 'if video_pixels is None and self.training:'

https://github.com/TXH-mercury/VALOR/blob/7a047df9ab4f2607dcc4d4b861c32b39a959a803/data/data.py#L377

Change 'if audio_spectrograms is None:' to 'if audio_spectrograms is None and self.training:'

With this change, a broken sample at test time raises an error instead of being replaced by another sample, and you can fix the corresponding (video/audio) data according to the error message. To see the real error information, comment out the 'try except' at https://github.com/TXH-mercury/VALOR/blob/d616e97687f1c2f402a80a945c6cbab4f008297d/data/data.py#L179
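The guarded fallback can be sketched as follows, using hypothetical names (`load_video`, `load_audio`, `GuardedCaptionDataset`) rather than VALOR's actual `data.py`. Random replacement stays on only when `self.training` is true. At test time, this sketch raises an explicit error naming the failing id. That is a choice made for the sketch, not necessarily how the patched code reports the problem.

```python
import random


def load_video(vid, bad_ids):
    """Hypothetical loader: returns None when decoding fails."""
    return None if vid in bad_ids else f"pixels:{vid}"


def load_audio(vid, bad_ids):
    """Hypothetical loader: returns None when the spectrogram cannot be computed."""
    return None if vid in bad_ids else f"spec:{vid}"


class GuardedCaptionDataset:
    def __init__(self, ids, bad_video=(), bad_audio=(), training=True):
        self.ids = list(ids)
        self.bad_video = set(bad_video)
        self.bad_audio = set(bad_audio)
        self.training = training

    def __len__(self):
        return len(self.ids)

    def _resample(self):
        # Swap in a randomly chosen sample; only reached during training.
        return self[random.randrange(len(self))]

    def __getitem__(self, i):
        vid = self.ids[i]
        video_pixels = load_video(vid, self.bad_video)
        if video_pixels is None and self.training:
            return self._resample()
        if video_pixels is None:
            raise RuntimeError(f"failed to load video for {vid}")
        audio_spectrograms = load_audio(vid, self.bad_audio)
        if audio_spectrograms is None and self.training:
            return self._resample()
        if audio_spectrograms is None:
            raise RuntimeError(f"failed to load audio for {vid}")
        return vid, video_pixels, audio_spectrograms
```

Training still tolerates broken files, while evaluation visits every test id exactly once or stops on the first broken one. This keeps the BLEU scorer from ever receiving duplicated hypotheses.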

I will fix this in the latest code. Thanks for pointing it out.