Multiple issues in the dataset.

Audio

Some clips have an audible disturbance in the audio track, which has probably affected the extracted audio features.

A few examples: dia793_utt0.mp4, dia164_utt5.mp4, dia682_utt1.mp4, dia529_utt2.mp4, dia1029_utt1.mp4, dia1008_utt1.mp4

The problem shows up in almost all videos larger than 2.5 MB (around 200 videos in train_set).
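To reproduce the list of large clips, something like the following could be used. It is only a sketch: the function name `find_large_clips` is hypothetical, the directory layout is assumed to be a flat folder of `.mp4` files, and 1 MB is assumed to mean 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: "2.5 MB" is interpreted as 2.5 * 1024 * 1024 bytes.
SIZE_THRESHOLD_BYTES = int(2.5 * 1024 * 1024)


def find_large_clips(video_dir, threshold=SIZE_THRESHOLD_BYTES):
    """Return the sorted names of .mp4 files in video_dir larger than threshold bytes."""
    return sorted(
        p.name
        for p in Path(video_dir).glob("*.mp4")
        if p.stat().st_size > threshold
    )
```

Calling `find_large_clips` on the folder that holds the train_set clips should return the candidates to listen to manually.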

Video and text do not match.

For example

a) dialogue 241: the sync between the text and the video breaks at utterance 1. Utterance 2 in the text is "I asked him.", while the video dia241_utt2.mp4 contains only the word "now", and the mismatch continues through the following utterances.

b) dialogue 757 utterance 7 is also not synced with the text.

c) dialogue 485: utterance 0 in the text is "Hey, this- Heyy...", but the video is a long clip.

There are many more video-text sync issues.
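One rough way to look for further mismatches is to compare the number of words in each utterance with the length of its clip. The sketch below is a heuristic, not a definitive check: the function names, the tuple layout of `rows`, and the `min_wps` / `max_wps` thresholds are all assumptions, and the clip durations have to be obtained separately (for example with a media probing tool).

```python
def words_per_second(text, duration_s):
    """Approximate speech rate: whitespace-separated words divided by clip length."""
    n_words = len(text.split())
    return n_words / duration_s if duration_s > 0 else float("inf")


def flag_suspicious(rows, min_wps=0.5, max_wps=6.0):
    """Return clip names whose speech rate falls outside [min_wps, max_wps].

    rows: iterable of (dialogue_id, utterance_id, text, duration_s) tuples.
    The thresholds are arbitrary assumptions and should be tuned by hand.
    """
    flagged = []
    for dialogue_id, utterance_id, text, duration_s in rows:
        wps = words_per_second(text, duration_s)
        if wps < min_wps or wps > max_wps:
            # File naming follows the diaX_uttY.mp4 pattern used in this report.
            flagged.append(f"dia{dialogue_id}_utt{utterance_id}.mp4")
    return flagged
```

A short text attached to a long clip gives a very low rate, and a full sentence attached to a one-word clip gives a very high rate, so both kinds of mismatch described above would be flagged for manual review.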

Is this dataset usable? Please help me with this.

declare-lab / MELD
