gabeur / mmt

Multi-Modal Transformer for Video Retrieval
http://thoth.inrialpes.fr/research/MMT/
Apache License 2.0

missing speech features for LSMDC dataset #13

Closed boalantvxqpku closed 3 years ago

boalantvxqpku commented 3 years ago

Hi, thanks for sharing the code.

I noticed that the speech features are missing for all video clips in the LSMDC.tar.gz. But the paper mentioned that image

I watched some of the original video clips from the LSMDC dataset and found that they all have audio, from which speech transcripts could be extracted using the Google Cloud Speech to Text API.

Therefore, my question is: did you train the MMT model with speech features on the LSMDC dataset? If so, what were the results, and would you please share the speech feature files? If not, why didn't you use the speech features?

I would appreciate your reply.

gabeur commented 3 years ago

We used the features provided by the authors of Collaborative Experts. Figure 1 of their paper shows that they did not extract speech features for LSMDC even though the audio is present. It is probably not fair to use speech for the LSMDC dataset, because some captions were created from the movie script and therefore contain the speech heard in the video.

That is why the speech features are missing from the LSMDC.tar.gz file, and therefore our model does not use speech on that dataset. We should correct that in the paper.
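
For readers who want to confirm which modalities a feature set actually covers, here is a minimal sketch in Python. It assumes the features have already been loaded into a nested dictionary mapping clip id to modality name to feature list. That layout, the function name `modality_coverage`, and the toy clip ids are hypothetical and are not taken from the MMT code or from the real contents of LSMDC.tar.gz.

```python
def modality_coverage(features):
    """Return, for each modality, the fraction of clips with a non-empty feature.

    `features` is a hypothetical layout: {clip_id: {modality_name: list or None}}.
    """
    # Collect every modality name that appears for at least one clip.
    names = sorted({name for per_clip in features.values() for name in per_clip})
    total = len(features)
    coverage = {}
    for name in names:
        # A modality counts as present only if the entry exists and is non-empty.
        present = sum(
            1
            for per_clip in features.values()
            if per_clip.get(name) is not None and len(per_clip[name]) > 0
        )
        coverage[name] = present / total if total else 0.0
    return coverage


# Toy data standing in for loaded features (values are made up).
toy = {
    "clip_0001": {"appearance": [0.1, 0.2], "audio": [0.3], "speech": []},
    "clip_0002": {"appearance": [0.4, 0.5], "audio": [0.6], "speech": None},
}

coverage = modality_coverage(toy)
# Modalities that no clip provides, e.g. speech in the toy data above.
missing = [name for name, fraction in coverage.items() if fraction == 0.0]
print(missing)  # prints ['speech']
```

A modality with zero coverage, like `speech` in the toy data, is one that a model trained on those features cannot be using.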