Soldelli / MAD

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
MIT License

Eval with VLG-Net #5

Closed MonsisGit closed 1 year ago

MonsisGit commented 1 year ago

Hi, I was wondering how you keep evaluation consistent with VLG-Net on the MAD dataset. Are you randomly sampling windows around their corresponding moments, e.g. for Fig. 4 (Performance trend across different windows lengths) or Table 3 (Short Video Setup) from your paper? It looks like mad.py randomly samples different windows on every evaluation run; is that right?

Cheers

Soldelli commented 1 year ago

Dear @MonsisGit, random sampling of windows is only applied in training. At inference time the data loader provides windows covering the entire movie, with a half-window overlap between consecutive windows (a simple sliding window).

I suggest you use ipdb and verify this detail at inference time. No randomness is involved in the evaluation process.

Let me know if I can assist you further.
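
For readers following along, the inference-time scheme described above (fixed-length windows over the whole movie, stride equal to half the window) can be sketched roughly as below. This is an illustrative sketch, not the code in mad.py: the function name `sliding_windows`, its parameters, and the clamping of the last window to the end of the movie are all assumptions made for the example.

```python
def sliding_windows(num_frames, window_length):
    """Return deterministic (start, end) index pairs, end exclusive.

    Hypothetical helper: consecutive windows overlap by half a window,
    and the final window is clamped to the end of the movie (an
    assumption; the real data loader may pad or drop it instead).
    """
    if num_frames <= 0 or window_length <= 0:
        raise ValueError("num_frames and window_length must be positive")

    stride = max(1, window_length // 2)  # half-window overlap
    windows = []
    start = 0
    while True:
        end = min(start + window_length, num_frames)
        windows.append((start, end))
        if end == num_frames:  # the whole movie is covered
            break
        start += stride
    return windows


if __name__ == "__main__":
    # No random state is involved, so repeated calls give identical windows.
    print(sliding_windows(num_frames=10, window_length=4))
```

Because the windows depend only on the movie length and the window length, two evaluation runs see exactly the same inputs, which is easy to confirm by stepping through the data loader with ipdb as suggested above.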