Yui010206 / SeViLA

[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
https://arxiv.org/abs/2305.06988
BSD 3-Clause "New" or "Revised" License

The results on STAR are lower than in the paper #10

Open LoverLost opened 1 year ago

LoverLost commented 1 year ago

Hi, I used your [pre-trained SeViLA localizer checkpoints] on QVHighlights to fine-tune the answerer and the self-refinement localizer on STAR. I made sure to use the same batch size for training, with 2 GPUs (A100-SXM-80GB). I got 60.10 and 61.69 at the two steps, which is lower than the 62.7 and 64.9 reported in the paper. Is there any difference in the training procedure?
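One common cause of this kind of gap is that matching the per-GPU batch size on a different number of GPUs changes the effective (global) batch size. A minimal sketch of the check, with purely hypothetical numbers (not taken from the SeViLA paper or configs):

```python
# Reproduction sanity check: the effective (global) batch size must match,
# not just the per-GPU batch size. All numbers below are hypothetical
# placeholders, not values from the SeViLA paper or its configs.
def effective_batch_size(per_gpu_batch: int, num_gpus: int, grad_accum_steps: int = 1) -> int:
    """Global batch size seen by the optimizer per update step."""
    return per_gpu_batch * num_gpus * grad_accum_steps

# Example: a run configured for 4 GPUs, reproduced on 2 GPUs.
paper_run = effective_batch_size(per_gpu_batch=8, num_gpus=4)  # 32
my_run = effective_batch_size(per_gpu_batch=8, num_gpus=2)     # 16

# Doubling gradient-accumulation steps restores the global batch size.
fixed_run = effective_batch_size(per_gpu_batch=8, num_gpus=2, grad_accum_steps=2)  # 32

print(paper_run, my_run, fixed_run)
```

If the authors trained on more GPUs, scaling gradient accumulation (or the learning rate) to match the original global batch size may close part of the gap.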

junwenchen commented 1 year ago

I am also facing this problem. Have you figured it out?