Hi, I used your [pre-trained SeViLA localizer checkpoints] on QVHighlights to fine-tune the answerer and the self-refinement localizer on STAR. I made sure the same batch_size was used, training on 2 GPUs (A100-SXM-80GB). I got results of 60.10 and 61.69 at each step, which are lower than the 62.7 and 64.9 reported in the paper. Was there any different processing used in training?
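In case the GPU count matters, this is roughly how I sanity-checked the effective batch size on my 2-GPU setup (the numbers below are placeholders, not the exact values from my config):

```python
# Rough sanity check of the effective batch size on my setup.
# The specific values are assumptions/placeholders, not the exact config keys or numbers.
batch_size_per_gpu = 8    # assumed per-GPU batch size from my local config
num_gpus = 2              # 2x A100-SXM-80GB in my run
grad_accum_steps = 1      # assumed; I did not configure gradient accumulation

effective_batch_size = batch_size_per_gpu * num_gpus * grad_accum_steps
print(f"effective batch size: {effective_batch_size}")
```

If the paper's runs used more GPUs (or gradient accumulation) with the same per-GPU batch size, the effective batch size would differ from mine, which might explain part of the gap.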