Closed yt2639 closed 11 months ago
Hey, @yt2639 did you find an alternative model?
No, I downloaded the pretrained weights and finetuned them myself. It gets similar results on 8 A5000 GPUs for the MSRVTT dataset. Still, if the authors can release the finetuned models, that would be great and very much appreciated.
@yt2639 Hi, what's the performance after finetuning? I am getting significantly lower scores after finetuning on 8 32GB V100 GPUs. I also faced some AssertionErrors as mentioned in #15 and I had to comment out all the assert checks in all the metrics files (BLEU, ROUGE, METEOR etc.). Did you also have to do this?
Here is the performance when I finetune
07/01/2023 02:10:55 - INFO - __main__ - ====-evaluation--cap%tva%tv--msrvtt_cap_tva=====step 10089--==========
07/01/2023 02:10:55 - INFO - __main__ - {'Bleu_1': 78.76, 'Bleu_2': 67.74, 'Bleu_3': 55.93, 'Bleu_4': 44.78, 'METEOR': 28.8, 'ROUGE_L': 62.59, 'CIDEr': 55.79}
07/01/2023 02:10:55 - INFO - __main__ - ======evaluation--cap%tva%tv--msrvtt_cap_tva====history best step: 4035==
07/01/2023 02:10:55 - INFO - __main__ - {'Bleu_1': 79.48, 'Bleu_2': 67.83, 'Bleu_3': 55.77, 'Bleu_4': 44.78, 'METEOR': 29.13, 'ROUGE_L': 62.86, 'CIDEr': 56.34}
07/01/2023 02:10:55 - INFO - __main__ - ====-evaluation--cap%tva%tv--msrvtt_cap_tv=====step 10089--==========
07/01/2023 02:10:55 - INFO - __main__ - {'Bleu_1': 78.14, 'Bleu_2': 66.95, 'Bleu_3': 55.26, 'Bleu_4': 44.18, 'METEOR': 28.56, 'ROUGE_L': 62.32, 'CIDEr': 55.97}
07/01/2023 02:10:55 - INFO - __main__ - ======evaluation--cap%tva%tv--msrvtt_cap_tv====history best step: 10089==
07/01/2023 02:10:55 - INFO - __main__ - {'Bleu_1': 78.14, 'Bleu_2': 66.95, 'Bleu_3': 55.26, 'Bleu_4': 44.18, 'METEOR': 28.56, 'ROUGE_L': 62.32, 'CIDEr': 55.97}
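For anyone comparing runs against logs like these, the "history best step" lines can be reproduced with a small helper that keeps the best checkpoint by a chosen metric. This is an illustrative sketch only, not the repo's actual bookkeeping code, and the choice of CIDEr as the selection key is an assumption:

```python
# Minimal sketch: track the best evaluation step by a chosen metric
# (CIDEr here, an assumption), mirroring the "history best step" log lines.

def update_best(history, step, scores, key="CIDEr"):
    """Return the (best_step, best_scores) pair after seeing `scores` at `step`."""
    best_step, best_scores = history
    if best_scores is None or scores[key] > best_scores[key]:
        return (step, scores)
    return (best_step, best_scores)

history = (None, None)
history = update_best(history, 4035,
                      {"Bleu_4": 44.78, "METEOR": 29.13, "CIDEr": 56.34})
history = update_best(history, 10089,
                      {"Bleu_4": 44.78, "METEOR": 28.8, "CIDEr": 55.79})
print(history[0])  # best step stays 4035 because its CIDEr (56.34) is higher
```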
Hi @thechargedneutron , I didn't get the AssertionErrors. I only finetuned the video-text retrieval task on msrvtt dataset and this is the log I get:
20:17:18 - INFO - __main__ - ====-evaluation--ret%tva%tv--msrvtt_ret_t_v=====step 9789--==========
20:17:18 - INFO - __main__ - {'video_recall': '50.6/77.6/85.9', 'video_ravg': 71.4, 'video_medianR': 1.0, 'video_meanR': 12.203125}
20:17:18 - INFO - __main__ - ======evaluation--ret%tva%tv--msrvtt_ret_t_v====history best step: 4894==
20:17:18 - INFO - __main__ - {'video_recall': '53.0/77.7/86.1', 'video_ravg': 72.3, 'video_medianR': 1.0, 'video_meanR': 11.34375}
20:17:18 - INFO - __main__ - ====-evaluation--ret%tva%tv--msrvtt_ret_t_va=====step 9789--==========
20:17:18 - INFO - __main__ - {'video_recall': '54.5/80.8/88.0', 'video_ravg': 74.4, 'video_medianR': 1.0, 'video_meanR': 11.1171875}
20:17:18 - INFO - __main__ - ======evaluation--ret%tva%tv--msrvtt_ret_t_va====history best step: 9789==
20:17:18 - INFO - __main__ - {'video_recall': '54.5/80.8/88.0', 'video_ravg': 74.4, 'video_medianR': 1.0, 'video_meanR': 11.1171875}
20:19:19 - INFO - __main__ - {'loss_ret%tva%tv--msrvtt_ret/contra_loss': 0.2164306640625, 'loss_ret%tva%tv--msrvtt_ret/total_loss': 0.2164306640625}
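For context, the retrieval numbers printed above (video_recall R@1/5/10, median rank, mean rank) can be derived from a text-to-video similarity matrix roughly like this. This is a generic sketch of standard retrieval metrics, not VALOR's actual evaluation code, and it assumes the ground-truth video for query i is column i:

```python
import numpy as np

# Generic sketch of text->video retrieval metrics (R@1/5/10, median/mean rank).
# Rows are text queries, columns are videos; the ground-truth video for
# query i is assumed to be column i. Not the repo's actual evaluation code.
def retrieval_metrics(sim):
    order = np.argsort(-sim, axis=1)  # best-scoring video first
    # 1-based rank of the ground-truth video for each query
    ranks = np.where(order == np.arange(len(sim))[:, None])[1] + 1
    recall = {k: float(np.mean(ranks <= k)) * 100 for k in (1, 5, 10)}
    return recall, float(np.median(ranks)), float(np.mean(ranks))

# Toy example: 3 queries, ground truth on the diagonal.
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.3],
                [0.5, 0.6, 0.4]])
recall, med, mean = retrieval_metrics(sim)
print(recall[1], med, mean)
```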
So I am not sure whether they reported the t_va number or the t_v number in Table 3 of the paper. If it was t_v, then I only got 50.6 (or 53.0), which is lower than the 54.4 reported in Table 3. But the t_va number is close: 54.5. So I guess maybe they reported the t_va number in Table 3?
One slightly odd thing is that I can actually fit train_batch_size = 64 on my 8 x 24GB A5000 GPUs. I am not sure if this is expected, since the authors reported using A100 GPUs, so at first I thought train_batch_size = 64 would not fit on A5000s.
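For GPUs where train_batch_size = 64 does not fit (e.g. the 32GB V100s mentioned earlier in this thread), a common workaround is gradient accumulation: keep the effective batch size constant by trading per-step batch size for accumulation steps. This is a generic sketch of the arithmetic; the repo's actual flag names may differ:

```python
# Generic sketch: match a target effective batch size via gradient
# accumulation when the per-GPU batch that fits in memory is smaller.
# Flag names here are illustrative, not necessarily the repo's.

def accumulation_steps(target_batch, per_gpu_batch, n_gpus):
    """Smallest accumulation count whose effective batch size >= target."""
    per_step = per_gpu_batch * n_gpus
    return -(-target_batch // per_step)  # ceiling division

# e.g. target 64 total, but only 2 samples/GPU fit on 8 GPUs:
steps = accumulation_steps(64, 2, 8)
print(steps, 2 * 8 * steps)  # 4 accumulation steps -> effective batch 64
```

Note that accumulation changes only the memory/throughput trade-off; optimizer behavior can still differ slightly (e.g. batch-norm statistics), so results may not match a true large-batch run exactly.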
Thanks for your comments. You did not get AssertionErrors because those are raised in the captioning metrics, and you ran retrieval. +1 to the request to release finetuned models for the captioning tasks.
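As a hypothetical alternative to commenting out the assert checks inside every metric file, the scorer calls could be wrapped defensively so one AssertionError does not abort the whole evaluation. The sketch below assumes pycocoevalcap-style scorer objects with a compute_score(gts, res) method; the dummy classes are stand-ins, not real library code:

```python
# Hypothetical alternative to deleting asserts in the metric files:
# run each scorer defensively so one AssertionError does not abort
# the whole evaluation. `scorers` pairs metric names with objects
# exposing compute_score(gts, res), as in pycocoevalcap-style APIs.

def safe_scores(scorers, gts, res):
    results = {}
    for name, scorer in scorers:
        try:
            score, _ = scorer.compute_score(gts, res)
            results[name] = score
        except AssertionError as err:
            print(f"skipping {name}: {err}")  # keep the remaining metrics
    return results

# Tiny demo with dummy scorers (stand-ins for Bleu/Rouge/etc. objects):
class OkScorer:
    def compute_score(self, gts, res):
        return 0.5, None

class FailingScorer:
    def compute_score(self, gts, res):
        raise AssertionError("caption/reference count mismatch")

print(safe_scores([("ok", OkScorer()), ("bad", FailingScorer())], {}, {}))
```

Of course, a failing assert usually signals a real mismatch between generated captions and references, so it is worth logging which metric was skipped rather than silently dropping it.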
T-VA metric is reported.
@thechargedneutron @yt2639 @kenhuangsy Hey guys, the finetuned checkpoints of VALOR-base/large on the MSRVTT caption/retrieval datasets have now been released. Thanks for your attention.
Could you please share the plan for releasing the other fine-tuned models? I am eagerly anticipating the one trained on ActivityNet-QA.
Hi authors,
Amazing paper, and thanks for providing this nice code base. I have a question about the finetuned models, specifically for the video-text retrieval task: do you have plans to release them? I understand that we can use the pretrained VALOR checkpoints provided in the main page README (under "Download Checkpoints") to finetune for downstream tasks. But the implementation details in the paper suggest using 8 A100 GPUs, which I don't have, so I probably cannot reproduce the good results reported in the paper. Therefore, I am wondering whether you plan to release the finetuned models for the video-text retrieval task?
Thanks! Shane