Open Wenju-Huang opened 6 months ago
README.md
As far as I can see, the README only describes the fine-tuning process. Would it be possible to share the weights for the models you have already fine-tuned? In particular, I am interested in the model weights for VQA.
I've run the retrieval fine-tune on MSR-VTT and put it up on Hugging Face: https://huggingface.co/delusionallogic/vast_finetune_msrvtt_retrieval
Trained on 4x RTX 6000 Ada, using 177.7 GB of video memory, 100 GB of system memory, and 64 GB of storage space. Training took 13.5 hours at around 3.8 dollars an hour.
04/28/2024 06:58:49 - INFO - __main__ - ==== evaluation--ret%tvas--msrvtt_ret_ret_itc_tvas========
04/28/2024 06:58:49 - INFO - __main__ - {'video_r1': 52.7, 'video_recall': '52.7/78.1/86.9', 'video_ravg': 72.6}
04/28/2024 06:58:49 - INFO - __main__ - ==== evaluation--ret%tvas--msrvtt_ret_ret_itm_tvas========
04/28/2024 06:58:49 - INFO - __main__ - {'video_r1': 63.2, 'video_recall': '63.2/83.3/89.3', 'video_ravg': 78.6}
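For anyone comparing their own numbers against the logs above, here is a minimal sketch of how the `video_ravg` values can be reproduced from the `video_recall` strings. It assumes the three slash-separated values are R@1/R@5/R@10 and that `video_ravg` is their plain mean rounded to one decimal; the logged numbers are consistent with that, but the repository's evaluation code has not been checked. The helper name `recall_average` is hypothetical.

```python
def recall_average(recall: str) -> float:
    """Mean of slash-separated recall values, rounded to one decimal.

    Assumption: the string holds R@1/R@5/R@10 and the logged
    'video_ravg' is their unweighted mean.
    """
    values = [float(part) for part in recall.split("/")]
    return round(sum(values) / len(values), 1)


# Recall strings copied from the two log lines above.
itc = recall_average("52.7/78.1/86.9")  # ITC retrieval
itm = recall_average("63.2/83.3/89.3")  # ITM retrieval
print(itc, itm)
```

Running the same helper on your own evaluation output makes it easy to see whether a gap shows up in R@1 only or across all three recall levels.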
@DelusionalLogic Thanks for sharing! When I evaluated the checkpoint you shared, I got results 3-4% lower than the ones you reported. Do you know why they are lower?