Questions about the provided fine-tuning model parameters

Hello author, when I tested the provided timechat_7b.pth checkpoint, the metrics I measured were lower than the results reported in the paper. I then fine-tuned TimeChat following the settings described in the paper, and my own checkpoint scored higher than the provided timechat_7b.pth. Is something wrong with my fine-tuning or testing procedure, or is there a problem with the provided fine-tuned checkpoint?

Here are the results I got when testing the provided fine-tuned checkpoint timechat_7b.pth. (Because some videos are missing, my test set has about 20 fewer videos than the full set, but I don't expect this to change the results much.)

[val] gt video nums 396; pred video nums 396
evaluate data samples: 396
gt file:
paragraph video captioning: Para_CIDER 2.5  Para_METEOR 6.7
dense video captioning: CIDER 2.4  METEOR 0.9
Precision@0.3 26.8  Recall@0.3 26.7
Precision@0.5 8.9  Recall@0.5 9.9
Precision@0.7 2.1  Recall@0.7 2.9
Precision@0.9 0.4  Recall@0.9 0.6
Precision_Mean 9.5  Recall_Mean 10.0  F1_Score 8.7
SODA_c_2 0.9  n_preds 7.6  SODA_c_1 -100.0
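In ActivityNet-Captions-style dense video captioning evaluation, Precision@t and Recall@t score temporal localization at a temporal-IoU (tIoU) threshold t, and the *_Mean values are the averages over the four thresholds (for this log, (26.8 + 8.9 + 2.1 + 0.4) / 4 ≈ 9.5). The sketch below illustrates that idea for a single video, assuming the common matching rule: a predicted segment is a hit if it overlaps some ground-truth segment with tIoU >= t, and a ground-truth segment is recalled if some prediction overlaps it that well. It is not TimeChat's evaluation code; details such as strict > versus >=, how per-video scores are averaged, and how F1 is formed may differ there. The example segments are made up for illustration.

```python
from typing import Sequence, Tuple

# A segment is (start_seconds, end_seconds).
Segment = Tuple[float, float]

# Thresholds matching the Precision@t / Recall@t columns in the log.
TIOU_THRESHOLDS = (0.3, 0.5, 0.7, 0.9)


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    # For disjoint segments this span overestimates the union,
    # but the intersection is 0, so the IoU is 0 either way.
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0


def precision_recall_at(preds: Sequence[Segment], gts: Sequence[Segment],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall for one video at one tIoU threshold (sketch)."""
    if not preds or not gts:
        return 0.0, 0.0
    hit_preds, hit_gts = set(), set()
    for i, p in enumerate(preds):
        for j, g in enumerate(gts):
            if tiou(p, g) >= threshold:  # assumed >=; some scripts use >
                hit_preds.add(i)
                hit_gts.add(j)
    return len(hit_preds) / len(preds), len(hit_gts) / len(gts)


def mean_over_thresholds(preds: Sequence[Segment], gts: Sequence[Segment],
                         thresholds=TIOU_THRESHOLDS) -> Tuple[float, float]:
    """Average precision and recall over the thresholds (cf. *_Mean)."""
    pairs = [precision_recall_at(preds, gts, t) for t in thresholds]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r


if __name__ == "__main__":
    preds = [(0.0, 10.0), (20.0, 30.0)]  # hypothetical predictions
    gts = [(0.0, 8.0), (25.0, 40.0)]     # hypothetical ground truth
    for t in TIOU_THRESHOLDS:
        print(t, precision_recall_at(preds, gts, t))
    print(mean_over_thresholds(preds, gts))
```

In this toy example the first prediction overlaps the first ground-truth segment with tIoU 0.8, so it counts as a hit at 0.3, 0.5 and 0.7 but not at 0.9, while the second pair (tIoU 0.25) never matches.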

The following are the results of checkpoint_2.pth, the checkpoint from my own fine-tuning run:

[val] gt video nums 396; pred video nums 396
evaluate data samples: 396
gt file:
paragraph video captioning: Para_CIDER 2.1  Para_METEOR 8.1
dense video captioning: CIDER 2.8  METEOR 1.0
Precision@0.3 31.1  Recall@0.3 43.5
Precision@0.5 11.0  Recall@0.5 17.9
Precision@0.7 3.4  Recall@0.7 6.3
Precision@0.9 0.4  Recall@0.9 0.8
Precision_Mean 11.5  Recall_Mean 17.1  F1_Score 12.4
SODA_c_2 1.2  n_preds 11.0  SODA_c_1 -100.0
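To make the gap between the two runs easier to read, a small script can parse the "Name value" pairs from each result and print the per-metric difference. This is a convenience sketch, not part of TimeChat: the two strings are the metric portions of the results above, copied from this issue, and the parser assumes every metric is printed as a name, a space, and a number.

```python
import re

# Metric portions of the two evaluation results reported in this issue.
PROVIDED = (
    "paragraph video captioning Para_CIDER 2.5 Para_METEOR 6.7 "
    "dense video captioning CIDER 2.4 METEOR 0.9 "
    "Precision@0.3 26.8 Recall@0.3 26.7 Precision@0.5 8.9 Recall@0.5 9.9 "
    "Precision@0.7 2.1 Recall@0.7 2.9 Precision@0.9 0.4 Recall@0.9 0.6 "
    "Precision_Mean 9.5 Recall_Mean 10.0 F1_Score 8.7 "
    "SODA_c_2 0.9 n_preds 7.6 SODA_c_1 -100.0"
)
REPRODUCED = (
    "paragraph video captioning Para_CIDER 2.1 Para_METEOR 8.1 "
    "dense video captioning CIDER 2.8 METEOR 1.0 "
    "Precision@0.3 31.1 Recall@0.3 43.5 Precision@0.5 11.0 Recall@0.5 17.9 "
    "Precision@0.7 3.4 Recall@0.7 6.3 Precision@0.9 0.4 Recall@0.9 0.8 "
    "Precision_Mean 11.5 Recall_Mean 17.1 F1_Score 12.4 "
    "SODA_c_2 1.2 n_preds 11.0 SODA_c_1 -100.0"
)

# A metric name starts with a letter and may contain word chars, '@' and '.'.
METRIC = re.compile(r"([A-Za-z][\w@.]*) (-?\d+(?:\.\d+)?)")


def parse_metrics(log: str) -> dict:
    """Map each 'Name value' pair in a result string to a float."""
    return {name: float(value) for name, value in METRIC.findall(log)}


def diff(a: dict, b: dict) -> dict:
    """Per-metric difference b - a, rounded to the log's one decimal."""
    return {k: round(b[k] - a[k], 1) for k in a if k in b}


def threshold_mean(metrics: dict, prefix: str) -> float:
    """Plain mean of e.g. Precision@0.3/0.5/0.7/0.9."""
    keys = [f"{prefix}@{t}" for t in ("0.3", "0.5", "0.7", "0.9")]
    return sum(metrics[k] for k in keys) / len(keys)


if __name__ == "__main__":
    provided = parse_metrics(PROVIDED)
    reproduced = parse_metrics(REPRODUCED)
    print(f"{'metric':15s} {'provided':>8s} {'reproduced':>10s} {'delta':>6s}")
    for name, delta in diff(provided, reproduced).items():
        print(f"{name:15s} {provided[name]:8.1f} {reproduced[name]:10.1f} {delta:+6.1f}")
```

Running it lists each metric side by side, for example F1_Score goes from 8.7 to 12.4 (+3.7) and Recall@0.3 from 26.7 to 43.5 (+16.8). The `threshold_mean` helper also shows that both runs' Precision_Mean and Recall_Mean match the plain average of the four thresholds up to rounding.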

RenShuhuai-Andy / TimeChat, issue #30