TIBHannover / MSVA

Deep learning model for supervised video summarization called Multi Source Visual Attention (MSVA)
MIT License

Problem with the code: the Spearman's and Kendall's coefficients on TVSum are 0.5849 and 0.6403 #5

Open sunguoquan1005 opened 3 years ago

sunguoquan1005 commented 3 years ago

I can't reproduce the results. When I run the code, the Spearman's and Kendall's coefficients on TVSum are 0.5849 and 0.6403 respectively, which are much higher than the reported results.

Junaid112 commented 3 years ago

Did you take the average over all k-folds, or are these numbers for only a single 80-20 split? Averaging over all validation parts should bring them close to the original numbers.

sunguoquan1005 commented 3 years ago

I took the average over all of them, but I still get that result.

mpalaourg commented 2 years ago

Hello, first of all thank you for your contribution to video summarization research and for making your work open-source.

I am also trying to compute the correlation coefficients myself, and I am stuck in a weird loop. @sunguoquan1005, I get the result reported in the paper if I first take the mean of the user summaries (so that there is only one user/true summary per video), then compute the coefficients (ρ and τ) for each video, take the mean to get ρ and τ for each split, and finally take the mean over the splits.

I get your result if I skip the first step of taking the mean of the user summaries (so that there are N user/true summaries): compute the coefficients (ρ and τ) for each true summary, then take the mean to get ρ and τ for each video, and so on.
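To make the two orders of operations concrete, here is a minimal sketch (the function and variable names are mine, not from the MSVA codebase), where `pred` is one video's predicted frame scores and `user_scores` is an (N_users, N_frames) array of annotations:

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def coeffs_mean_first(pred, user_scores):
    """Order A (gives the paper's numbers): average the user
    annotations into one reference, then correlate once per video."""
    reference = np.asarray(user_scores).mean(axis=0)
    return kendalltau(pred, reference)[0], spearmanr(pred, reference)[0]

def coeffs_per_user(pred, user_scores):
    """Order B (gives the higher numbers reported in this issue):
    correlate against each user separately, then average the
    per-user coefficients."""
    taus = [kendalltau(pred, u)[0] for u in user_scores]
    rhos = [spearmanr(pred, u)[0] for u in user_scores]
    return float(np.mean(taus)), float(np.mean(rhos))
```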

The weird thing is that these results are too good to be true! The paper that introduced this evaluation protocol discusses how strongly the F1 value is affected by the use of knapsack. I am thinking (and I would like your opinion on this) that the coefficients must not be computed on the (binary) user summaries (produced by the knapsack) but rather on the (real-valued) user scores. That would mean this evaluation protocol is only applicable to TVSum and not to SumMe, which the original paper supports by reporting the ρ and τ coefficients for TVSum only.
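For intuition, here is a toy comparison (purely synthetic numbers, not values from either dataset) of how the coefficients behave when the reference is a binary, knapsack-style summary versus real-valued scores; the binary case is dominated by ties, so the two can differ substantially:

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(0)
pred = rng.random(200)                        # predicted frame-level scores
real = pred + 0.5 * rng.standard_normal(200)  # real-valued reference scores
# Knapsack-like binary summary: top ~15% of frames selected, rest zero.
binary = (real > np.quantile(real, 0.85)).astype(float)

print(kendalltau(pred, real)[0], spearmanr(pred, real)[0])      # real-valued reference
print(kendalltau(pred, binary)[0], spearmanr(pred, binary)[0])  # binary reference, heavy ties
```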

Sorry for the late response in this issue; I only just found this discussion, and I would love to hear your opinion on the matter.

xings19 commented 2 years ago

I get that result, too. How should I modify the code in the project to reproduce the results in the paper?

Junaid112 commented 2 years ago

> I get that result, too. How should I modify the code in the project to reproduce the results in the paper?

In this paper, we followed the evaluation protocol provided by "Video Summarization with Long Short-term Memory" and the TVSum paper. There, the average was taken among users, and then we averaged over the k-folds. The weird thing was that for SumMe, the max was taken among the per-user scores, and then we averaged over the k-folds. If you use this accumulation of averages I just described, together with the original scores from TVSum for correlation with the predictions, you will get the reported results. I do not agree with the evaluation criterion where the max is taken for SumMe, but we can avoid this in future research by comparing both criteria.
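If it helps, here is a hedged sketch of that accumulation (names are illustrative, not taken from the repository): average TVSum's original per-user scores into one reference, compute the coefficients per validation video, then average first within and then across the k folds.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

def video_coeffs(pred, user_scores):
    # TVSum: average the original (real-valued) scores among users.
    # (As noted above, SumMe instead takes the max among users.)
    reference = np.asarray(user_scores).mean(axis=0)
    return kendalltau(pred, reference)[0], spearmanr(pred, reference)[0]

def kfold_coeffs(folds):
    """`folds`: one list per fold of (pred, user_scores) pairs
    for that fold's validation videos."""
    fold_taus, fold_rhos = [], []
    for fold in folds:
        taus, rhos = zip(*(video_coeffs(p, u) for p, u in fold))
        fold_taus.append(np.mean(taus))
        fold_rhos.append(np.mean(rhos))
    return float(np.mean(fold_taus)), float(np.mean(fold_rhos))
```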