Open pangzss opened 3 years ago
Hi, could you please guide me on how to summarize my own video?
You first have to extract features for the frames of the video; the trained model can then predict each frame's probability of being in the summary. For object features, refer to https://github.com/VideoAnalysis/EDUVSUM/tree/master/src
I will upload the motion feature code after refactoring.
I fully agree with @pangzss. If my calculations are right, the formula used,
y = torch.matmul(V.transpose(1,0), weights).transpose(1,0)
would be correct only if the weights array were symmetric, which isn't the case.
Oddly enough, the results don't change much when the corrected formula is used.
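The symmetry argument can be checked numerically. Below is a minimal sketch, not the repository's code: the toy values of `V` and `scores`, their shapes (3 frames, 2 feature dimensions), and all names other than `V`, `weights`, and `y` are assumptions made for illustration. It relies on the identity (A B)^T = B^T A^T, which makes the quoted expression equal to `weights^T @ V`. The row-wise product `weights @ V` is shown only as one candidate for a sum that matches a softmax over dim -1; the thread does not show the replacement that was actually proposed.

```python
import torch

# Toy inputs (assumed for illustration): 3 frames, 2 feature dimensions.
V = torch.tensor([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
scores = torch.tensor([[0.0, 1.0, 2.0],
                       [2.0, 0.0, 1.0],
                       [1.0, 1.0, 0.0]])

# Softmax over dim=-1 normalizes each row, so weights[i] sums to 1.
weights = torch.softmax(scores, dim=-1)

# The expression quoted in the thread.
y = torch.matmul(V.transpose(1, 0), weights).transpose(1, 0)

# By (A B)^T = B^T A^T, the quoted expression equals weights^T @ V.
y_transposed = torch.matmul(weights.transpose(1, 0), V)

# Candidate row-consistent sum (an assumption, not taken from the thread):
# output i mixes the rows of V using the normalized row weights[i].
y_rowwise = torch.matmul(weights, V)

quoted_equals_transposed = torch.allclose(y, y_transposed, atol=1e-6)
quoted_equals_rowwise = torch.allclose(y, y_rowwise, atol=1e-6)

# With a symmetric weight matrix the two forms coincide.
sym = 0.5 * (weights + weights.transpose(1, 0))
y_sym_quoted = torch.matmul(V.transpose(1, 0), sym).transpose(1, 0)
y_sym_rowwise = torch.matmul(sym, V)
symmetric_case_agrees = torch.allclose(y_sym_quoted, y_sym_rowwise, atol=1e-6)

print(quoted_equals_transposed, quoted_equals_rowwise, symmetric_case_agrees)
```

In this toy case the two forms differ because `weights` is not symmetric. How large the gap is on real data depends on how close the learned attention matrix is to symmetric, which this sketch does not measure.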
Hi, I noticed that the last step of the self-attention calculation doesn't seem quite right:
Here the softmax probability is calculated along dim -1, which is the column direction. But the weighted sum is then taken along the row direction, according to this line
I think we should do something like this
What do you think? I hope I'm the one who needs to be corrected.