Open XYI-xue opened 5 months ago
Hi, you can refer to the implementation details section in the main paper. For the SumMe and TVSum datasets, we adopt the pre-trained image-captioning model GPT-2 to generate a caption for each frame.
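For readers who want to reproduce this step, below is a minimal sketch of per-frame captioning. The thread does not specify the exact checkpoint or sampling rate, so this is an assumption-laden illustration: it uses the community Hugging Face checkpoint `nlpconnect/vit-gpt2-image-captioning` (a ViT encoder with a GPT-2 decoder) as a stand-in captioner, OpenCV for frame decoding, and an illustrative sampling rate of 2 frames per second.

```python
"""Sketch: caption sampled frames of a video with a GPT-2-based captioner.

Assumptions (not confirmed by the authors): the Hugging Face checkpoint
'nlpconnect/vit-gpt2-image-captioning' stands in for the captioning model,
and frames are sampled at 2 fps.
"""


def sample_indices(total_frames, video_fps, target_fps=2.0):
    """Return frame indices to caption: one every video_fps/target_fps frames."""
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))


def caption_video(path, target_fps=2.0):
    """Return {frame_index: caption} for sampled frames of the video at `path`."""
    # Heavy dependencies are imported lazily so the sampling logic stays
    # stdlib-only. Requires: pip install opencv-python transformers pillow
    import cv2
    from PIL import Image
    from transformers import pipeline

    captioner = pipeline("image-to-text",
                         model="nlpconnect/vit-gpt2-image-captioning")
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    captions = {}
    for idx in sample_indices(total, fps, target_fps):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes BGR; convert to RGB before handing to the captioner.
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        captions[idx] = captioner(image)[0]["generated_text"]
    cap.release()
    return captions
```

For example, a 30 fps video sampled at 2 fps yields one caption every 15 frames.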
Thank you very much for your reply! I would also like to ask: how did you obtain the original video data in the dataset? I looked at the dataset files you provided and found that they only contain feature values for each video. Could you please provide a link or file where I can obtain the original, playable videos? Thanks for your help, and I look forward to your reply. (☆▽☆)
Hi author, I would like to ask how you obtained the transcribed text corresponding to the videos in the SumMe and TVSum datasets. Was it created manually, or did you use an existing model? I am very much looking forward to your answer.