Hi~ thanks for your nice work~
I want to caption a self-captured video. Could you please give some detailed instructions on how to adapt the pretrained model provided in the code for this task? For example, the feature extraction method, the feature data format, and how to visualize the final result. Thanks a lot!