Sorry for the stupid question, but I do not understand how demo_narrator is meant to be used. When I run the code, it takes 4 frames from a video, computes the 4 image features, and finally outputs 10 sentences. Why 4 frames and 10 sentences? And what do these 10 sentences mean? I looked at them and they are all different. Are these 10 sentences summaries of the 4 images?