I'm currently able to run run_vision_chat.sh with a limited number of video frames passed in for a single text query. The model outputs its text result and then the process ends. However, the paper shows examples of a continuous, multi-turn dialogue about a video, and I was wondering whether it's possible to set this up.