Yxxxb closed this 1 month ago.
@Yxxxb Thanks for your comments 😊.
Our paper currently runs only basic experiments within the Video-LLaVA framework. We could check whether pruning becomes more effective as the number of frames increases.
By the way, your Voco-LLaMA is great, and we have cited it in our paper.
Congrats and thanks!
It's intuitive! Have you tried it on video? As more frames are added, the pruning becomes stronger, and the text guidance should become more interesting.