Open fistyee opened 1 week ago
Thanks, could you describe what the duration distribution of clips extracted from movies is, and can you provide more query prompts for LongVA as a reference?
could you describe what the duration distribution of clips extracted from movies is
We do not use clips from the movie. We load the entire movie as the haystack video and sample frames at 1 fps, as stated in the paper and also reflected in our code:
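For readers who want a concrete picture of 1 fps sampling, here is a minimal sketch. It is not the repository's actual code: the function name `one_fps_indices` and its parameters are hypothetical, and it only computes which frame indices to keep, given the video's total frame count and native frame rate.

```python
import math


def one_fps_indices(total_frames: int, native_fps: float) -> list[int]:
    """Pick one frame index per second of video (hypothetical helper).

    For each whole second t, keep the frame at position int(t * native_fps).
    """
    if total_frames <= 0 or native_fps <= 0:
        return []
    # Number of one-second slots the video covers (last one may be partial).
    num_seconds = math.ceil(total_frames / native_fps)
    return [int(t * native_fps) for t in range(num_seconds)]


# Example: a 4-second clip at 25 fps keeps frames 0, 25, 50 and 75.
indices = one_fps_indices(total_frames=100, native_fps=25.0)
```

Under these assumptions, a two-hour movie at 24 fps yields 7200 sampled frames. With a video reader such as decord, the inputs would likely come from `len(vr)` and `vr.get_avg_fps()`, and the indices could be passed to `vr.get_batch(...)`; whether the evaluation script does exactly this should be checked against the repository.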
can you provide more query prompts for LongVA as a reference?
I am not sure what you mean by "query prompts". If you are looking for needle images & questions, they are here: https://huggingface.co/datasets/lmms-lab/v_niah_needles. If you are looking for the prompt template: https://github.com/EvolvingLMMs-Lab/LongVA/blob/efc27fdcc9cdc411dee8af296aa1a34ebb29d445/vision_niah/eval_vision_niah.py#L48-L51
We cannot provide the haystack video ourselves because we use an actual movie in our evaluation. Specifically, I use the movie "孤注一掷" (No More Bets) as the haystack :)
The rest of the instructions are in the README:
https://github.com/EvolvingLMMs-Lab/LongVA?tab=readme-ov-file#v-niah-evaluation
Let me know if you encounter any problems.