Dear authors:
It's very promising to see that the stronger Mistral-7B LLM enhances video understanding capability. We are eager to see what further potential could be unlocked by replacing the LLM with stronger models such as Llama 3, Yi-34B, or InternLM. Specifically, please try evaluating larger LLMs, such as 34B or 70B models, and let the community know whether they help. Thanks for such a great project.