In the previous commit, I commented out L408 (the `stopping_criteria` argument for generation) in https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/models/llava_vid.py#L408C22-L408C39 so that this model works with transformers > 4.40.2. (The `stopping_criteria` still works with transformers==4.40.0.) Although this does not affect MCQ accuracy, generating without stopping criteria can be risky for open-ended answers. A more responsible fix is therefore to remove that commented-out line here.
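An alternative to deleting the line outright would be to gate the argument on the installed transformers version. A minimal sketch (the function names are hypothetical, and the exact cutoff between 4.40.0 and 4.40.2 is assumed from the observations above, not verified against every release):

```python
def parse_version(version):
    # Turn a version string like "4.40.2" into a tuple (4, 40, 2)
    # for comparison; pre-release suffixes are ignored in this sketch.
    return tuple(int(part) for part in version.split(".")[:3])

def generation_kwargs(transformers_version, stopping_criteria):
    # Only pass stopping_criteria on transformers <= 4.40.0, where it is
    # known to work for this model; drop it on newer versions to avoid
    # the incompatibility described above.
    kwargs = {}
    if parse_version(transformers_version) <= parse_version("4.40.0"):
        kwargs["stopping_criteria"] = stopping_criteria
    return kwargs
```

The kwargs dict can then be splatted into the `generate` call, so older installs keep the stopping behavior while newer ones simply omit it.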
I also modified lmms_eval/api/task.py to support extracting videos from tars, so that automatic evaluation of LongVideoBench actually works out of the box on a brand-new machine.
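The extraction step amounts to pulling only the video members out of each archive. A rough sketch of the idea (the helper name and extension list are my own assumptions, not the actual lmms-eval code):

```python
import tarfile
from pathlib import Path

def extract_videos(tar_path, out_dir, exts=(".mp4", ".avi", ".mkv")):
    # Extract only files with a video extension from the archive
    # into out_dir, creating the directory if needed.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        members = [
            m for m in tar.getmembers()
            if m.isfile() and m.name.lower().endswith(exts)
        ]
        tar.extractall(out, members=members)
    # Return the paths of the extracted videos.
    return [out / m.name for m in members]
```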