In the previous commit, I commented out L408 (the `stopping_criteria` argument for generation) in https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/lmms_eval/models/llava_vid.py#L408C22-L408C39 so that this model works with transformers > 4.40.2. (The `stopping_criteria` still works with transformers==4.40.0.) Although this does not affect MCQ accuracy, generating without stopping criteria can be risky for open-ended answers. A more responsible fix is therefore to remove that commented-out line here.
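An alternative to deleting the line outright would be to gate the argument on the installed transformers version. A minimal sketch (the function names are hypothetical, and the exact cutoff between 4.40.0 and 4.40.2 is assumed from the observations above, not verified against every release):

```python
def parse_version(version):
    # Turn a version string like "4.40.2" into a tuple (4, 40, 2)
    # for comparison; pre-release suffixes are ignored in this sketch.
    return tuple(int(part) for part in version.split(".")[:3])

def generation_kwargs(transformers_version, stopping_criteria):
    # Only pass stopping_criteria on transformers <= 4.40.0, where it is
    # known to work for this model; drop it on newer versions to avoid
    # the incompatibility described above.
    kwargs = {}
    if parse_version(transformers_version) <= parse_version("4.40.0"):
        kwargs["stopping_criteria"] = stopping_criteria
    return kwargs
```

The kwargs dict can then be splatted into the `generate` call, so older installs keep the stopping behavior while newer ones simply omit it.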
I also modified lmms_eval/api/task.py to support extracting videos from tars, so that automatic evaluation of LongVideoBench actually works out of the box on a brand-new machine.
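The extraction step amounts to pulling only the video members out of each archive. A rough sketch of the idea (the helper name and extension list are my own assumptions, not the actual lmms-eval code):

```python
import tarfile
from pathlib import Path

def extract_videos(tar_path, out_dir, exts=(".mp4", ".avi", ".mkv")):
    # Extract only files with a video extension from the archive
    # into out_dir, creating the directory if needed.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        members = [
            m for m in tar.getmembers()
            if m.isfile() and m.name.lower().endswith(exts)
        ]
        tar.extractall(out, members=members)
    # Return the paths of the extracted videos.
    return [out / m.name for m in members]
```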