Aguin opened 3 weeks ago
Hello, maybe your model length setting is too small and the video is too long.
@LDLINGLINGLING yes, downsampling can solve this, but I still think L119 is incorrect since it will stack two tensors of different lengths.
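The mismatch can be sketched with toy values (the marker positions below are made up for illustration; the real code works on torch tensors with `torch.hstack`, but the length logic is the same):

```python
# Toy reproduction of the suspected bug at processing_minicpmv.py#L119,
# using plain lists; marker positions are illustrative values.
image_start_tokens = [5, 20, 40]  # 3 image-start markers found
image_end_tokens = [15, 35]       # only 2 image-end markers found

# Buggy: max() claims 3 valid images, but image_end_tokens has only 2
# entries, so torch.hstack on the two slices raises a size mismatch.
valid_image_nums = max(len(image_start_tokens), len(image_end_tokens))
assert len(image_end_tokens[:valid_image_nums]) != valid_image_nums

# Proposed fix: min() keeps only the start/end pairs present in both lists.
valid_image_nums = min(len(image_start_tokens), len(image_end_tokens))
bounds = list(zip(image_start_tokens[:valid_image_nums],
                  image_end_tokens[:valid_image_nums]))
print(bounds)  # [(5, 15), (20, 35)]
```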
Where is the model length set?
Set `MAX_NUM_FRAMES = 40  # 64; if CUDA OOM, set a smaller number` when running inference on videos.
40 may be the largest usable value, since the maximum number of tokens is 8192.
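The downsampling described above can be sketched as follows (the helper name and logic are illustrative, not the repo's actual API):

```python
# Hypothetical sketch of frame downsampling: pick at most MAX_NUM_FRAMES
# evenly spaced frames so the encoded video stays under the 8192-token
# limit. sample_frame_indices is an illustrative helper, not from the repo.
MAX_NUM_FRAMES = 40  # if CUDA OOM, set a smaller number

def sample_frame_indices(total_frames, max_frames=MAX_NUM_FRAMES):
    """Return evenly spaced frame indices, capped at max_frames."""
    if total_frames <= max_frames:
        return list(range(total_frames))
    step = total_frames / max_frames
    return [int(i * step) for i in range(max_frames)]

print(len(sample_frame_indices(1200)))  # 40
```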
Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
When the length of `image_start_tokens` and the length of `image_end_tokens` are not equal, `valid_image_nums` will be the greater of the two, causing `torch.hstack` to fail due to a tensor size mismatch. Should `max` be `min`? https://huggingface.co/openbmb/MiniCPM-V-2_6/blob/main/processing_minicpmv.py#L119

Expected Behavior
No response
Steps To Reproduce
Run the video example with `video_path="./assets/demo_video.mp4"`: https://github.com/OpenBMB/MiniCPM-V?tab=readme-ov-file#chat-with-video

Environment
Anything else?
No response