OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0

Multiple frames from video #244

Closed: Tsardoz closed this issue 2 weeks ago

Tsardoz commented 4 weeks ago

Does it work with multiple frames? I tried reading sequential frames from a folder, converting each one to base64 and appending them, but I get an error when calling chat_model.chat(inputs). Is this supported? (Attachment: test_video.txt)
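The frame-loading step described above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: `load_frames_as_base64` and `FRAME_EXTENSIONS` are hypothetical names (not part of MiniCPM-V), and frames are assumed to be image files whose filenames sort in playback order (e.g. zero-padded `frame_0001.jpg`). It only prepares the base64 strings; it does not call the model or show how MiniCPM-V expects multiple images to be passed.

```python
import base64
import os
from typing import List

# Hypothetical: file extensions treated as video frames (adjust to your data).
FRAME_EXTENSIONS = (".jpg", ".jpeg", ".png")


def load_frames_as_base64(folder: str) -> List[str]:
    """Read every frame image in `folder` in sorted filename order and
    return each one as a base64-encoded ASCII string.

    Assumes filenames are zero-padded so that lexical order matches
    playback order (frame_0002 sorts before frame_0010).
    """
    names = sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(FRAME_EXTENSIONS)
    )
    frames = []
    for name in names:
        # Read raw bytes and encode them; decode to str for JSON-friendly use.
        with open(os.path.join(folder, name), "rb") as f:
            frames.append(base64.b64encode(f.read()).decode("ascii"))
    return frames
```

Passing the resulting list directly to `chat_model.chat` is exactly what fails in this thread, since the single-image chat path was not built for it; the list is only the input-preparation half of the problem.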

Marlod390 commented 4 weeks ago

I have the same issue. I tried to feed the model multiple images, and the answer I got was "image encoder error". I looked at the code in chat.py and found that the chat method of the MiniCPMV class only accepts a single image. I am also curious whether the model can read multiple images at the same time in a conversation, like GPT-4.

Cuiunbo commented 3 weeks ago

Hi, this is a very good try. The model is capable of taking multiple images as input. Of course, it wasn't trained on video scenarios, so it may not perform very well there. You can give it a try; please refer to this link: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/2