Vision-CAIR / MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
https://vision-cair.github.io/Goldfish_website/
BSD 3-Clause "New" or "Revised" License

Poor performance when using videos from other datasets #33

Open JSHZT opened 2 months ago

JSHZT commented 2 months ago

I used a video from VGGSound with the default question, but I found that the answer from MiniGPT4-video has nothing to do with the video.

KerolosAtef commented 1 month ago

Thanks for your feedback. I discovered a bug related to hallucinations in MiniGPT4-video yesterday; it seems to be connected to the PEFT library. I was initially using PEFT 0.2.0, but after upgrading, the function `prepare_model_for_int8_training` was deprecated. When I switched to `prepare_model_for_kbit_training`, hallucinations increased significantly. Keep this in mind to ensure accurate performance. It is fixed in the current version.
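For anyone pinned to a different PEFT version, one way to sidestep the rename is to resolve whichever preparation helper the installed PEFT exposes at runtime. This is only a sketch of the compatibility pattern, not code from this repo; the helper name `resolve_prepare_fn` is mine:

```python
def resolve_prepare_fn(peft_module):
    """Return whichever model-preparation helper this PEFT version exposes.

    Newer PEFT releases renamed prepare_model_for_int8_training to
    prepare_model_for_kbit_training; we check for the new name first.
    """
    for name in ("prepare_model_for_kbit_training",
                 "prepare_model_for_int8_training"):
        fn = getattr(peft_module, name, None)
        if fn is not None:
            return fn
    raise AttributeError("no known PEFT preparation helper found")
```

Typical use would be `prepare_fn = resolve_prepare_fn(peft)` followed by `model = prepare_fn(model)` after loading the model in 8-bit. Note that, per the comment above, the two helpers were observed to behave differently here, so matching the PEFT version the checkpoint was trained with is the safer fix.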

JSHZT commented 1 month ago

I used the question: "Please describe the content of the video only in the following format: 'This video describes [video content], where [subject] appears doing [actions] in [setting/scenery].' Do not provide any additional information or explanations." But the result still does not contain correct information about the video. The input video is from the VGGSound dataset and is 10 seconds long. Are there any other usage suggestions?