X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

unfinished inference outputs #252

Closed ZHANGH83 closed 1 day ago

ZHANGH83 commented 3 days ago

Hi~ Congratulations on your wonderful work on mPLUG-Owl3! I tried running inference on some videos, but the outputs seem to be limited to under 100 words. The results stop mid-sentence, as if the remaining words were cut off. What's the reason for this, please?

Examples of inference results: ['The video quality is low, with noticeable pixelation and a lack of sharpness. The colors are washed out, and there is a significant amount of noise present. The focus is not consistent, and there is motion blur, particularly in the moving vehicles. The exposure is uneven, with some areas overexposed and others underexposed. There is also a noticeable camera distortion, particularly in the edges of the frame. The composition is chaotic, with no clear subject or focal point. Overall, the']

['The video shows a busy street with cars and trucks driving in different directions. The camera captures the movement of the vehicles and the surrounding environment. The quality of the video is not very clear, but it is still possible to make out the different colors and shapes of the vehicles. The video provides a glimpse into the hustle and bustle of a busy street, with cars and trucks moving in different directions. The video is shot from the perspective of a car, giving the viewer a sense of being in the']

ZHANGH83 commented 1 day ago

I see. It's because the default max_new_tokens is set to 100, so generation stops after 100 new tokens even if the sentence is unfinished.
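
For anyone hitting the same thing, here is a minimal sketch of the fix. It assumes generation options are passed as keyword arguments to `model.generate()`, as in Hugging Face transformers. The option names other than `max_new_tokens` (`tokenizer`, `decode_text`) and both helper functions are assumptions for illustration, not confirmed in this thread, so check them against the inference script you are actually running.

```python
# Sketch only: raise the generation budget so long answers are not cut off.
# Assumption: the model accepts `max_new_tokens` through model.generate(),
# and the other keys below match your inference script. Adjust as needed.

def build_generate_kwargs(inputs, tokenizer, max_new_tokens=512):
    """Hypothetical helper: copy `inputs` and add generation options."""
    kwargs = dict(inputs)  # copy so the caller's dict is left unchanged
    kwargs.update({
        "tokenizer": tokenizer,
        "max_new_tokens": max_new_tokens,  # the outputs above stopped at 100
        "decode_text": True,
    })
    return kwargs


def looks_truncated(text):
    """Heuristic: a reply that does not end in sentence punctuation
    was probably cut off by the token limit."""
    return not text.rstrip().endswith((".", "!", "?", '"', "'"))


# With a loaded model (not run here):
#   outputs = model.generate(**build_generate_kwargs(inputs, tokenizer, 512))

print(looks_truncated("The composition is chaotic. Overall, the"))  # True
print(looks_truncated("The video shows a busy street."))  # False
```

The `looks_truncated` check is only a rough way to spot replies that hit the limit, like the two examples above that end in "Overall, the" and "being in the". A larger `max_new_tokens` makes generation slower and uses more memory, so raise it only as far as your answers need.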