Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices
Apache License 2.0

Repetitive response problem with VLM #27

Closed Muhammad4hmed closed 7 months ago

Muhammad4hmed commented 7 months ago

Hi, first of all thank you so much for this great contribution!

I've been testing your model on driving videos and noticed that it sometimes keeps repeating its output, as in the attached image (Screenshot from 2024-02-20 23-11-00). I think this might be a parameter issue. The hyperparameters I'm using are:

{
    "model_path": "mtgv/MobileVLM_V2-3B",
    "conv_mode": "v1",
    "temperature": 0.0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
    "load_8bit": False,
    "load_4bit": False,
}

Could you please look into what the issue might be? Also, if possible, please share the optimal hyperparameters for mtgv/MobileVLM_V2-3B.

Thanks

YangYang-DLUT commented 7 months ago

Could you provide the picture and the prompt? We will try to reproduce this problem.

YangYang-DLUT commented 7 months ago

Does this problem happen while directly using our inference script?

Muhammad4hmed commented 7 months ago

Hi @YangYang-DLUT

> Could you provide the picture and the prompt? We try to reproduce this problem.

I've already provided the response image in the issue. The text returned from the model is repetitive, like this:

Prompt: Is the driver driving safely or distracted?
Response:
1. The driver is not driving attentively.
2. The driver is not driving attentively.
3. The driver is not driving attentively.
4. The driver is not driving attentively.
5. The driver is not driving attentively.
....

> Does this problem happen while directly using our inference script?

Yes, I'm using the inference code provided, with the hyperparameters listed in the issue.

Thanks
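The repeated numbered lines in the response quoted above can be flagged automatically when scanning many outputs. The following is a minimal sketch, not part of MobileVLM; the function name `most_repeated_line` is hypothetical, and it only catches exact line-level repeats after list numbering is stripped.

```python
import re
from collections import Counter


def most_repeated_line(response: str):
    """Return (line, count) for the most frequent line, ignoring list numbering."""
    lines = []
    for raw in response.splitlines():
        # Strip leading "1. ", "2) ", "- " or "* " markers before comparing.
        text = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", raw).strip()
        if text:
            lines.append(text)
    if not lines:
        return None, 0
    return Counter(lines).most_common(1)[0]
```

A count well above 1 for a single line suggests the model is looping, which is a cue to check the prompt format before tuning sampling parameters.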

YangYang-DLUT commented 7 months ago

Could you attach the exact image file to the comment? 😸

Muhammad4hmed commented 7 months ago

Sure, here it is: Screenshot from 2024-02-20 23-11-00

YangYang-DLUT commented 7 months ago

I am hoping to get the original file of the input image shown on the left of the screenshot. 😸 Could you attach it to the comment?

Muhammad4hmed commented 7 months ago

Ohh, sorry. Here it is: Screenshot from 2024-02-23 14-34-59

Muhammad4hmed commented 7 months ago

By the way, to work with videos, I'm using an image grid, so this is the input I used:

The exact prompt:

You are a driving assistant. You will help me during driving by providing details and answering my questions. Remember to keep the response accurate and do not repeat.

This image shows 4 consecutive video frames captured from a dashcam at 4-second intervals. 
Instead of general driving instructions, analyze the image carefully and provide specific, detailed bullet points for each action you would take as the driver in this situation. Remember, the red lines represent my vehicle's current lane and they are only added for your assistance, I can't see them.

USER: Is the driver driving attentively or distracted?
ASSISTANT: 

grid
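The thread does not say how the image grid was assembled. As a rough sketch, assuming four equally sized frames tiled row by row into a 2x2 canvas, the layout can be computed like this (the function name `grid_offsets` and the tile sizes are hypothetical):

```python
from typing import List, Tuple


def grid_offsets(
    n_frames: int, cols: int, tile_w: int, tile_h: int
) -> Tuple[Tuple[int, int], List[Tuple[int, int]]]:
    """Return the canvas size and the top-left offset of each frame, row by row."""
    if n_frames <= 0 or cols <= 0:
        raise ValueError("n_frames and cols must be positive")
    rows = -(-n_frames // cols)  # ceiling division
    canvas = (cols * tile_w, rows * tile_h)
    offsets = [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(n_frames)]
    return canvas, offsets
```

Each offset can then be passed to an image library's paste call, for example Pillow's `Image.paste(frame, offset)`, to place the frames on a blank canvas of the returned size.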

Muhammad4hmed commented 7 months ago

Problem solved. After a bit of debugging, I found that I was missing the

<image>

tag in the prompt, which was causing this.

weizhou1991 commented 6 months ago

> Problem solved. After a bit of debugging, I found that I was missing the
>
> <image>
>
> tag in the prompt, which was causing this.

Hi sir, what do you mean by the missing tag?

Muhammad4hmed commented 6 months ago

Hi,

Normally, when running MobileVLM directly, you won't need this. But if you are building something that works directly with the model, you need to make sure to follow this format:


Prompt = """
USER: <image>
Question here

ASSISTANT: """
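To avoid forgetting the tag when assembling prompts by hand, the format above can be wrapped in a small helper. This is a minimal sketch based only on the format shown in this thread; `build_prompt` and `IMAGE_TOKEN` are hypothetical names, not part of the MobileVLM API, and the optional system text is an assumption about where extra instructions would go.

```python
IMAGE_TOKEN = "<image>"  # placeholder tag as written in this thread


def build_prompt(question: str, system: str = "") -> str:
    """Return a USER/ASSISTANT prompt that contains exactly one image tag."""
    # Drop any tag the caller already included so it is not duplicated.
    question = question.replace(IMAGE_TOKEN, "").strip()
    user_turn = f"USER: {IMAGE_TOKEN}\n{question}"
    system = system.strip()
    parts = [system, user_turn] if system else [user_turn]
    # The trailing "ASSISTANT: " leaves the reply for the model to complete.
    return "\n\n".join(parts) + "\n\nASSISTANT: "
```

For example, `build_prompt("Is the driver driving attentively or distracted?")` yields a prompt whose user turn starts with the tag, which is the piece missing from the prompt posted earlier in the thread.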

Thanks
