Closed Muhammad4hmed closed 7 months ago
Could you provide the picture and the prompt? We are trying to reproduce this problem.
Does this problem happen while directly using our inference script?
Hi @YangYang-DLUT
> Could you provide the picture and the prompt? We try to reproduce this problem.
I've already provided the response image in the issue. The text returned from the model is repetitive, like this:
Prompt: Is the driver driving safely or distracted?
Response:
1. The driver is not driving attentively.
2. The driver is not driving attentively.
3. The driver is not driving attentively.
4. The driver is not driving attentively.
5. The driver is not driving attentively.
....
> Does this problem happen while directly using our inference script?
Yes, I'm using the same inference code provided, with the hyperparameters I listed in the issue.
Thanks
Could you attach the exact image file to the comment? 😸
Sure, here it is:
I am looking for the original file of the input image shown on the left of the screenshot. 😸 Could you attach it to the comment?
Ohh, sorry. Here it is:
By the way, to work with videos, I'm combining the frames into an image grid, so this is the input I used:
The exact prompt:
You are a driving assistant. You will help me during driving by providing details and answering my questions. Remember to keep the response accurate and do not repeat.
This image shows 4 consecutive video frames captured from a dashcam at 4-second intervals.
Instead of general driving instructions, analyze the image carefully and provide specific, detailed bullet points for each action you would take as the driver in this situation. Remember, the red lines represent my vehicle's current lane and they are only added for your assistance, I can't see them.
USER: Is the driver driving attentively or distracted?
ASSISTANT:
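For readers who want to try the image-grid approach described above, here is a minimal sketch of how frames could be tiled into one image. It is not the code used in this issue: the 2x2 layout, the frame size, and the names `grid_offsets` and `make_grid` are assumptions for illustration, and `make_grid` needs Pillow installed.

```python
from typing import List, Tuple


def grid_offsets(n_frames: int, cols: int, frame_w: int, frame_h: int) -> List[Tuple[int, int]]:
    """Top-left paste position of each frame, filled row by row."""
    return [((i % cols) * frame_w, (i // cols) * frame_h) for i in range(n_frames)]


def make_grid(frames, cols: int = 2):
    """Tile PIL images into one grid image (hypothetical helper, requires Pillow)."""
    from PIL import Image  # imported lazily so grid_offsets works without Pillow

    frame_w, frame_h = frames[0].size
    rows = -(-len(frames) // cols)  # ceiling division
    canvas = Image.new("RGB", (cols * frame_w, rows * frame_h))
    offsets = grid_offsets(len(frames), cols, frame_w, frame_h)
    for frame, offset in zip(frames, offsets):
        # Resize every frame to the size of the first one so the tiles line up.
        canvas.paste(frame.resize((frame_w, frame_h)), offset)
    return canvas


# Four 640x360 frames in a 2x2 grid (assumed layout and size).
print(grid_offsets(4, 2, 640, 360))
# [(0, 0), (640, 0), (0, 360), (640, 360)]
```

The grid is passed to the model as a single image, so the prompt still needs only one image placeholder.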
Problem solved. After some debugging, I found that I was missing the <image> tag in the prompt, which was causing this.
> Problem solved. After some debugging, I found that I was missing the <image> tag in the prompt, which was causing this.
Hi Sir,
What do you mean by the missing <image> tag?
Hi,
Normally, when running MobileVLM directly, you don't need to do this yourself, but if you are building something that works directly with the model, you need to make sure the prompt follows this format:
Prompt = """
USER: <image>
Question here
ASSISTANT: """
Thanks
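As a rough illustration of the format above, a small helper can add the <image> tag when it is missing and check the result before it is sent to the model. The names `build_prompt` and `has_single_image_tag` are hypothetical and not part of MobileVLM; the conversation template in the repository may differ in detail.

```python
IMAGE_TAG = "<image>"


def build_prompt(question: str, system: str = "") -> str:
    """Return a 'USER: <image> ... ASSISTANT:' prompt, adding the tag if absent."""
    question = question.strip()
    if IMAGE_TAG not in question:
        # Put the image placeholder at the start of the user turn.
        question = f"{IMAGE_TAG}\n{question}"
    prefix = f"{system.strip()}\n" if system.strip() else ""
    return f"{prefix}USER: {question}\nASSISTANT: "


def has_single_image_tag(prompt: str) -> bool:
    """Sanity check for a single-image prompt (assumption: one image per prompt)."""
    return prompt.count(IMAGE_TAG) == 1


prompt = build_prompt("Is the driver driving attentively or distracted?")
assert has_single_image_tag(prompt)
print(prompt)
```

Running the check on a hand-written prompt is a cheap way to catch the missing tag described in this thread.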
Hi, first of all thank you so much for this great contribution!
I've been testing your model on driving videos and noticed something strange: sometimes it keeps repeating its output, as in the given image. I think this might be a parameter issue. The hyperparameters I'm using are:
Can you please look into what the issue might be? Also, if possible, please share the optimal hyperparameters for mtgv/MobileVLM_V2-3B.
Thanks