Hi, may I see what prompt caused this problem? Also, we added new descriptions for how to set up inference examples. See here. 😊
I'm using the example from the README.md:
prompt = "What are the things I should be cautious about when I visit here?" image_file = "https://llava-vl.github.io/static/images/view.jpg"
Here, we specified which prompt mode you should use to set up inference. For the 1.5B model, use the v1 mode.
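For reference, here is a minimal sketch of how the README's prompt and image can be combined with the v1 conv mode in a LLaVA-style inference call. The module paths, argument names, checkpoint name, and the eval_model helper below are assumptions modeled on LLaVA's example script, not verified against this repo:

```python
from tinyllava.eval.run_tiny_llava import eval_model          # assumed module path
from tinyllava.mm_utils import get_model_name_from_path       # assumed helper

model_path = "bczhou/TinyLLaVA-1.5B"  # hypothetical checkpoint name
prompt = "What are the things I should be cautious about when I visit here?"
image_file = "https://llava-vl.github.io/static/images/view.jpg"

# LLaVA-style argument namespace; field names are assumptions.
args = type("Args", (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "image_file": image_file,
    "conv_mode": "v1",   # the prompt mode recommended for the 1.5B model
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512,
})()

eval_model(args)
```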
Understood. Thanks for the reply, but this also limits the possibility of using a custom system prompt. Would it be better to update KeywordsStoppingCriteria so that the 1.5B model could support other conv_mode values? If you suggest not changing it, then I'll close this issue. Thanks.
From my perspective, keyword_id should always be tokenized into a single input id. TinyLlama's example failed because its tokenizer tokenized <|endoftext|> into more than one id, which should not happen in our case. If you want to change your system prompt, you can add more conv_mode entries here, following conv_vicuna_v1.
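As an illustration of that suggestion, here is a rough sketch of registering an extra conversation template with a custom system prompt, modeled on conv_vicuna_v1 from LLaVA's conversation.py. The import path, field names, SeparatorStyle.TWO, and the conv_templates dict are assumptions carried over from LLaVA and may differ in TinyLLaVABench:

```python
# Hypothetical sketch, assuming TinyLLaVABench keeps LLaVA's conversation.py layout.
from tinyllava.conversation import Conversation, SeparatorStyle, conv_templates  # assumed path

conv_custom_v1 = Conversation(
    system="You are a concise assistant for visual question answering.",  # custom system prompt
    roles=("USER", "ASSISTANT"),
    version="v1",
    messages=(),
    offset=0,
    sep_style=SeparatorStyle.TWO,
    sep=" ",
    sep2="</s>",  # keep a stop string that the tokenizer maps to a single id
)

# Register it so it can be selected via conv_mode (the name "custom_v1" is illustrative).
conv_templates["custom_v1"] = conv_custom_v1
```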
Got it. Thanks. I'll close the issue.
Greetings. When running inference with TinyLLaVA-1.5B, a runtime error occurs with the following message: "RuntimeError: The size of tensor a (2) must match the size of tensor b (7) at non-singleton dimension 0"
I think the root cause of this is the implementation of KeywordsStoppingCriteria: https://github.com/DLCV-BUAA/TinyLLaVABench/blob/main/tinyllava/mm_utils.py#L240
When matching output_ids and keyword_id, if the current output_ids is shorter than keyword_id, using
if (output_ids[0, -keyword_id.shape[0]:] == keyword_id).all():
will cause output_ids[0, -keyword_id.shape[0]:] and keyword_id to have unequal sizes. A way to fix this is:
if (output_ids.shape[1] >= keyword_id.shape[0]) and (output_ids[0, -keyword_id.shape[0]:] == keyword_id).all():
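For anyone hitting the same error, the mismatch can be reproduced in isolation with plain PyTorch tensors (the shapes below are illustrative, not taken from the repo):

```python
import torch

output_ids = torch.tensor([[1, 2]])                # only 2 ids generated so far
keyword_id = torch.tensor([3, 4, 5, 6, 7, 8, 9])   # stop keyword tokenized into 7 ids

# Original check: the slice clips to length 2, so comparing it against the 7-element
# keyword raises "The size of tensor a (2) must match the size of tensor b (7)".
# (output_ids[0, -keyword_id.shape[0]:] == keyword_id).all()

# Guarded check: skip the comparison until enough tokens have been generated.
if (output_ids.shape[1] >= keyword_id.shape[0]) and \
        (output_ids[0, -keyword_id.shape[0]:] == keyword_id).all():
    print("keyword matched")
else:
    print("not enough tokens yet, or no match")
```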
If this solution looks good to you, is it OK for me to submit an MR for this update?
Best, YC