DCDmllm / Cheetah

Weird Output for simple image #7

Closed mridulbirla closed 1 year ago

mridulbirla commented 1 year ago

I tried testing this with the sample image below and test_cheetah_llama2.py, modified as follows:

######## Example 0 ######
print("\nExample 0")
context = "<Img>HereForImage</Img> what does picture is about? "
raw_img_list = ['./examples/screenshot.jpg']
print("Question: ", context)
llm_message = chat.answer(raw_img_list, context)
print("Answer: ", llm_message)

[Attached image: screenshot]

The output looks super weird. Is there something I am doing wrong, or do you also encounter the same kind of issue? I am using Llama 2.

Example 0
Question:  <Img>HereForImage</Img> what does picture is about?
Answer:   When the page loads, you will see an image of a person sitting at a desk with a laptop open in front of them. The person is wearing a blue shirt and has a blue headset on their head. There is a blue book on the desk in front of them.
WestbrookGE commented 1 year ago

Thank you for bringing this to our attention. We appreciate your effort in testing and sharing the feedback.

Upon reviewing the case you've presented, we've observed similar issues not only with Cheetah but also with other multimodal LLMs. This appears to be a general problem with multimodal LLMs rather than something specific to Cheetah.

Rest assured, we recognize the importance of addressing this issue and will work towards finding a solution in upcoming updates.
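
For anyone reproducing the report above, a small pre-flight check can help rule out input problems before attributing the output to the model. The sketch below assumes that each `<Img>HereForImage</Img>` placeholder in the prompt corresponds to one entry in `raw_img_list`, as in Example 0; that one-to-one mapping is inferred from the snippet, not confirmed by the maintainers. `check_inputs` is a hypothetical helper and is not part of Cheetah.

```python
import os

# Placeholder string copied from Example 0 in the report.
IMG_PLACEHOLDER = "<Img>HereForImage</Img>"


def check_inputs(context, raw_img_list):
    """Hypothetical helper (not part of Cheetah).

    Returns a list of human-readable problems found in the inputs;
    an empty list means nothing suspicious was detected.
    """
    problems = []

    # Assumption: one placeholder per image path.
    n_placeholders = context.count(IMG_PLACEHOLDER)
    if n_placeholders != len(raw_img_list):
        problems.append(
            f"{n_placeholders} placeholder(s) in prompt but "
            f"{len(raw_img_list)} image path(s)"
        )

    # Every path should point to an existing file.
    for path in raw_img_list:
        if not os.path.isfile(path):
            problems.append(f"image not found: {path}")

    return problems


if __name__ == "__main__":
    context = "<Img>HereForImage</Img> what does picture is about? "
    raw_img_list = ["./examples/screenshot.jpg"]
    for problem in check_inputs(context, raw_img_list):
        print("Warning:", problem)
```

If this prints no warnings, the inputs are at least structurally consistent, and the description of content that is not in the image is more likely a model-side behavior of the kind described above.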