LLaVA-VL / LLaVA-Plus-Codebase

LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
https://llava-vl.github.io/llava-plus/
Apache License 2.0
698 stars 53 forks source link

[Discussion] Add your tool contributions/suggestions to the project #7

Open SlongLiu opened 10 months ago

SlongLiu commented 10 months ago

Discussion

Our project is just the first step to link LMM with external tools. Welcome to train you own model like we do for different scenarios.

You can also propose other tools you are interested in for the project. We will consider adding corresponding data and models for the new scenarios.

huxian0402 commented 10 months ago

@SlongLiu When I want to draw a picture using LLaVA-Plus, for example by inputting "Draw a picture with a woman holding a dog by a leash" to the LLaVA-Plus Chatbot, it will return the following response: "I'm sorry, as an AI language model, I cannot draw images. However, I can suggest some steps to create a picture of a woman holding a dog by a leash." Is this normal? It appears that LLaVA-Plus can correctly parse instructions and understand how to invoke the stable diffusion tool, but it fails to execute the task correctly.

"thoughts🤔" I need an image generation model to generate image on the user's prompt. "actions🚀" [{"API_name": "stable-diffusion", "API_params": {"prompt": "a woman holding a dog by a leash, trending on artstation, cgsociety contest winner, 4 k, 8 k, artstation hd, artstation hq, luminous lighting "}}] "value👉" Sure! I use an image generation model stable-diffusion to generate image on your prompt. Please wait for a moment.

stable-diffusion model outputs: {'message': 'The stable-diffusion has processed the image.'} Please summarize the model outputs and answer my first question: Draw a picture with a woman holding a dog by a leash. "thoughts🤔" Thanks to the output of stable-diffusion. I can answer the question better. "actions🚀" [] "value👉" I'm sorry, as an AI language model, I cannot draw images. However, I can suggest some steps to create a picture of a woman holding a dog by a leash. Find a reference image of a woman and a dog to use as a guide. Sketch the woman and the dog in the center of the page. Draw a leash connecting the woman's hand to the dog's collar. Add details such as the woman's clothing, the dog's fur, and any background elements you want to include. Color the picture using appropriate colors for the woman, dog, and background. I hope these steps help you create a picture of a woman holding a dog by a leash!

...... Is this a bug? Or is there an issue with my operation?

Labmem009 commented 10 months ago

@SlongLiu When I want to draw a picture using LLaVA-Plus, for example by inputting "Draw a picture with a woman holding a dog by a leash" to the LLaVA-Plus Chatbot, it will return the following response: "I'm sorry, as an AI language model, I cannot draw images. However, I can suggest some steps to create a picture of a woman holding a dog by a leash." Is this normal? It appears that LLaVA-Plus can correctly parse instructions and understand how to invoke the stable diffusion tool, but it fails to execute the task correctly.

"thoughts🤔" I need an image generation model to generate image on the user's prompt. "actions🚀" [{"API_name": "stable-diffusion", "API_params": {"prompt": "a woman holding a dog by a leash, trending on artstation, cgsociety contest winner, 4 k, 8 k, artstation hd, artstation hq, luminous lighting "}}] "value👉" Sure! I use an image generation model stable-diffusion to generate image on your prompt. Please wait for a moment.

stable-diffusion model outputs: {'message': 'The stable-diffusion has processed the image.'} Please summarize the model outputs and answer my first question: Draw a picture with a woman holding a dog by a leash. "thoughts🤔" Thanks to the output of stable-diffusion. I can answer the question better. "actions🚀" [] "value👉" I'm sorry, as an AI language model, I cannot draw images. However, I can suggest some steps to create a picture of a woman holding a dog by a leash. Find a reference image of a woman and a dog to use as a guide. Sketch the woman and the dog in the center of the page. Draw a leash connecting the woman's hand to the dog's collar. Add details such as the woman's clothing, the dog's fur, and any background elements you want to include. Color the picture using appropriate colors for the woman, dog, and background. I hope these steps help you create a picture of a woman holding a dog by a leash!

...... Is this a bug? Or is there an issue with my operation?

Maybe It's because the dataset contain some wrong replies like 'I cannot...' I have found some, but I suggest the author revise the dataset and train another version on revised data.