VedantR3907 opened this issue 1 month ago
@VedantR3907, the issue is due to the way litellm validates whether a given model has vision capability. litellm maintains a list of models with their properties and capabilities in a static JSON, and the Llama 3.2 models (including the vision models) have not been added to it. https://github.com/BerriAI/litellm/blob/fb523b79e9fdd7ce2d3a33f6c57a3679c7249e35/litellm/utils.py#L4974 https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
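For reference, a minimal sketch of how that capability lookup can be reproduced from user code, assuming a litellm version that exposes `supports_vision` (the model names are just examples; the return values depend on what's in litellm's static JSON for your installed version):

```python
import litellm

# litellm answers this by looking the model up in its bundled
# model_prices_and_context_window.json; a model missing from that file
# (like the Llama 3.2 vision models at the time of this issue) is
# reported as not supporting vision.
print(litellm.supports_vision(model="gpt-4o"))
print(litellm.supports_vision(model="groq/llama-3.2-11b-vision-preview"))
```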
For now, try uninstalling zerox and installing from this fork (#40):
pip install git+https://github.com/pradhyumna85/zerox.git@formatting-control
and pass `validate_vision_capability=False` to the zerox function to see if that solves it.
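For illustration, a minimal sketch of what that call might look like. The file path is a placeholder, the model name is taken from later in this thread, and `validate_vision_capability` only exists in the fork above:

```python
import asyncio
from pyzerox import zerox

async def main():
    # validate_vision_capability=False (fork-only parameter) skips litellm's
    # static capability lookup, so models missing from its JSON are not rejected.
    result = await zerox(
        file_path="document.pdf",                    # placeholder path
        model="groq/llama-3.2-11b-vision-preview",   # model name from this thread
        validate_vision_capability=False,
    )
    print(result)

asyncio.run(main())
```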
I tried to pass the parameter, but the zerox function doesn't accept it:
There was the same problem with the llava models in ollama, so I tried a workaround to see if it works, and it does for the ollama llava model, as below: pyzerox\models\modellitellm.py
But when I tried the same with groq, it gave me an error, from groq I guess:
@VedantR3907, there was an issue where all the kwargs were passed to litellm; I fixed that, so remove and reinstall pyzerox using the same pip command shared earlier. However, this time there is a new error: it seems Groq's Llama 3.2 vision models don't support a system message alongside messages containing images.
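To illustrate the constraint, a hedged sketch of a direct litellm call that keeps the instructions in the user message instead of a system message, assuming a Groq API key is configured and using a model name from this thread (the prompt text and base64 placeholder are made up):

```python
import litellm

# Groq's Llama 3.2 vision endpoints appear to reject a system message when
# the request also contains an image, so the instructions ride along as a
# text part of the single user message instead.
response = litellm.completion(
    model="groq/llama-3.2-11b-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Convert this page to markdown."},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,<BASE64_PNG>"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```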
Looks like litellm hasn't added support for vision models from groq.
@VedantR3907, with the latest litellm 1.50.1 (we are using a lower version in pyzerox), I can get image prompting to work with Llama 3.2 vision, but the current pyzerox backend implementation uses a system prompt for instructions, which the Groq backend doesn't support along with image input. Feel free to fork the repo and adapt the model class in pyzerox\models\modellitellm.py to remove the system prompt and provide the instructions in the user prompt instead, to see if that works. If that goes well, you can raise a PR for it; we just need to make sure it doesn't break existing models.
@pradhyumna85, I made changes in modellitellm.py and it works now. Currently, I am still passing the same system prompt (the one used for all the other models) as text for the Groq models. We can change that, because bigger Groq models work perfectly with it; smaller models are not perfect, but good.
I only changed the _prepare_messages function in modellitellm.py:
```python
async def _prepare_messages(
    self,
    image_path: str,
    maintain_format: bool,
    prior_page: str,
) -> List[Dict[str, Any]]:
    """Prepares the messages to send to the LiteLLM Completion API.

    :param image_path: Path to the image file.
    :type image_path: str
    :param maintain_format: Whether to maintain the format from the previous page.
    :type maintain_format: bool
    :param prior_page: The markdown content of the previous page.
    :type prior_page: str
    """
    messages: List[Dict[str, Any]] = []

    # Check if the model belongs to the Groq family (model name starts with 'groq/')
    if self.model.startswith('groq/'):
        # Prepare a single user message that carries the system instructions and the image
        user_content = []

        # Add the system prompt content as text in the user message
        user_content.append(
            {
                "type": "text",
                "text": f"{self._system_prompt}",
            }
        )

        # If maintain_format is true, add the prior page formatting instruction
        if maintain_format and prior_page:
            user_content.append(
                {
                    "type": "text",
                    "text": f'Markdown must maintain consistent formatting with the following page: \n\n """{prior_page}"""',
                }
            )

        # Add the image as part of the user message
        base64_image = await encode_image_to_base64(image_path)
        user_content.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_image}"},
            }
        )

        # Append the combined user message
        messages.append(
            {
                "role": "user",
                "content": user_content,
            }
        )
    else:
        # Default behavior for non-Groq models:
        # add the system prompt as a system message
        messages.append(
            {
                "role": "system",
                "content": self._system_prompt,
            }
        )

        # If maintain_format is true, add the prior page formatting as a system message
        if maintain_format and prior_page:
            messages.append(
                {
                    "role": "system",
                    "content": f'Markdown must maintain consistent formatting with the following page: \n\n """{prior_page}"""',
                }
            )

        # Add the image as part of the user message
        base64_image = await encode_image_to_base64(image_path)
        messages.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ],
            }
        )

    return messages
```
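One note on the design choice here: gating on `self.model.startswith('groq/')` keeps the workaround scoped to Groq-routed models, so every other provider still receives the original system-message layout and existing behavior is unchanged.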
Can you share all the changes you made, @VedantR3907?
@MANOJ21K, see the code I shared above: I passed the system prompt for the Groq models as part of the user message, within the user_content list. Copy-paste the code above and you will be able to use the system prompt written by @pradhyumna85.
@pradhyumna85, can you share the modellitellm.py and the py-zerox version?
I tried using vision models like llama-3.2-90b-vision-preview, llama-3.2-11b-vision-preview, and llava-v1.5-7b-4096-preview, but they all show the same thing: