[Bug]: TypeError when adding a .jpg file to the chat

Sjlver commented 2 months ago

What happened?

I encountered a problem when using litellm through aider.chat. I initially described it here: https://github.com/paul-gauthier/aider/issues/1088, but it is probably more of a litellm issue than an aider issue.

I get the following error message after having added a .jpg image to the chat:

Traceback (most recent call last):
  File "[redacted]/.local/lib/python3.10/site-packages/aider/models.py", line 519, in token_count
    return litellm.token_counter(model=self.name, messages=messages)
  File "[redacted]/.local/lib/python3.10/site-packages/litellm/utils.py", line 1994, in token_counter
    num_tokens += calculage_img_tokens(
  File "[redacted]/.local/lib/python3.10/site-packages/litellm/utils.py", line 1841, in calculage_img_tokens
    resized_width, resized_height = resize_image_high_res(
  File "[redacted]/.local/lib/python3.10/site-packages/litellm/utils.py", line 1758, in resize_image_high_res
    if width <= 768 and height <= 768:
TypeError: '<=' not supported between instances of 'NoneType' and 'int'

My suspicion is that litellm couldn't handle my perfectly valid jpg file. Here's the code that computes width and height:

def get_image_dimensions(data):
    img_data = None

    # Check if data is a URL by trying to parse it
    try:
        response = requests.get(data)
        response.raise_for_status()  # Check if the request was successful
        img_data = response.content
    except Exception:
        # Data is not a URL, handle as base64
        header, encoded = data.split(",", 1)
        img_data = base64.b64decode(encoded)

    # Try to determine dimensions from headers
    # This is a very simplistic check, primarily works with PNG and non-progressive JPEG
    if img_data[:8] == b"\x89PNG\r\n\x1a\n":
        # PNG Image; width and height are 4 bytes each and start at offset 16
        width, height = struct.unpack(">ii", img_data[16:24])
        return width, height
    elif img_data[:2] == b"\xff\xd8":
        # JPEG Image; for dimensions, SOF0 block (0xC0) gives dimensions at offset 3 for length, and then 5 and 7 for height and width
        # This will NOT find dimensions for all JPEGs (e.g., progressive JPEGs)
        # Find SOF0 marker (0xFF followed by 0xC0)
        sof = re.search(b"\xff\xc0....", img_data)
        if sof:
            # Parse SOF0 block to find dimensions
            height, width = struct.unpack(">HH", sof.group()[5:9])
            return width, height
        else:
            return None, None
    else:
        # Unsupported format
        return None, None

Is it possible that my JPG file does not have the corresponding marker? I appreciate that litellm tries to reduce its dependencies, but it might make sense to use an image library for this :)
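For illustration, here is a minimal dependency-free sketch of how the dimensions could be read by walking the JPEG segments instead of regex-searching for SOF0 only. Progressive JPEGs store their frame header under SOF2 (0xFFC2) rather than SOF0 (0xFFC0), which would explain a (None, None) result, though whether the file above is progressive is an assumption. The function name `jpeg_dimensions` and the synthetic test bytes are hypothetical and not part of litellm.

```python
import struct

# Start-of-frame markers that carry image dimensions.
# 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(img_data):
    """Return (width, height) of a JPEG, or (None, None) if no frame header is found."""
    if img_data[:2] != b"\xff\xd8":  # missing SOI marker: not a JPEG
        return None, None
    pos = 2
    while pos + 4 <= len(img_data):
        if img_data[pos] != 0xFF:
            return None, None  # not positioned on a marker; give up
        marker = img_data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:
            pos += 2  # standalone markers have no length field
            continue
        (length,) = struct.unpack(">H", img_data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 9 > len(img_data):
                return None, None  # truncated frame header
            # Layout: marker(2) length(2) precision(1) height(2) width(2)
            height, width = struct.unpack(">HH", img_data[pos + 5:pos + 9])
            return width, height
        if marker == 0xDA:
            return None, None  # start of scan reached without a frame header
        pos += 2 + length  # the length field counts its own two bytes
    return None, None


# Synthetic progressive-style header (illustrative bytes, not a real image):
# SOI, an APP0 segment, then an SOF2 frame header for a 640x480 image.
progressive = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xc2" + struct.pack(">HBHH", 17, 8, 480, 640) + b"\x03" + bytes(9)
)
print(jpeg_dimensions(progressive))
```

Walking segment by segment also avoids false matches on 0xFFC0 byte pairs that happen to occur inside metadata such as an embedded thumbnail.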

You can find the jpg file at https://blog.purpureus.net/assets/blog/personal_key_rotation/simplified-asset-graph.jpg

Twitter / LinkedIn details

@sjlver

paul-gauthier commented 2 months ago

Here is some code which tries to reproduce the issue:

import base64
from pathlib import Path

import litellm

image_file = Path("tmp.jpg")
encoded_string = base64.b64encode(open(image_file, 'rb').read()).decode("utf-8")

mime_type = "image/jpeg"
image_url = f"data:{mime_type};base64,{encoded_string}"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                    "detail": "high"
                }
            }
        ]
    },
]

litellm.token_counter(model="gpt-4o", messages=messages)

But it throws a different error:

Traceback (most recent call last):
  File "/Users/gauthier/Projects/aider/./tmp.py", line 53, in <module>
    litellm.token_counter(model="gpt-4o", messages=messages)
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1994, in token_counter
    num_tokens += calculage_img_tokens(
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1840, in calculage_img_tokens
    width, height = get_image_dimensions(data=data)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gauthier/Projects/aider/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1823, in get_image_dimensions
    height, width = struct.unpack(">HH", sof.group()[5:9])
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: unpack requires a buffer of 4 bytes

Here is the small tmp.jpg file: tmp
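The struct.error in this traceback is consistent with the regex quoted in the first comment: `b"\xff\xc0...."` matches only 6 bytes (the marker plus four more), so `sof.group()[5:9]` yields a single byte where `struct.unpack(">HH", ...)` needs four. A small sketch with synthetic bytes (made up for illustration, not taken from tmp.jpg) reproduces that behaviour, and shows that a pattern covering the 7 bytes after the marker would make the same slice work:

```python
import re
import struct

# Synthetic SOF0 segment: marker, length (17), precision (8),
# height (480), width (640). Illustrative values only.
segment = b"\xff\xc0" + struct.pack(">HBHH", 17, 8, 480, 640)

# The pattern from get_image_dimensions matches the marker plus 4 bytes.
sof = re.search(b"\xff\xc0....", segment)
matched = sof.group()
tail = matched[5:9]  # only one byte is left at index 5

try:
    struct.unpack(">HH", tail)
    error = None
except struct.error as exc:
    error = str(exc)
print(len(matched), len(tail), error)

# Matching 7 bytes after the marker (length, precision, height, width)
# gives the slice the 4 bytes it needs. re.DOTALL lets "." match 0x0A too.
wider = re.search(b"\xff\xc0.{7}", segment, re.DOTALL)
height, width = struct.unpack(">HH", wider.group()[5:9])
print(width, height)
```

This only addresses the slicing problem; a JPEG without an SOF0 marker would still come back as (None, None).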

Sjlver commented 2 months ago

Thanks for the rapid fix! 💙

BerriAI / litellm