Azure-Samples / openai-aca-lb

Smart load balancing for Azure OpenAI endpoints

431 Request Header Fields Too Large when using the load balancer #25


yovelcohen commented 3 weeks ago

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

### Minimal steps to reproduce

```python
from io import BytesIO

from openai import AsyncAzureOpenAI

proxy_client = AsyncAzureOpenAI(
    azure_endpoint=llm_settings.AZURE_GPT4O_PROXY_ENDPOINT,
    api_version="2024-08-01-preview",
    api_key=llm_settings.AZURE_GPT4_O_KEY,
)
direct_client = AsyncAzureOpenAI(
    azure_endpoint=llm_settings.AZURE_OPENAI_SWEDEN_URL,
    api_version="2024-08-01-preview",
    api_key=llm_settings.AZURE_OPENAI_SWEDEN_KEY,
)

# `images` holds base64-encoded JPEGs; build one user message whose content
# is the prompt text followed by one image_url part per image.
content = [
    {"type": "text", "text": prompt},
    *[
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}", "detail": detail},
        }
        for b64 in images
    ],
]
messages = [{"role": "user", "content": content}]

# Verifying the image is not too big (`image` is the source PIL image the
# base64 payload was encoded from)
with BytesIO() as img_byte_arr:
    image.save(img_byte_arr, format="JPEG")
    size_in_bytes = img_byte_arr.tell()
    print(f"{size_in_bytes / 1024:.2f} KB")
# >>> 54.43 KB

# Fails with a 431 response
ret = await proxy_client.chat.completions.create(
    messages=messages, model=llm_settings.GPT_4O_AZURE_MODEL_NAME
)

# Works fine
ret = await direct_client.chat.completions.create(
    messages=messages, model=llm_settings.GPT_4O_AZURE_MODEL_NAME
)
```
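For completeness, the header dump in the next section was captured by catching the SDK error; a minimal sketch, assuming the 431 surfaces as a generic `openai.APIStatusError`:

```python
import openai
from pprint import pprint

try:
    ret = await proxy_client.chat.completions.create(
        messages=messages, model=llm_settings.GPT_4O_AZURE_MODEL_NAME
    )
except openai.APIStatusError as e:
    # APIStatusError carries the underlying httpx request/response, so the
    # outgoing headers and status can be inspected directly.
    print(e.response.status_code)  # 431
    pprint(dict(e.request.headers))
```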

### Any log messages given by the failure

The error is: `<Response [431 Request Header Fields Too Large]>`

The headers don't seem too unusual:

```python
pprint(dict(e.request.headers))
{'accept': 'application/json',
 'accept-encoding': 'gzip, deflate',
 'api-key': '<omitted>',
 'authorization': '<omitted>',
 'connection': 'keep-alive',
 'content-length': '81209',
 'content-type': 'application/json',
 'host': '<omitted>',
 'user-agent': 'AsyncAzureOpenAI/Python 1.52.0',
 'x-stainless-arch': 'x64',
 'x-stainless-async': 'async:asyncio',
 'x-stainless-helper-method': 'beta.chat.completions.parse',
 'x-stainless-lang': 'python',
 'x-stainless-os': 'MacOS',
 'x-stainless-package-version': '1.52.0',
 'x-stainless-retry-count': '0',
 'x-stainless-runtime': 'CPython',
 'x-stainless-runtime-version': '3.11.7'}
```
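As a sanity check, the combined size of these headers is tiny. A rough tally (with placeholder lengths for the omitted values) lands around a few hundred bytes, far below Kestrel's default 32 KB `MaxRequestHeadersTotalSize`, so the header fields themselves should not be tripping any limit:

```python
# Rough header-size tally: each header costs "name: value\r\n" on the wire.
# `headers` is the dict printed above; the omitted api-key/authorization/host
# values would add at most a few hundred bytes more.
total = sum(len(k) + len(v) + 4 for k, v in headers.items())
print(f"~{total} bytes of request headers")
```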

The YARP logs don't show anything special:

```
info: Yarp.ReverseProxy.Forwarder.HttpForwarder[9]
      Proxying to https://glixsweden.openai.azure.com/openai/deployments/glix/chat/completions?api-version=2024-08-01-preview HTTP/2 RequestVersionOrLower
info: Yarp.ReverseProxy.Forwarder.HttpForwarder[56]
      Received HTTP/2.0 response 431.
```
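Per the log, YARP forwards the request over HTTP/2 and the 431 comes back from the upstream. One way to narrow it down (a debugging sketch, not part of the repro; it reuses the settings above and assumes the deployment name matches the model setting) is to replay the same JSON body through the proxy with only the essential headers, to see whether one of the extra headers, or a header the proxy appends such as `X-Forwarded-For`, is what pushes the request over a limit:

```python
import httpx

async def replay_minimal(body: bytes) -> None:
    # POST the identical JSON body with only the headers the service needs.
    url = (
        f"{llm_settings.AZURE_GPT4O_PROXY_ENDPOINT}/openai/deployments/"
        f"{llm_settings.GPT_4O_AZURE_MODEL_NAME}/chat/completions"
    )
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            url,
            params={"api-version": "2024-08-01-preview"},
            headers={
                "api-key": llm_settings.AZURE_GPT4_O_KEY,
                "content-type": "application/json",
            },
            content=body,
        )
    print(resp.status_code)
```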

### Expected/desired behavior
> The proxy appears to be the issue, since the same request succeeds against the direct API. Perhaps some configuration is needed on the proxy to allow requests that include images?

### OS and Version?
> macOS Sonoma 14.6.1 

### Versions
> v1 (Docker image)
