I can't make json_mode work with deepinfra through LiteLLM, but it works just fine if I call deepinfra's OpenAI-compatible API directly (the snippet below uses the openai client pointed at deepinfra's base URL).
Below is a small snippet to reproduce the issue:
import openai
import os
import json

text_prompt = 'Say hi and answer this message with: {"a":"b"}'
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"

#############################################
# Do the request using the deepinfra openai API
#############################################
client = openai.OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.getenv("DEEPINFRA_API_KEY"),
)
messages = [{"role": "user", "content": text_prompt}]
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    response_format={"type": "json_object"},
    tool_choice="auto",
)
print("Response using deepinfra openai API:")
print(response.choices[0].message.content)
try:
    json.loads(response.choices[0].message.content)
    print("Response is a valid json object")
except json.JSONDecodeError:
    print("Response is NOT a valid json object")

##########################################################
# Test deepinfra through LiteLLM directly.
##########################################################
import litellm
from litellm import completion

litellm.set_verbose = True
model_name = "deepinfra/" + model_name
messages = [{"content": text_prompt, "role": "user"}]

# deepinfra call without json_mode
response = completion(model=model_name, messages=messages)
print("Response using deepinfra through LiteLLM:")
print(response)

# json_mode via additional_kwargs
print("Response using deepinfra through LiteLLM with JSON mode enabled:")
response2 = completion(
    model=model_name,
    messages=messages,
    additional_kwargs={"response_format": {"type": "json_object"}},
)
print(response2)

# json_mode via response_format directly
print("Response using deepinfra through LiteLLM with response_format:")
response3 = completion(
    model=model_name, messages=messages, response_format={"type": "json_object"}
)
print(response3)
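For comparison, here is the same json_mode request issued with only the standard library, i.e. without the openai client. The /chat/completions path is an assumption based on the base_url used above; this is a sketch, not a verified endpoint:

```python
import json
import os
import urllib.request

# Same request as the working openai-client call above, built by hand.
# NOTE: the full endpoint path is assumed from the OpenAI-compatible
# base_url "https://api.deepinfra.com/v1/openai".
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": 'Say hi and answer this message with: {"a":"b"}'}
    ],
    "response_format": {"type": "json_object"},
}
req = urllib.request.Request(
    "https://api.deepinfra.com/v1/openai/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('DEEPINFRA_API_KEY')}",
    },
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(payload["response_format"])  # {'type': 'json_object'}
```

The point is that response_format sits at the top level of the request body here, which is exactly what the LiteLLM calls below fail to produce.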
In the logs below you can see that the first example (calling deepinfra's OpenAI-compatible API directly) works just fine.
Then I test it with LiteLLM. First without json_mode (not expected to produce JSON).
Then with JSON mode requested via the 'additional_kwargs' parameter. It has no effect; the parameter does reach the curl request, but nested under 'extra_body' instead of being sent as a top-level 'response_format'.
Finally, I try passing the response_format keyword argument directly, as Krrish suggested in a Discord conversation we had. The parameter is silently ignored and does not appear in the curl request at all.
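Until this is fixed, one workaround is to strip whatever preamble the model adds and pull the JSON value out of the reply ourselves. A minimal sketch using only the standard library (raw_decode parses a value starting at a given offset):

```python
import json

def extract_json(text: str):
    """Best-effort: return the first JSON value embedded in a free-form reply."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch in "{[":  # candidate start of an object or array
            try:
                obj, _ = decoder.raw_decode(text[start:])
                return obj
            except json.JSONDecodeError:
                continue  # false start, keep scanning
    raise ValueError("no JSON value found in reply")

# The reply LiteLLM actually returned in the logs below:
reply = ' Hi there! Here is your requested response:\n\n{\n"a": "b"\n}'
print(extract_json(reply))  # {'a': 'b'}
```

This is only a client-side patch; it does not make the model honor json_mode, it just recovers the payload when the model happens to include one.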
I've been testing this across the last several versions of LiteLLM and it has consistently failed. I just reproduced it on the latest release (1.38.10).
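To make the discrepancy explicit, here is a diff of the request body the working direct call sends versus the body LiteLLM emits for the response_format call (both copied from the logs below):

```python
# Body the working direct call sends (response_format at top level):
working = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": 'Say hi and answer this message with: {"a":"b"}'}
    ],
    "response_format": {"type": "json_object"},
}
# Body LiteLLM sent (from the curl in the logs): the parameter is gone.
litellm_sent = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": 'Say hi and answer this message with: {"a":"b"}'}
    ],
    "extra_body": {},
}
print(sorted(working.keys() - litellm_sent.keys()))  # ['response_format']
```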
Relevant log output
Response using deepinfra openai API:
{"a":"b"}
Response is a valid json object
Request to litellm:
litellm.completion(model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', messages=[{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}])
self.optional_params: {}
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'extra_body': {}}
self.optional_params: {'extra_body': {}}
POST Request Sent from LiteLLM:
curl -X POST \
https://api.deepinfra.com/v1/openai/ \
-d '{'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'messages': [{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], 'extra_body': {}}'
RAW RESPONSE:
{"id": "chatcmpl-123412341234", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": " Hi there! Here's your requested response:\n\n\"{\\\"a\\\":\\\"b\\\"}\"", "role": "assistant", "function_call": null, "tool_calls": null, "name": null}}], "created": 1716813730, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "object": "chat.completion", "system_fingerprint": null, "usage": {"completion_tokens": 21, "prompt_tokens": 21, "total_tokens": 42, "estimated_cost": 1.008e-05}}
Logging Details LiteLLM-Success Call: None
##. 1: Response using deepinfra through LiteLLM:
Looking up model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
ModelResponse(id='chatcmpl-123412341234', choices=[Choices(finish_reason='stop', index=0, message=Message(content=' Hi there! Here\'s your requested response:\n\n"{\\"a\\":\\"b\\"}"', role='assistant'))], created=1716813730, model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=21, prompt_tokens=21, total_tokens=42))
######################
Success: model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
##. 2: Response using deepinfra through LiteLLM:
prompt_tokens=21; completion_tokens=21
Returned custom cost for model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 - prompt_tokens_cost_usd_dollar: 5.67e-06, completion_tokens_cost_usd_dollar: 5.67e-06
Request to litellm:
final cost: 1.134e-05; prompt_tokens_cost_usd_dollar: 5.67e-06; completion_tokens_cost_usd_dollar: 5.67e-06
litellm.completion(model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', messages=[{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], additional_kwargs={'response_format': {'type': 'json_object'}})
success callbacks: []
self.optional_params: {}
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'extra_body': {'additional_kwargs': {'response_format': {'type': 'json_object'}}}}
self.optional_params: {'extra_body': {'additional_kwargs': {'response_format': {'type': 'json_object'}}}}
self.optional_params: {}
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'extra_body': {}}
self.optional_params: {'extra_body': {}}
POST Request Sent from LiteLLM:
curl -X POST \
https://api.deepinfra.com/v1/openai/ \
-d '{'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'messages': [{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], 'extra_body': {'additional_kwargs': {'response_format': {'type': 'json_object'}}}}'
RAW RESPONSE:
{"id": "chatcmpl-123412341234", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": " Hi there! Here is your requested response:\n\n{\n\"a\": \"b\"\n}", "role": "assistant", "function_call": null, "tool_calls": null, "name": null}}], "created": 1716813685, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "object": "chat.completion", "system_fingerprint": null, "usage": {"completion_tokens": 21, "prompt_tokens": 21, "total_tokens": 42, "estimated_cost": 1.008e-05}}
Logging Details LiteLLM-Success Call: None
ModelResponse(id='chatcmpl-123412341234', choices=[Choices(finish_reason='stop', index=0, message=Message(content=' Hi there! Here is your requested response:\n\n{\n"a": "b"\n}', role='assistant'))], created=1716813685, model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=21, prompt_tokens=21, total_tokens=42))
######################
Looking up model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
Success: model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
##. 3: Response using deepinfra through LiteLLM:
prompt_tokens=21; completion_tokens=21
Returned custom cost for model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 - prompt_tokens_cost_usd_dollar: 5.67e-06, completion_tokens_cost_usd_dollar: 5.67e-06
Request to litellm:
final cost: 1.134e-05; prompt_tokens_cost_usd_dollar: 5.67e-06; completion_tokens_cost_usd_dollar: 5.67e-06
litellm.completion(model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', messages=[{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], response_format={'type': 'json_object'})
success callbacks: []
curl -X POST \
https://api.deepinfra.com/v1/openai/ \
-d '{'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'messages': [{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], 'extra_body': {}}'
RAW RESPONSE:
{"id": "chatcmpl-123412341234", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": " Hi there! Here's your response:\n\n{\"a\":\"b\"}", "role": "assistant", "function_call": null, "tool_calls": null, "name": null}}], "created": 1716813732, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "object": "chat.completion", "system_fingerprint": null, "usage": {"completion_tokens": 16, "prompt_tokens": 21, "total_tokens": 37, "estimated_cost": 8.88e-06}}
Logging Details LiteLLM-Success Call: None
ModelResponse(id='chatcmpl-123412341234', choices=[Choices(finish_reason='stop', index=0, message=Message(content=' Hi there! Here\'s your response:\n\n{"a":"b"}', role='assistant'))], created=1716813732, model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=16, prompt_tokens=21, total_tokens=37))
######################
Looking up model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
Success: model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
prompt_tokens=21; completion_tokens=16
Returned custom cost for model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 - prompt_tokens_cost_usd_dollar: 5.67e-06, completion_tokens_cost_usd_dollar: 4.32e-06
final cost: 9.990000000000001e-06; prompt_tokens_cost_usd_dollar: 5.67e-06; completion_tokens_cost_usd_dollar: 4.32e-06
success callbacks: []