I can't make json_mode work with deepinfra through LiteLLM, but it works just fine if I call deepinfra's OpenAI-compatible API directly (the snippet below uses the openai client pointed at deepinfra's base URL).
Below is a small snippet to reproduce the issue:
import openai
import os
import json

text_prompt = 'Say hi and answer this message with: {"a":"b"}'
model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"

#############################################
# Do the request using the deepinfra openai API
#############################################
client = openai.OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.getenv("DEEPINFRA_API_KEY"),
)
messages = [{"role": "user", "content": text_prompt}]
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    response_format={"type": "json_object"},
    tool_choice="auto",
)
print("Response using deepinfra openai API:")
print(response.choices[0].message.content)
try:
    json.loads(response.choices[0].message.content)
    print("Response is a valid json object")
except json.JSONDecodeError:
    print("Response is NOT a valid json object")

##########################################################
# Test deepinfra through LiteLLM directly.
##########################################################
import litellm
from litellm import completion

litellm.set_verbose = True
model_name = "deepinfra/" + model_name
messages = [{"content": text_prompt, "role": "user"}]

# deepinfra call without json_mode
response = completion(model=model_name, messages=messages)
print("Response using deepinfra through LiteLLM:")
print(response)

# json_mode via additional_kwargs
print("Response using deepinfra through LiteLLM with JSON mode enabled:")
response2 = completion(
    model=model_name,
    messages=messages,
    additional_kwargs={"response_format": {"type": "json_object"}},
)
print(response2)

# json_mode via response_format directly
print("Response using deepinfra through LiteLLM with response_format:")
response3 = completion(
    model=model_name, messages=messages, response_format={"type": "json_object"}
)
print(response3)
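For comparison, here is the same json_mode request issued with only the standard library, i.e. without the openai client. The /chat/completions path is an assumption based on the base_url used above; this is a sketch, not a verified endpoint:

```python
import json
import os
import urllib.request

# Same request as the working openai-client call above, built by hand.
# NOTE: the full endpoint path is assumed from the OpenAI-compatible
# base_url "https://api.deepinfra.com/v1/openai".
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": 'Say hi and answer this message with: {"a":"b"}'}
    ],
    "response_format": {"type": "json_object"},
}
req = urllib.request.Request(
    "https://api.deepinfra.com/v1/openai/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('DEEPINFRA_API_KEY')}",
    },
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(payload["response_format"])  # {'type': 'json_object'}
```

The point is that response_format sits at the top level of the request body here, which is exactly what the LiteLLM calls below fail to produce.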
In the logs below you can see that the first example (calling deepinfra's OpenAI-compatible API directly) works just fine.
Then I test it with LiteLLM. First without json_mode (not expected to produce JSON).
Then with JSON mode requested via the 'additional_kwargs' parameter. It has no effect; the parameter does reach the curl request, but nested under 'extra_body' instead of being sent as a top-level 'response_format'.
Finally, I try passing the response_format keyword argument directly, as Krrish suggested in a Discord conversation we had. The parameter is silently ignored and does not appear in the curl request at all.
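Until this is fixed, one workaround is to strip whatever preamble the model adds and pull the JSON value out of the reply ourselves. A minimal sketch using only the standard library (raw_decode parses a value starting at a given offset):

```python
import json

def extract_json(text: str):
    """Best-effort: return the first JSON value embedded in a free-form reply."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch in "{[":  # candidate start of an object or array
            try:
                obj, _ = decoder.raw_decode(text[start:])
                return obj
            except json.JSONDecodeError:
                continue  # false start, keep scanning
    raise ValueError("no JSON value found in reply")

# The reply LiteLLM actually returned in the logs below:
reply = ' Hi there! Here is your requested response:\n\n{\n"a": "b"\n}'
print(extract_json(reply))  # {'a': 'b'}
```

This is only a client-side patch; it does not make the model honor json_mode, it just recovers the payload when the model happens to include one.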
I've been testing this across the last several versions of LiteLLM and it has consistently failed. I just reproduced it on the latest release (1.38.10).
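To make the discrepancy explicit, here is a diff of the request body the working direct call sends versus the body LiteLLM emits for the response_format call (both copied from the logs below):

```python
# Body the working direct call sends (response_format at top level):
working = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": 'Say hi and answer this message with: {"a":"b"}'}
    ],
    "response_format": {"type": "json_object"},
}
# Body LiteLLM sent (from the curl in the logs): the parameter is gone.
litellm_sent = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "user", "content": 'Say hi and answer this message with: {"a":"b"}'}
    ],
    "extra_body": {},
}
print(sorted(working.keys() - litellm_sent.keys()))  # ['response_format']
```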
Relevant log output
Response using deepinfra openai API:
{"a":"b"}
Response is a valid json object
Request to litellm:
litellm.completion(model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', messages=[{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}])
self.optional_params: {}
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'extra_body': {}}
self.optional_params: {'extra_body': {}}
POST Request Sent from LiteLLM:
curl -X POST \
https://api.deepinfra.com/v1/openai/ \
-d '{'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'messages': [{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], 'extra_body': {}}'
RAW RESPONSE:
{"id": "chatcmpl-123412341234", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": " Hi there! Here's your requested response:\n\n\"{\\\"a\\\":\\\"b\\\"}\"", "role": "assistant", "function_call": null, "tool_calls": null, "name": null}}], "created": 1716813730, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "object": "chat.completion", "system_fingerprint": null, "usage": {"completion_tokens": 21, "prompt_tokens": 21, "total_tokens": 42, "estimated_cost": 1.008e-05}}
Logging Details LiteLLM-Success Call: None
##. 1: Response using deepinfra through LiteLLM:
Looking up model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
ModelResponse(id='chatcmpl-123412341234', choices=[Choices(finish_reason='stop', index=0, message=Message(content=' Hi there! Here\'s your requested response:\n\n"{\\"a\\":\\"b\\"}"', role='assistant'))], created=1716813730, model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=21, prompt_tokens=21, total_tokens=42))
######################
Success: model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
##. 2: Response using deepinfra through LiteLLM:
prompt_tokens=21; completion_tokens=21
Returned custom cost for model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 - prompt_tokens_cost_usd_dollar: 5.67e-06, completion_tokens_cost_usd_dollar: 5.67e-06
Request to litellm:
final cost: 1.134e-05; prompt_tokens_cost_usd_dollar: 5.67e-06; completion_tokens_cost_usd_dollar: 5.67e-06
litellm.completion(model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', messages=[{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], additional_kwargs={'response_format': {'type': 'json_object'}})
success callbacks: []
self.optional_params: {}
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'extra_body': {'additional_kwargs': {'response_format': {'type': 'json_object'}}}}
self.optional_params: {'extra_body': {'additional_kwargs': {'response_format': {'type': 'json_object'}}}}
self.optional_params: {}
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'extra_body': {}}
self.optional_params: {'extra_body': {}}
POST Request Sent from LiteLLM:
curl -X POST \
https://api.deepinfra.com/v1/openai/ \
-d '{'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'messages': [{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], 'extra_body': {'additional_kwargs': {'response_format': {'type': 'json_object'}}}}'
RAW RESPONSE:
{"id": "chatcmpl-123412341234", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": " Hi there! Here is your requested response:\n\n{\n\"a\": \"b\"\n}", "role": "assistant", "function_call": null, "tool_calls": null, "name": null}}], "created": 1716813685, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "object": "chat.completion", "system_fingerprint": null, "usage": {"completion_tokens": 21, "prompt_tokens": 21, "total_tokens": 42, "estimated_cost": 1.008e-05}}
Logging Details LiteLLM-Success Call: None
ModelResponse(id='chatcmpl-123412341234', choices=[Choices(finish_reason='stop', index=0, message=Message(content=' Hi there! Here is your requested response:\n\n{\n"a": "b"\n}', role='assistant'))], created=1716813685, model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=21, prompt_tokens=21, total_tokens=42))
######################
Looking up model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
Success: model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
##. 3: Response using deepinfra through LiteLLM:
prompt_tokens=21; completion_tokens=21
Returned custom cost for model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 - prompt_tokens_cost_usd_dollar: 5.67e-06, completion_tokens_cost_usd_dollar: 5.67e-06
Request to litellm:
final cost: 1.134e-05; prompt_tokens_cost_usd_dollar: 5.67e-06; completion_tokens_cost_usd_dollar: 5.67e-06
litellm.completion(model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', messages=[{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], response_format={'type': 'json_object'})
success callbacks: []
curl -X POST \
https://api.deepinfra.com/v1/openai/ \
-d '{'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'messages': [{'content': 'Say hi and answer this message with: {"a":"b"}', 'role': 'user'}], 'extra_body': {}}'
RAW RESPONSE:
{"id": "chatcmpl-123412341234", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": " Hi there! Here's your response:\n\n{\"a\":\"b\"}", "role": "assistant", "function_call": null, "tool_calls": null, "name": null}}], "created": 1716813732, "model": "mistralai/Mixtral-8x7B-Instruct-v0.1", "object": "chat.completion", "system_fingerprint": null, "usage": {"completion_tokens": 16, "prompt_tokens": 21, "total_tokens": 37, "estimated_cost": 8.88e-06}}
Logging Details LiteLLM-Success Call: None
ModelResponse(id='chatcmpl-123412341234', choices=[Choices(finish_reason='stop', index=0, message=Message(content=' Hi there! Here\'s your response:\n\n{"a":"b"}', role='assistant'))], created=1716813732, model='deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1', object='chat.completion', system_fingerprint=None, usage=Usage(completion_tokens=16, prompt_tokens=21, total_tokens=37))
######################
Looking up model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
Success: model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 in model_cost_map
prompt_tokens=21; completion_tokens=16
Returned custom cost for model=deepinfra/mistralai/Mixtral-8x7B-Instruct-v0.1 - prompt_tokens_cost_usd_dollar: 5.67e-06, completion_tokens_cost_usd_dollar: 4.32e-06
final cost: 9.990000000000001e-06; prompt_tokens_cost_usd_dollar: 5.67e-06; completion_tokens_cost_usd_dollar: 4.32e-06
success callbacks: []