BerriAI / litellm

Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate, Groq (100+ LLMs)
https://docs.litellm.ai/docs/

[Feature]: Log gemini system message to langfuse #2963

Closed CXwudi closed 1 month ago

CXwudi commented 3 months ago

What happened?

The LiteLLM proxy server does not seem to pass the system message when making /chat/completions calls for gemini-1.5-pro-latest (or any gemini model?)

I found this bug when checking through the langfuse logging.

Postman: (screenshot)

Langfuse: (screenshot)

Here is the test JSON I used:

Test JSON:

```json
{
  "model": "gemini-1.5-pro-latest",
  "messages": [
    {
      "role": "system",
      "content": "You are provided a scanned copy of my resume (or portion of it), can you check if there is any grammar or vocabulary issue? \n\nMake sure to keep the same length for each sentences.\n\nMake sure to explain what you have modified in bullet points."
    },
    {
      "role": "user",
      "content": "Easy Cucumber\t\t\t\t\t\t\t\t May 2022 – Apr 2023\n\tDeveloped a BDD testing library in Java, support executing Gherkin Language in any testing environments.\no\tRedesigned a cleaner architecture to provide better performance than the official Cucumber library.\n\tConducted research and implemented support for concurrent execution of multiple scenarios, to support testing complex scenarios that Cucumber could not handle."
    },
    {
      "role": "assistant",
      "content": "Here's the revised version with corrections and explanations:\n\nEasy Cucumber May 2022 – Apr 2023\n• Developed a Behavior-Driven Development (BDD) testing library in Java, supporting the execution of Gherkin language in various testing environments.\n• Redesigned a cleaner architecture to provide better performance compared to the official Cucumber library.\n• Conducted research and implemented support for concurrent execution of multiple scenarios, enabling testing of complex scenarios that Cucumber could not handle.\n\nModifications:\n- Replaced \"a BDD testing library\" with \"a Behavior-Driven Development (BDD) testing library\" for clarity.\n- Changed \"support executing Gherkin Language\" to \"supporting the execution of Gherkin language\" for better grammar.\n- Replaced \"any testing environments\" with \"various testing environments\" for better phrasing.\n- Changed \"than the official Cucumber library\" to \"compared to the official Cucumber library\" for better grammar.\n- Replaced \"to support testing complex scenarios that Cucumber could not handle\" with \"enabling testing of complex scenarios that Cucumber could not handle\" for better phrasing and grammar."
    },
    {
      "role": "user",
      "content": "I end up with following, is it fine?\n\nEasy Cucumber\t\t\t\t\t\t\t\t May 2022 – Apr 2023\n\tDeveloped a BDD test library in Java, support executing Gherkin Language in various test frameworks.\no\tRedesigned a cleaner architecture to provide better performance compared to the Cucumber library.\n\tConducted research and implemented support for concurrent execution of multiple scenarios, enabling testing of complex scenarios that Cucumber could not handle."
    }
  ]
}
```

The corresponding LiteLLM proxy yaml file:

```yaml
model_list:
- model_name: gemini-1.5-pro-latest
  litellm_params:
    model: gemini/gemini-1.5-pro-latest
    api_key: os.environ/GEMINI_API_KEY
```
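For reference, an equivalent request can be sent through the proxy with the OpenAI SDK. This is a minimal sketch only; the base URL/port and the virtual key are assumptions taken from the debug logs further down, not part of the original report:

```python
# Hypothetical repro sketch: the same OpenAI-format request, sent through the
# LiteLLM proxy via the OpenAI SDK. base_url and api_key are assumed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:6001/v1",  # LiteLLM proxy endpoint (assumed)
    api_key="sk-...",                     # a LiteLLM virtual key (masked)
)

response = client.chat.completions.create(
    model="gemini-1.5-pro-latest",
    messages=[
        {"role": "system", "content": "You are a resume reviewer."},
        {"role": "user", "content": "Check this bullet point for grammar."},
    ],
)
print(response.choices[0].message.content)
```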

Relevant log output

No response

Twitter / LinkedIn details

@CXwudi / https://www.linkedin.com/in/charles-chen-cc98/

krrishdholakia commented 3 months ago

Hi @CXwudi i believe the call is working as expected.

gemini accepts the system prompt as a separate arg - https://github.com/BerriAI/litellm/blob/6e934cb842f762830949312bc37760eb2d950b9e/litellm/llms/gemini.py#L146

You should be able to confirm this when running the proxy with --detailed_debug and seeing the request we make.

since it's passed separately, i believe it's missed from the langfuse logging object. Will add it.
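To make the mechanics concrete: a Gemini-style API takes the system prompt as its own argument rather than as a message, so the OpenAI-format list gets split before the request goes out. A conceptual sketch of that transform (not LiteLLM's actual code):

```python
# Conceptual sketch, not LiteLLM's actual implementation: split an
# OpenAI-format message list into (system_prompt, chat_messages), the shape a
# Gemini-style API expects.
from typing import Optional

def split_system_prompt(messages: list[dict]) -> tuple[Optional[str], list[dict]]:
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat_messages = [m for m in messages if m["role"] != "system"]
    return ("\n".join(system_parts) or None), chat_messages

system_prompt, chat_messages = split_system_prompt([
    {"role": "system", "content": "Be a good bot"},
    {"role": "user", "content": "What's the weather today?"},
])
# The system prompt now travels outside `messages`, so a logger that only
# inspects `messages` never sees it -- the Langfuse gap described above.
```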

CXwudi commented 3 months ago

Hi @krrishdholakia,

I just tested through both postman and Google AI Studio and I am pretty sure the system message is missing.

This time I am using a simple test JSON:

```json
{
    "model": "gemini-1.5-pro-latest",
    "messages": [
        {
            "role": "system",
            "content": "If you are asked about who is your best girl, answer \"Hatsune Miku\" please."
        },
        {
            "role": "user",
            "content": "What is your best girl?"
        }
    ]
}
```

Testing it on Google AI Studio gives the correct result:

(screenshot)

However, the same JSON sent from postman returns:

```json
{
    "id": "chatcmpl-2bb1ff75-b5e2-4e7a-be97-9a19ed814f90",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 1,
            "message": {
                "content": "As an AI language model, I am not capable of having personal opinions or beliefs. Therefore, I do not have a \"best girl\" or any preferences of that nature. \n\nIs there anything else I can assist you with? \n",
                "role": "assistant"
            }
        }
    ],
    "created": 1712887478,
    "model": "gemini/gemini-1.5-pro-latest",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 6,
        "completion_tokens": 47,
        "total_tokens": 53
    }
}
```
--detailed_debug from postman ```log 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: 597b81f17c45e0b3e24b9f7a8edf4a55795147cc69aead77174c71f267857bb9; local_only: False 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: token='597b81f17c45e0b3e24b9f7a8edf4a55795147cc69aead77174c71f267857bb9' key_name=None key_alias=None spend=0.0009649000000000001 max_budget=None expires=None models=[] aliases={} config={} user_id='default_user_id' team_id=None max_parallel_requests=None metadata={} tpm_limit=None rpm_limit=None budget_duration=None budget_reset_at=None allowed_cache_controls=[] permissions={} model_spend={} model_max_budget={} soft_budget_cooldown=False litellm_budget_table=None user_id_rate_limits=None team_id_rate_limits=None team_spend=None team_tpm_limit=None team_rpm_limit=None team_max_budget=None team_models=[] team_blocked=False soft_budget=None team_model_aliases=None api_key='sk-ultimate-mikuchat' user_role='proxy_admin' 2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: proxy_server.py:3394 - Request Headers: Headers({'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}) 2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: proxy_server.py:3400 - receiving data: {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}], 'proxy_server_request': {'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}]}}, 'ttl': None} 2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: utils.py:36 - Inside Proxy Logging Pre-call hook! 
2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - Inside Max Parallel Request Pre-Call Hook 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: sk-ultimate-mikuchat::2024-04-12-02-04::request_count; local_only: False 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - current: None 2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: tpm_rpm_limiter.py:33 - Inside Max TPM/RPM Limiter Pre-Call Hook - token='597b81f17c45e0b3e24b9f7a8edf4a55795147cc69aead77174c71f267857bb9' key_name=None key_alias=None spend=0.0009649000000000001 max_budget=None expires=None models=[] aliases={} config={} user_id='default_user_id' team_id=None max_parallel_requests=None metadata={} tpm_limit=None rpm_limit=None budget_duration=None budget_reset_at=None allowed_cache_controls=[] permissions={} model_spend={} model_max_budget={} soft_budget_cooldown=False litellm_budget_table=None user_id_rate_limits=None team_id_rate_limits=None team_spend=None team_tpm_limit=None team_rpm_limit=None team_max_budget=None team_models=[] team_blocked=False soft_budget=None team_model_aliases=None api_key='sk-ultimate-mikuchat' user_role='proxy_admin' 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: sk-ultimate-mikuchat; local_only: False 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: default_user_id; local_only: False 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: {'user_id': 'default_user_id', 'max_budget': None, 'spend': 0.0009649000000000001, 'model_max_budget': {}, 'model_spend': {}, 'user_email': None, 'models': [], 'tpm_limit': None, 'rpm_limit': None} 2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: tpm_rpm_limiter.py:33 - _set_limits: False 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: default_user_id_user_api_key_user_id; local_only: False 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:35 02:04:35 - LiteLLM Proxy:DEBUG: utils.py:36 - final data being sent to completion call: {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}], 'proxy_server_request': {'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}]}}, 'ttl': None, 'user': 'default_user_id', 'metadata': {'user_api_key': 'sk-ultimate-mikuchat', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'accept': 'application/json', 
'content-type': 'application/json', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'endpoint': 'http://localhost:6001/v1/chat/completions'}, 'request_timeout': 600} 2024-04-11 22:04:35 02:04:35 - LiteLLM Router:DEBUG: router.py:1232 - Inside async function with retries: args - (); kwargs - {'proxy_server_request': {'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}]}}, 'ttl': None, 'user': 'default_user_id', 'metadata': {'user_api_key': 'sk-ultimate-mikuchat', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'endpoint': 'http://localhost:6001/v1/chat/completions', 'model_group': 'gemini-1.5-pro-latest'}, 'request_timeout': 600, 'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}], 'original_function': >, 'num_retries': 0} 2024-04-11 22:04:35 02:04:35 - LiteLLM Router:DEBUG: router.py:1240 - async function w/ retries: original_function - > 2024-04-11 22:04:35 02:04:35 - LiteLLM Router:DEBUG: router.py:414 - Inside _acompletion()- model: gemini-1.5-pro-latest; kwargs: {'proxy_server_request': {'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}]}}, 'ttl': None, 'user': 'default_user_id', 'metadata': {'user_api_key': 'sk-ultimate-mikuchat', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 
'keep-alive', 'content-length': '328'}, 'endpoint': 'http://localhost:6001/v1/chat/completions', 'model_group': 'gemini-1.5-pro-latest'}, 'request_timeout': 600} 2024-04-11 22:04:35 02:04:35 - LiteLLM Router:DEBUG: router.py:2475 - initial list of deployments: [{'model_name': 'gemini-1.5-pro-latest', 'litellm_params': {'model': 'gemini/gemini-1.5-pro-latest', 'api_key': '', 'max_retries': 2}, 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}}] 2024-04-11 22:04:35 02:04:35 - LiteLLM Router:DEBUG: router.py:2479 - healthy deployments: length 1 [{'model_name': 'gemini-1.5-pro-latest', 'litellm_params': {'model': 'gemini/gemini-1.5-pro-latest', 'api_key': '', 'max_retries': 2}, 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}}] 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: 02-04:cooldown_models; local_only: False 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:35 02:04:35 - LiteLLM Router:DEBUG: router.py:1678 - retrieve cooldown models: [] 2024-04-11 22:04:35 02:04:35 - LiteLLM Router:DEBUG: router.py:2595 - cooldown deployments: [] 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: 0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90; local_only: True 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - set cache: key: 0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90; value: 1 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - InMemoryCache: set_cache 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: 0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90_async_client; local_only: True 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: 0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90_async_client; local_only: True 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - 2024-04-11 22:04:35 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - Request to litellm: 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - litellm.acompletion(model='gemini/gemini-1.5-pro-latest', api_key='', max_retries=0, messages=[{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}], caching=False, client=None, timeout=6000, proxy_server_request={'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 
'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'system', 'content': 'If you are asked about who is your best girl, answer "Hatsune Miku" please.'}, {'role': 'user', 'content': 'What is your best girl?'}]}}, ttl=None, user='default_user_id', metadata={'user_api_key': 'sk-ultimate-mikuchat', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'endpoint': 'http://localhost:6001/v1/chat/completions', 'model_group': 'gemini-1.5-pro-latest', 'deployment': 'gemini/gemini-1.5-pro-latest', 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}, 'caching_groups': None}, request_timeout=600, model_info={'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}) 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - 2024-04-11 22:04:35 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - Initialized litellm callbacks, Async Success Callbacks: [, , , , , >] 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - self.optional_params: {} 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - litellm.cache: None 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - kwargs[caching]: False; litellm.cache: None 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:4543 - 2024-04-11 22:04:35 LiteLLM completion() model= gemini-1.5-pro-latest; provider = gemini 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:4546 - 2024-04-11 22:04:35 LiteLLM: Params passed to completion() {'functions': None, 'function_call': None, 'temperature': None, 'top_p': None, 'n': None, 'stream': None, 'stop': None, 'max_tokens': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': 'default_user_id', 'model': 'gemini-1.5-pro-latest', 'custom_llm_provider': 'gemini', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None} 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:4549 - 2024-04-11 22:04:35 LiteLLM: Non-Default params passed to completion() {'user': 'default_user_id', 'max_retries': 0} 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - Final returned optional params: {} 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:936 - self.optional_params: {} 2024-04-11 22:04:35 02:04:35 - LiteLLM:DEBUG: utils.py:1091 - PRE-API-CALL ADDITIONAL ARGS: {'complete_input_dict': {'inference_params': {}}} 2024-04-11 22:04:35 02:04:35 - LiteLLM:INFO: utils.py:1112 - {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'user', 'content': 'What is your best girl?'}], 'optional_params': {}, 'litellm_params': {'acompletion': True, 'api_key': '', 'force_timeout': 600, 'logger_fn': None, 'verbose': False, 'custom_llm_provider': 'gemini', 'api_base': '', 
'litellm_call_id': '361f8ad5-f82a-4a3a-a956-61f134ee904a', 'model_alias_map': {}, 'completion_call_id': None, 'metadata': {'user_api_key': 'sk-ultimate-mikuchat', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'endpoint': 'http://localhost:6001/v1/chat/completions', 'model_group': 'gemini-1.5-pro-latest', 'deployment': 'gemini/gemini-1.5-pro-latest', 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}, 'caching_groups': None}, 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}, 'proxy_server_request': {'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'user', 'content': 'What is your best girl?'}]}}, 'preset_cache_key': None, 'no-log': False, 'stream_response': {}}, 'start_time': datetime.datetime(2024, 4, 12, 2, 4, 35, 912780), 'stream': False, 'user': 'default_user_id', 'call_type': 'acompletion', 'litellm_call_id': '361f8ad5-f82a-4a3a-a956-61f134ee904a', 'completion_start_time': None, 'input': ['What is your best girl?'], 'api_key': '', 'additional_args': {'complete_input_dict': {'inference_params': {}}}, 'log_event_type': 'pre_api_call'} 2024-04-11 22:04:35 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - RAW RESPONSE: 2024-04-11 22:04:38 2024-04-11 22:04:38 2024-04-11 22:04:38 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: main.py:3835 - raw model_response: 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Async Wrapper: Completed Call, calling async_success_handler: > 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Logging Details LiteLLM-Success Call: None 2024-04-11 22:04:38 02:04:38 - LiteLLM Router:INFO: router.py:479 - litellm.acompletion(model=gemini/gemini-1.5-pro-latest) 200 OK 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:1288 - Model=gemini-1.5-pro-latest; 2024-04-11 22:04:38 02:04:38 - LiteLLM Router:DEBUG: router.py:1151 - Async Response: ModelResponse(id='chatcmpl-2bb1ff75-b5e2-4e7a-be97-9a19ed814f90', choices=[Choices(finish_reason='stop', index=1, message=Message(content='As an AI language model, I am not capable of having personal opinions or beliefs. Therefore, I do not have a "best girl" or any preferences of that nature. \n\nIs there anything else I can assist you with? 
\n', role='assistant'))], created=1712887478, model='gemini/gemini-1.5-pro-latest', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=6, completion_tokens=47, total_tokens=53)) 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:4001 - completion_response response ms: 2919.216 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Logging Details LiteLLM-Async Success Call: None 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Looking up model=gemini/gemini-1.5-pro-latest in model_cost_map 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:1288 - Model=gemini-1.5-pro-latest; 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:1333 - Model=gemini-1.5-pro-latest not found in completion cost map. 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:4001 - completion_response response ms: 2919.216 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - success callbacks: ['langfuse', , , , ] 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Looking up model=gemini/gemini-1.5-pro-latest in model_cost_map 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:1576 - reaches langfuse for success logging! 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:1333 - Model=gemini-1.5-pro-latest not found in completion cost map. 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Instantiates langfuse client 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Async success callbacks: [, , , , , >] 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - INSIDE parallel request limiter ASYNC SUCCESS LOGGING 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: sk-ultimate-mikuchat::2024-04-12-02-04::request_count; local_only: False 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - updated_value in success call: {'current_requests': 0, 'current_tpm': 106, 'current_rpm': 2}, precise_minute: 2024-04-12-02-04 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - set cache: key: sk-ultimate-mikuchat::2024-04-12-02-04::request_count; value: {'current_requests': 0, 'current_tpm': 106, 'current_rpm': 2} 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - InMemoryCache: set_cache 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: default_user_id::2024-04-12-02-04::request_count; local_only: False 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - updated_value in success call: {'current_requests': 0, 'current_tpm': 106, 'current_rpm': 2}, precise_minute: 2024-04-12-02-04 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - set cache: key: default_user_id::2024-04-12-02-04::request_count; value: {'current_requests': 0, 'current_tpm': 106, 'current_rpm': 2} 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - InMemoryCache: set_cache 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: tpm_rpm_limiter.py:33 - INSIDE TPM RPM Limiter ASYNC SUCCESS LOGGING 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: sk-ultimate-mikuchat; local_only: False 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: 
default_user_id; local_only: False 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: {'user_id': 'default_user_id', 'max_budget': None, 'spend': 0.0009649000000000001, 'model_max_budget': {}, 'model_spend': {}, 'user_email': None, 'models': [], 'tpm_limit': None, 'rpm_limit': None} 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: proxy_server.py:1209 - INSIDE _PROXY_track_cost_callback 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: proxy_server.py:1213 - Proxy: In track_cost_callback for: {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'user', 'content': 'What is your best girl?'}], 'optional_params': {}, 'litellm_params': {'acompletion': True, 'api_key': '', 'force_timeout': 600, 'logger_fn': None, 'verbose': False, 'custom_llm_provider': 'gemini', 'api_base': '', 'litellm_call_id': '361f8ad5-f82a-4a3a-a956-61f134ee904a', 'model_alias_map': {}, 'completion_call_id': None, 'metadata': {'user_api_key': 'sk-ultimate-mikuchat', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'endpoint': 'http://localhost:6001/v1/chat/completions', 'model_group': 'gemini-1.5-pro-latest', 'deployment': 'gemini/gemini-1.5-pro-latest', 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}, 'caching_groups': None}, 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. 
Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}, 'proxy_server_request': {'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'user', 'content': 'What is your best girl?'}]}}, 'preset_cache_key': None, 'no-log': False, 'stream_response': {}}, 'start_time': datetime.datetime(2024, 4, 12, 2, 4, 35, 912780), 'stream': False, 'user': 'default_user_id', 'call_type': 'acompletion', 'litellm_call_id': '361f8ad5-f82a-4a3a-a956-61f134ee904a', 'completion_start_time': datetime.datetime(2024, 4, 12, 2, 4, 38, 831996), 'input': ['What is your best girl?'], 'api_key': '', 'additional_args': {'complete_input_dict': {}}, 'log_event_type': 'post_api_call', 'original_response': , 'end_time': datetime.datetime(2024, 4, 12, 2, 4, 38, 831996), 'cache_hit': None, 'response_cost': None} 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: proxy_server.py:1214 - kwargs stream: False + complete streaming response: None 2024-04-11 22:04:38 02:04:38 - LiteLLM Proxy:DEBUG: proxy_server.py:1283 - error in tracking cost callback - Model not in litellm model cost map. Add custom pricing - https://docs.litellm.ai/docs/proxy/custom_pricing 2024-04-11 22:04:38 response_obj: ModelResponse(id='chatcmpl-241f9dff-2e01-4a88-87ec-f1806eade18e', choices=[Choices(finish_reason='stop', index=1, message=Message(content='As an AI language model, I don\'t have personal preferences like having a "best girl." I can, however, provide you with information on various fictional female characters or help you explore different character archetypes if you\'d like! \n\nIs there anything specific you\'re interested in learning about? 
\n', role='assistant'))], created=1712887339, model='gemini/gemini-1.5-pro-latest', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=6, completion_tokens=59, total_tokens=65)) 2024-04-11 22:04:38 getting usage, cost=None 2024-04-11 22:04:38 constructed usage - {'prompt_tokens': 6, 'completion_tokens': 59, 'total_cost': None} 2024-04-11 22:04:38 INFO: 172.25.0.1:55464 - "POST /v1/chat/completions HTTP/1.1" 200 OK 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Langfuse Logging - Enters logging function for model {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'user', 'content': 'What is your best girl?'}], 'optional_params': {}, 'litellm_params': {'acompletion': True, 'api_key': '', 'force_timeout': 600, 'logger_fn': None, 'verbose': False, 'custom_llm_provider': 'gemini', 'api_base': '', 'litellm_call_id': '361f8ad5-f82a-4a3a-a956-61f134ee904a', 'model_alias_map': {}, 'completion_call_id': None, 'metadata': {'user_api_key': 'sk-ultimate-mikuchat', 'user_api_key_alias': None, 'user_api_key_user_id': 'default_user_id', 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'endpoint': 'http://localhost:6001/v1/chat/completions', 'model_group': 'gemini-1.5-pro-latest', 'deployment': 'gemini/gemini-1.5-pro-latest', 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}, 'caching_groups': None}, 'model_info': {'id': '0cdd3aba7daa6215828fe92b268271942f92828eacd1378d53793435f1eddc90', 'description': 'gemini-1.5-pro-latest from Google Gemini Official. Mid-size multimodal model that supports up to 1 million tokens', 'max_tokens': 1048576}, 'proxy_server_request': {'url': 'http://localhost:6001/v1/chat/completions', 'method': 'POST', 'headers': {'accept': 'application/json', 'content-type': 'application/json', 'authorization': 'Bearer sk-ultimate-mikuchat', 'user-agent': 'PostmanRuntime/7.37.0', 'cache-control': 'no-cache', 'postman-token': '7bde7985-b26d-44dd-9908-a363902f6c1a', 'host': 'localhost:6001', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '328'}, 'body': {'model': 'gemini-1.5-pro-latest', 'messages': [{'role': 'user', 'content': 'What is your best girl?'}]}}, 'preset_cache_key': None, 'no-log': False, 'stream_response': {}}, 'start_time': datetime.datetime(2024, 4, 12, 2, 4, 35, 912780), 'stream': False, 'user': 'default_user_id', 'call_type': 'acompletion', 'litellm_call_id': '361f8ad5-f82a-4a3a-a956-61f134ee904a', 'completion_start_time': datetime.datetime(2024, 4, 12, 2, 4, 38, 831996), 'input': ['What is your best girl?'], 'api_key': '', 'additional_args': {'complete_input_dict': {}}, 'log_event_type': 'successful_api_call', 'end_time': datetime.datetime(2024, 4, 12, 2, 4, 38, 831996), 'cache_hit': None, 'response_cost': None} 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - OUTPUT IN LANGFUSE: {'content': 'As an AI language model, I am not capable of having personal opinions or beliefs. Therefore, I do not have a "best girl" or any preferences of that nature. 
\n\nIs there anything else I can assist you with? \n', 'role': 'assistant'}; original: ModelResponse(id='chatcmpl-2bb1ff75-b5e2-4e7a-be97-9a19ed814f90', choices=[Choices(finish_reason='stop', index=1, message=Message(content='As an AI language model, I am not capable of having personal opinions or beliefs. Therefore, I do not have a "best girl" or any preferences of that nature. \n\nIs there anything else I can assist you with? \n', role='assistant'))], created=1712887478, model='gemini/gemini-1.5-pro-latest', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=6, completion_tokens=47, total_tokens=53)) 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Langfuse Layer Logging - logging to langfuse v2 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - trace: None 2024-04-11 22:04:38 02:04:38 - LiteLLM:DEBUG: utils.py:936 - Langfuse Layer Logging - final response object: ModelResponse(id='chatcmpl-2bb1ff75-b5e2-4e7a-be97-9a19ed814f90', choices=[Choices(finish_reason='stop', index=1, message=Message(content='As an AI language model, I am not capable of having personal opinions or beliefs. Therefore, I do not have a "best girl" or any preferences of that nature. \n\nIs there anything else I can assist you with? \n', role='assistant'))], created=1712887478, model='gemini/gemini-1.5-pro-latest', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=6, completion_tokens=47, total_tokens=53)) 2024-04-11 22:04:38 02:04:38 - LiteLLM:INFO: langfuse.py:161 - Langfuse Layer Logging - logging success 2024-04-11 22:04:39 02:04:39 - LiteLLM Proxy:DEBUG: utils.py:2104 - Team Spend transactions: 0 2024-04-11 22:04:39 02:04:39 - LiteLLM Proxy:DEBUG: utils.py:2154 - Spend Logs transactions: 0 ```

So could we revert the issue title back?

krrishdholakia commented 3 months ago

Hi @CXwudi just ran this locally. Here's the attached image of the system prompt being sent.

Screenshot 2024-04-11 at 11 14 57 PM

We weren't logging the system prompt. This is now fixed - https://github.com/BerriAI/litellm/commit/7a3821e0f6700b3ccb5baa5d688ab48dde60c349 (should be live soon in v1.35.2)

Here's the code: https://github.com/BerriAI/litellm/blob/c480b5a008b9cde5ca4c6fd8ce2c299d0f423478/litellm/llms/gemini.py#L186
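The gist of the logging fix, as a rough sketch of the idea only (the linked commit is authoritative): since the system prompt is stripped out of `messages` before the provider call, it has to be put back when building the payload handed to success callbacks like Langfuse.

```python
# Rough sketch of the idea behind the fix (an assumption -- see the linked
# commit for the real change): re-attach the separated system prompt when
# constructing the input passed to logging callbacks.
def messages_for_logging(system_prompt, chat_messages):
    logged = list(chat_messages)
    if system_prompt:
        logged.insert(0, {"role": "system", "content": system_prompt})
    return logged
```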

CXwudi commented 3 months ago

Huh, that is weird. I'm gonna investigate more into why I couldn't get my test JSON to work...

CXwudi commented 3 months ago

Okie, I see the problem: currently we are using google-generativeai==0.3.2 in requirements.txt; setting the version to 0.5.0 will work.

I will wait until that dependency is updated.
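For context, `system_instruction` support only exists in newer versions of the SDK, which is why the pinned 0.3.2 silently dropped the system message. A hedged sanity check against the SDK directly (assumes google-generativeai >= 0.5.0):

```python
# Sanity-check sketch against the Google SDK itself (assumes
# google-generativeai >= 0.5.0, where GenerativeModel accepts
# system_instruction; the pinned 0.3.2 has no equivalent parameter).
import google.generativeai as genai

genai.configure(api_key="...")  # GEMINI_API_KEY
model = genai.GenerativeModel(
    "gemini-1.5-pro-latest",
    system_instruction='If you are asked about who is your best girl, answer "Hatsune Miku" please.',
)
print(model.generate_content("What is your best girl?").text)
```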

krrishdholakia commented 3 months ago

thanks for the catch.

fix pushed @CXwudi https://github.com/BerriAI/litellm/commit/b0770cf8e20e9a814924b619e9ff872d024898e4

Manouchehri commented 3 months ago

Is this the same issue as #3241?

CXwudi commented 3 months ago

@Manouchehri not really, but it can be similar. Mine is about the system message: previously my issue was about the system message missing from the response due to the outdated dependency, but the system message is also missing in the log.

krrishdholakia commented 3 months ago

Hey @CXwudi, unable to repro this - i set up vertex_ai/gemini-1.5-pro-preview-0409 on our staging env (v1.35.29) and just ran a test query. I can see the system prompt being logged to langfuse:

Screenshot 2024-04-26 at 4 25 03 PM

krrishdholakia commented 3 months ago

closing as unable to repro. @CXwudi please bump me, if you're still seeing this, with a way to repro it.

Attaching a curl with a key for testing on our staging env (1 RPM, will expire in 24hrs):

```bash
curl --location 'https://staging.litellm.ai/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-Yx4jQsYBUlTkU51L5iltEA' \
--data '{
  "model": "gemini-1.5-pro-latest",
  "messages": [
    { "role": "system", "content": "Be a good bot" },
    { "role": "user", "content": "What'\''s the weather today?" }
  ]
}'
```

CXwudi commented 2 months ago

Hi @krrishdholakia, mine is reproducible with Google AI Studio, like I mentioned in the issue description. The config I used is:

```yaml
model_list:
- model_name: gemini-1.5-pro-latest
  litellm_params:
    model: gemini/gemini-1.5-pro-latest
    api_key: os.environ/GEMINI_API_KEY
```

I just tested and it is still missing: (screenshots)

antmanler commented 1 month ago

> closing as unable to repro. @CXwudi please bump me, if you're still seeing this, with a way to repro it.
>
> Attaching a curl with a key for testing on our staging env (1 RPM, will expire in 24hrs):
>
> ```bash
> curl --location 'https://staging.litellm.ai/v1/chat/completions' \
> --header 'Content-Type: application/json' \
> --header 'Authorization: Bearer sk-Yx4jQsYBUlTkU51L5iltEA' \
> --data '{
>   "model": "gemini-1.5-pro-latest",
>   "messages": [
>     { "role": "system", "content": "Be a good bot" },
>     { "role": "user", "content": "What'\''s the weather today?" }
>   ]
> }'
> ```

same issue here, reproduced with [1.40.22]; the system message was not passed:

```yaml
- litellm_params:
    api_key: *****
    model: gemini/gemini-1.5-pro-latest
  model_name: gemini-1.5-pro
```

(screenshot)

krrishdholakia commented 1 month ago

@antmanler thanks

able to repro