BerriAI / litellm

Python SDK, Proxy Server to call 100+ LLM APIs using the OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: return_citations doesn't work while streaming (Proxy) #5313

Closed. P3ntest closed this issue 1 week ago.

P3ntest commented 3 weeks ago

What happened?

When using the Perplexity API with return_citations: true, LiteLLM Proxy only passes the citations through when streaming is disabled. With streaming enabled, the citations are nowhere to be found.
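
For reference, a minimal sketch of the kind of request involved, using the OpenAI Python SDK pointed at the proxy. The base URL, API key, and prompt are placeholders; the model name is taken from the logs below. Since return_citations is a Perplexity-specific parameter rather than part of the OpenAI spec, it goes through extra_body:

    from openai import OpenAI

    # Placeholder proxy endpoint and key -- adjust to your deployment.
    client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

    stream = client.chat.completions.create(
        model="perplexity/llama-3.1-sonar-large-128k-online",
        messages=[{"role": "user", "content": "What is LiteLLM?"}],
        stream=True,
        # Provider-specific parameter, forwarded through the proxy.
        extra_body={"return_citations": True},
    )
    for chunk in stream:
        print(chunk)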

Relevant log output

(Working) Output from the query without streaming:
{
  "id": "d91c2bc7-ca69-4b25-a4c8-c58fd9ccc3af",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "LiteLLM, which stands for \"Lightweight Large Language Model Library,\" is a Python package designed to simplify and unify interactions with various large language models (LLMs) provided by different companies. Here are the key features and benefits of LiteLLM:\n\n1. **Unified Interface**:\n   - LiteLLM provides a common interface for calling multiple LLM APIs, including those from OpenAI, Hugging Face, Anthropic, Cohere, and others. This standardization allows developers to use different LLMs with the same input/output format, reducing the complexity of handling multiple APIs[1][3][5].\n\n2. **Simplified Integration**:\n   - By translating inputs to match each provider's specific endpoint requirements, LiteLLM simplifies the integration process. This is particularly useful in a landscape where there is a lack of standardized API specifications for LLM providers[5].\n\n3. **Support for Multiple Providers**:\n   - LiteLLM supports over 100 LLMs from various providers, enabling developers to easily switch between different models without needing to learn individual APIs[3][4].\n\n4. **Proxy Server**:\n   - LiteLLM includes a proxy server feature that helps in configuring and running LLM servers. This involves setting up API keys and environment variables specific to each provider and running the LiteLLM proxy server to handle API calls[3].\n\n5. **Ease of Use**:\n   - The library streamlines interactions with LLMs, making it easier for developers to focus on their tasks rather than dealing with the complexities of different APIs. This is achieved through a consistent invocation method, which simplifies the process of using various LLMs in applications[1][4][5].\n\n6. **Flexibility and Efficiency**:\n   - LiteLLM enhances efficiency by allowing users to seamlessly harness the capabilities of different LLMs. It also supports features like completion, embedding, and image generation, making it a versatile tool for AI projects[4][5].\n\nOverall, LiteLLM acts as a bridge between various LLM providers, offering a unified and simplified way to integrate these models into applications, thereby enhancing the development process and reducing the complexity associated with multiple APIs.",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "created": 1724230863,
  "model": "perplexity/llama-3.1-sonar-large-128k-online",
  "object": "chat.completion",
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 445,
    "prompt_tokens": 5,
    "total_tokens": 450
  },
  "service_tier": null,
  "citations": [
    "https://hackernoon.com/litellm-call-every-llm-api-like-its-openai",
    "https://www.youtube.com/watch?v=ga76JOekmSQ",
    "https://microsoft.github.io/TaskWeaver/docs/llms/liteLLM/",
    "https://www.seaflux.tech/blogs/explore-litellm-effortless-ai-projects/",
    "https://www.thoughtworks.com/en-us/radar/languages-and-frameworks/litellm"
  ]
}
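
In the non-streaming case the citations arrive at the top level of the response, as shown above. A sketch of reading them with the OpenAI Python SDK, whose response models tolerate extra fields (endpoint and key are placeholders):

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

    resp = client.chat.completions.create(
        model="perplexity/llama-3.1-sonar-large-128k-online",
        messages=[{"role": "user", "content": "What is LiteLLM?"}],
        extra_body={"return_citations": True},
    )
    # The SDK's pydantic models allow extra fields, so the provider-specific
    # "citations" key survives into model_dump().
    print(resp.model_dump().get("citations"))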

(Broken) Output from the same query with streaming:

data: {"id":"d4a47a45-6a18-4241-adad-7eaafcf20c2c","choices":[{"message":{"role":"assistant","content":"Lite"},"index":0,"delta":{"content":"Lite","role":"assistant"}}],"created":1724230777,"model":"perplexity/llama-3.1-sonar-large-128k-online","object":"chat.completion.chunk"}

[... truncated]

data: {"id":"d4a47a45-6a18-4241-adad-7eaafcf20c2c","choices":[{"message":{"role":"assistant","content":"LiteLLM, which stands for \"Lightweight Large Language Model Library,\" is a Python package designed to simplify the integration and use of various large language models (LLMs) from different providers. Here are the key features and benefits of LiteLLM:\n\n1. **Unified Interface**:\n   - LiteLLM provides a unified interface for interacting with multiple LLM providers, such as OpenAI, Azure, Cohere, Hugging Face, and Anthropic. This allows developers to use different models through a single, standardized API format, similar to OpenAI's API[1][2][4].\n\n2. **Simplified Integration**:\n   - It standardizes interactions with various LLM providers, translating inputs to match each provider's specific endpoint requirements. This simplifies the process of integrating multiple LLMs into projects, reducing the complexity associated with different APIs[1][2][5].\n\n3. **Support for Multiple Models**:\n   - LiteLLM supports over 100 different LLMs, enabling developers to easily switch between models from various providers without needing to learn each provider's API[4].\n\n4. **Streamlined Development**:\n   - By providing a common interface, LiteLLM streamlines the development process, allowing developers to focus on their tasks rather than dealing with the intricacies of different APIs. This enhances efficiency and flexibility in leveraging advanced AI capabilities[2][5].\n\n5. **Technical Features**:\n   - LiteLLM includes features such as a proxy server, which helps in configuring and running LLM servers. It also supports environment variables for setting up API keys, which vary by provider[4].\n   - The library supports various functionalities like completion, embedding, and image generation, making it versatile for different AI tasks[1].\n\n6. **Practical Use Cases**:\n   - LiteLLM is particularly useful for projects that require the use of multiple LLMs, as it eliminates the need for provider-specific implementations and reduces the complexity of handling different APIs[5].\n\nOverall, LiteLLM acts as a gateway to various state-of-the-art AI models, making it easier for developers to navigate and utilize the capabilities of different LLMs without the hassle of learning individual APIs."},"index":0,"delta":{"content":" APIs."}}],"created":1724230787,"model":"perplexity/llama-3.1-sonar-large-128k-online","object":"chat.completion.chunk"}

data: {"id":"d4a47a45-6a18-4241-adad-7eaafcf20c2c","choices":[{"index":0,"delta":{"content":"","role":"assistant"}}],"created":1724230787,"model":"perplexity/llama-3.1-sonar-large-128k-online","object":"chat.completion.chunk"}

data: {"id":"d4a47a45-6a18-4241-adad-7eaafcf20c2c","choices":[{"finish_reason":"stop","index":0,"delta":{}}],"created":1724230787,"model":"perplexity/llama-3.1-sonar-large-128k-online","object":"chat.completion.chunk"}

data: [DONE]
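
To confirm the field never appears anywhere in the stream, the raw SSE chunks can be scanned for a citations key. A small sketch using requests; the proxy URL and key are again placeholders:

    import json

    import requests

    # Placeholder proxy endpoint -- adjust to your deployment.
    resp = requests.post(
        "http://localhost:4000/v1/chat/completions",
        headers={"Authorization": "Bearer sk-1234"},
        json={
            "model": "perplexity/llama-3.1-sonar-large-128k-online",
            "messages": [{"role": "user", "content": "What is LiteLLM?"}],
            "stream": True,
            "return_citations": True,
        },
        stream=True,
    )
    for line in resp.iter_lines():
        # SSE frames look like: b"data: {...}"
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        if "citations" in chunk:
            print("citations:", chunk["citations"])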

Twitter / LinkedIn details

@jul1us_05 https://www.linkedin.com/in/juliusvanvoorden

krrishdholakia commented 3 weeks ago

hey @P3ntest, can you share the debug logs? (run with --detailed_debug)

i want to see what the raw response from perplexity looks like
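
(For anyone following along: detailed debug logging is enabled by starting the proxy with that flag, e.g. litellm --config config.yaml --detailed_debug, where the config path is a placeholder for your own setup.)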

krrishdholakia commented 3 weeks ago

bump on this @P3ntest

krrishdholakia commented 3 weeks ago

following up on this @P3ntest

P3ntest commented 4 days ago

litellm_logs.md

@krrishdholakia Here are the logs, sorry for the late response. Please don't hesitate to ask any more questions.

This problem is still highly relevant; I was just on vacation for the last two weeks.

Thank you!

krrishdholakia commented 4 days ago

Hey @P3ntest, we had somebody else file a similar issue: https://github.com/BerriAI/litellm/issues/5535

the fix is live on litellm latest