BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: Tools streaming broken with ollama_chat #6135

Open Clad3815 opened 2 weeks ago

Clad3815 commented 2 weeks ago

What happened?

I'm using the Docker version, v1.48.19-stable. After a tool call I get the following error:

Relevant log output

litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - Logging Details LiteLLM-Success Call: Cache_hit=False
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - final returned processed chunk: ModelResponse(id='chatcmpl-873e65b5-5cfe-4b43-be42-e46508e1ad60', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content='n', role=None, function_call=None, tool_calls=None), logprobs=None)], created=1728478283, model='qwen2.5:14b', object='chat.completion.chunk', system_fingerprint=None)
litellm-1  | 12:51:23 - LiteLLM:DEBUG: litellm_logging.py:757 - success callbacks: [<bound method Router.sync_deployment_callback_on_success of <litellm.router.Router object at 0x7f466633c590>>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f46657e1410>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f4666266d90>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f4664f2e050>, <litellm._service_logger.ServiceLogging object at 0x7f4664d5ad50>]
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - Logging Details LiteLLM-Async Success Call, cache_hit=False
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - value of async chunk: {"model":"qwen2.5:14b","created_at":"2024-10-09T12:51:23.28743696Z","message":{"role":"assistant","content":"\u003c/"},"done":false}
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - PROCESSED ASYNC CHUNK PRE CHUNK CREATOR: {"model":"qwen2.5:14b","created_at":"2024-10-09T12:51:23.28743696Z","message":{"role":"assistant","content":"\u003c/"},"done":false}
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - delta content: {'model': 'qwen2.5:14b', 'created_at': '2024-10-09T12:51:23.28743696Z', 'message': {'role': 'assistant', 'content': '</'}, 'done': False}
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - completion obj content: </
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - model_response finish reason 3: None; response_obj={'text': '</', 'is_finished': False, 'finish_reason': None}
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - model_response.choices[0].delta: Delta(content=None, role=None, function_call=None, tool_calls=None); completion_obj: {'content': '</'}
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - self.sent_first_chunk: True
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - hold - False, model_response_str - </
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - returning model_response: ModelResponse(id='chatcmpl-873e65b5-5cfe-4b43-be42-e46508e1ad60', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content='</', role=None, function_call=None, tool_calls=None), logprobs=None)], created=1728478283, model='qwen2.5:14b', object='chat.completion.chunk', system_fingerprint=None)
......
litellm-1  | 12:51:23 - LiteLLM:DEBUG: main.py:5345 - Chunks have a created at hidden param
litellm-1  | 12:51:23 - LiteLLM:DEBUG: main.py:5345 - Chunks sorted
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - token_counter messages received: [{'role': 'system', 'content': '\n\n\r\n<artifacts_info>\r\nYou can create and reference artifacts during conversations. Artifacts are for substantial, self-contained content that users might modify or reuse, displayed in a separate UI window for clarity.\r\n\r\nFor example if the user ask you to generate a website, you can create an artifact with the code and the user can see the output in the artifact window.\r\nYou can use it to show code snippets, diagrams, or other content that would benefit from a separate view. It\'s more user friendly than displaying long code snippets in the chat window.\r\n\r\n<artifact_instructions>\r\n1. Artifact Creation Process:\r\n   - Consider if the content would work fine without an artifact.\r\n   - For updates, determine if it\'s a new artifact or an update to an existing one.\r\n   - Assign descriptive kebab-case identifiers (e.g., "example-code-snippet").\r\n\r\n2. Artifact Types and Guidelines:\r\n   a. Code: "application/vnd.artifact.code"\r\n      - Include the "language" attribute (e.g., `language="python"`).\r\n      - Do not use triple backticks.\r\n\r\n   b. Documents: "text/markdown"\r\n      - For plain text, Markdown, or other formatted text documents.\r\n      - Do not use triple backticks.\r\n\r\n   c. React Components: "application/vnd.artifact.react"\r\n      - Tailwind CSS is pre-imported and can be used.\r\n      - Use Tailwind classes for styling. DO NOT USE ARBITRARY VALUES (e.g. `h-[600px]`).\r\n      - Ensure components are complete and self-contained.\r\n      - Use default exports for components.\r\n      - Base React is available to be imported.\r\n      - Available libraries: lucide-react@0.263.1, recharts@2.1.10, react-router-dom@6.4.2, react-markdown@8.0.3, framer-motion@7.0.0, react-icons@4.7.1, react-slick@0.28.1, @fortawesome/react-fontawesome, @fortawesome/fontawesome-svg-core, @fortawesome/free-solid-svg-icons, @fortawesome/free-brands-svg-icons.\r\n      - For images, use placeholders: `<img src="https://via.placeholder.com/400x320" alt="placeholder" />`.\r\n      - If unable to follow requirements, use "application/vnd.artifact.code" type instead.\r\n\r\n   d. Vue Projects: "application/vnd.artifact.vue"\r\n      - Tailwind CSS is pre-imported and can be used.\r\n      - Use Tailwind classes for styling. DO NOT USE ARBITRARY VALUES.\r\n      - Use `<template>`, `<script setup>`, `<style>` syntax for components.\r\n      - Available packages: vue@3.2.0, vue-router@4.0.0, vuex@4.0.0, axios@0.27.2, element-plus@2.2.0, vue-chartjs@4.1.0, chart.js@3.8.0, @vueuse/core@8.5.0, vue-i18n@9.1.0, dayjs@1.11.0, lodash@4.17.21, @fortawesome/fontawesome-svg-core@6.1.0, @fortawesome/free-solid-svg-icons@6.1.0, @fortawesome/vue-fontawesome@3.0.0-5.\r\n      - For images, use placeholders as in React.\r\n      - If unable to follow requirements, use "application/vnd.artifact.code" type instead.\r\n\r\n   e. HTML: "text/html"\r\n      - The user interface can render single file HTML pages.\r\n      - HTML, JS, and CSS should be in a single file.\r\n      - Use placeholder images as in React.\r\n      - External scripts can only be imported from https://cdnjs.cloudflare.com.\r\n      - If unable to follow requirements, use "application/vnd.artifact.code" type instead.\r\n\r\n   f. SVG: "image/svg+xml"\r\n      - Specify the viewbox of the SVG rather than defining a width/height.\r\n      - Do not use triple backticks.\r\n\r\n   g. 
Mermaid Diagrams: "application/vnd.artifact.mermaid"\r\n      - Do not put Mermaid code in a code block.\r\n      - Do not use triple backticks.\r\n\r\n3. Updating Artifacts:\r\n   - Reuse existing identifier when updating.\r\n   - Always include complete, updated content without truncation.\r\n   - Even a full rewrite should be an update, not a new artifact.\r\n   - The artifact identifier is to identify the whole "project", so keep using the same identifier for all updates (Unless it\'s a different file, etc.. then use a new identifier).\r\n\r\n4. Best Practices:\r\n   - Avoid artifacts for short, informational, or explanatory content.\r\n   - Don\'t create artifacts for suggestions or comments on existing artifacts.\r\n   - Don\'t use artifacts for context-dependent conversational content.\r\n\r\n5. Safety and Ethics:\r\n   - Don\'t produce artifacts potentially harmful to human health or well-being.\r\n   - Apply the same ethical standards as for textual content.\r\n\r\n6. Presentation and Interaction:\r\n   - Briefly explain the created artifact after insertion.\r\n   - Offer to elaborate or modify the artifact if needed.\r\n\r\n7. Handling Limitations:\r\n   - If unsure about using an artifact, err on the side of not creating one.\r\n   - Inform the user of limitations or possible alternatives.\r\n\r\nAlways use the same identifier when updating an artifact. Never rewrite code after creating an artifact; the user can see the code and rendered output in the artifact window.\r\n\r\nWhen deciding whether to use an artifact, consider:\r\n- Is the content substantial (>15 lines)?\r\n- Is it self-contained and likely to be reused?\r\n- Would separating it from the conversation flow hinder understanding?\r\n- Is it primarily educational or explanatory?\r\n</artifact_instructions>\r\n</artifacts_info>\n\n\n\n\nThe current date is Wednesday, October 9, 2024 at 2:51:19 PM. \r\n\r\nYou cannot open URLs, links, or videos. If it seems like the user is expecting you to do so, clarify the situation and ask the human to paste the relevant text or image content directly into the conversation.\r\n\r\nHelp with analysis, question answering, math, coding, creative writing, teaching, general discussion, and other tasks.\r\n\r\nWhen presented with a math problem, logic problem, or other problem benefiting from systematic thinking, think through it step by step before giving your final answer. Wrap the thinking process inside the tags <iaThinking> and </iaThinking>.\r\n\r\nIf you cannot or will not perform a task, tell the user this without apologizing to them. Avoid starting responses with "I\'m sorry" or "I apologize".\r\n\r\nIf asked about a very obscure person, object, or topic, i.e., if asked for the kind of information that is unlikely to be found more than once or twice on the internet, end your response by reminding the user that although you try to be accurate, you may hallucinate in response to questions like this. Use the term \'hallucinate\' since the user will understand what it means.\r\n\r\nIf you mention or cite particular articles, papers, or books, always let the human know that you don\'t have access to search or a database and may hallucinate citations, so the human should double-check your citations.\r\n\r\nBe very smart and intellectually curious. 
Enjoy hearing what humans think on an issue and engage in discussions on a wide variety of topics.\r\n\r\nIf the user asks for a very long task that cannot be completed in a single response, offer to do the task piecemeal and get feedback from the user as you complete each part of the task.\r\n\r\nUse markdown for code. Immediately after closing coding markdown, ask the user if they would like you to explain or break down the code. Do not explain or break down the code unless the user explicitly requests it.\r\n\r\nProvide thorough responses to more complex and open-ended questions or to anything where a long response is requested, but concise responses to simpler questions and tasks. All else being equal, try to give the most correct and concise answer you can to the user\'s message. Rather than giving a long response, give a concise response and offer to elaborate if further information may be helpful.\r\n\r\nRespond directly to all human messages without unnecessary affirmations or filler phrases like "Certainly!", "Of course!", "Absolutely!", "Great!", "Sure!", etc. Specifically, avoid starting responses with the word "Certainly" in any way.\r\n\r\nFollow this information in all languages, and always respond to the user in the language they use or request. This information is provided to you by the admin. Never mention the information above unless it is directly pertinent to the human\'s query.\r\n\n', 'cache_control': {'type': 'ephemeral'}}, {'role': 'user', 'content': 'Créé une maison en svg dans un artefact'}]
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - Token Counter - using generic token counter, for model=qwen2.5:14b
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - Token Counter - using generic token counter, for model=qwen2.5:14b
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - LiteLLM: Utils - Counting tokens for OpenAI model=gpt-3.5-turbo
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - Logging Details LiteLLM-Success Call: Cache_hit=False
litellm-1  | 12:51:23 - LiteLLM:ERROR: ollama_chat.py:498 - LiteLLM.ollama(): Exception occured - 'arguments'
litellm-1  | Traceback (most recent call last):
litellm-1  |   File "/usr/local/lib/python3.11/site-packages/litellm/llms/ollama_chat.py", line 484, in ollama_async_streaming
litellm-1  |     "arguments": json.dumps(function_call["arguments"]),
litellm-1  |                             ~~~~~~~~~~~~~^^^^^^^^^^^^^
litellm-1  | KeyError: 'arguments'
litellm-1  | 12:51:23 - LiteLLM:DEBUG: litellm_logging.py:757 - success callbacks: [<bound method Router.sync_deployment_callback_on_success of <litellm.router.Router object at 0x7f466633c590>>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f46657e1410>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f4666266d90>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f4664f2e050>, <litellm._service_logger.ServiceLogging object at 0x7f4664d5ad50>]
litellm-1  | 12:51:23 - LiteLLM:DEBUG: utils.py:244 - Logging Details LiteLLM-Async Success Call, cache_hit=False
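
For context, the KeyError comes from the tool-call chunk handling in ollama_chat.py: the parsed function call is indexed with function_call["arguments"] even when the model's output contains no "arguments" key. A minimal sketch of the kind of guard that would avoid the crash (this is not the actual litellm code; build_tool_call and the placeholder id are made-up names for illustration):

import json

def build_tool_call(function_call: dict) -> dict:
    # Hypothetical guard, not the litellm implementation: the model's
    # function-call output may omit the "arguments" key entirely, so
    # default to an empty object instead of raising KeyError as in the
    # traceback above.
    return {
        "id": "call_0",  # placeholder id for illustration only
        "type": "function",
        "function": {
            "name": function_call.get("name", ""),
            "arguments": json.dumps(function_call.get("arguments", {})),
        },
    }

# Example: a function call without an "arguments" key no longer crashes.
print(build_tool_call({"name": "get_delivery_date"}))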

Twitter / LinkedIn details

No response

Clad3815 commented 1 week ago

@krrishdholakia Also, this error happens with the new version:


litellm-1  | 17:21:44 - LiteLLM:ERROR: ollama_chat.py:519 - LiteLLM.ollama(): Exception occured - sequence item 19: expected str instance, NoneType found
litellm-1  | Traceback (most recent call last):
litellm-1  |   File "/usr/local/lib/python3.11/site-packages/litellm/llms/ollama_chat.py", line 493, in ollama_async_streaming
litellm-1  |     response_content = first_chunk_content + "".join(content_chunks)
litellm-1  |                                              ^^^^^^^^^^^^^^^^^^^^^^^
litellm-1  | TypeError: sequence item 19: expected str instance, NoneType found
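
The TypeError in the newer version looks like the same root cause surfacing one step later: streamed chunks that only carry a tool-call delta have content set to None, and "".join() then fails on the None entry. A hedged sketch of a join that skips None entries (illustrative only; join_content is a made-up helper name, not litellm's code):

def join_content(first_chunk_content, content_chunks):
    # Hypothetical guard, not the litellm implementation: chunks that only
    # carry a tool-call delta have content=None, so drop those before
    # joining instead of letting str.join raise TypeError.
    return (first_chunk_content or "") + "".join(
        chunk for chunk in content_chunks if chunk is not None
    )

# Example: a None at any position (e.g. item 19 in the traceback) is skipped.
print(join_content("Hello", [" ", None, "world"]))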

Reproducible code:

const OpenAI = require("openai");

const openai = new OpenAI({
    apiKey: "sk-1234",
    baseURL: "http://localhost:4444/v1",
});
const tools = [
    {
        type: "function",
        function: {
            name: "get_delivery_date",
            description: "Get the delivery date for a customer's order. Call this whenever you need to know the delivery date, for example when a customer asks 'Where is my package'",
            parameters: {
                type: "object",
                properties: {
                    order_id: {
                        type: "string",
                        description: "The customer's order ID.",
                    },
                },
                required: ["order_id"],
                additionalProperties: false,
            },
        }
    }
];

const messages = [
    { role: "system", content: "You are a helpful customer support assistant. Use the supplied tools to assist the user." },
    { role: "user", content: "Hi, can you tell me the delivery date for my order? My order ID is 1234567890." }
];

async function main() {
    const stream = await openai.beta.chat.completions.stream({
        model: 'ollama_chat/llama3.2',
        messages: messages,
        tools: tools,
        stream: true,
        api_base: "http://host.docker.internal:11434" // Override LiteLLM config (Optional)
    });

    for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content || '');
    }

    const chatCompletion = await stream.finalChatCompletion();
    console.log(chatCompletion);
}

main();
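
For anyone reproducing from Python instead of Node.js, a roughly equivalent snippet using the openai Python SDK is sketched below (assuming the same LiteLLM proxy at localhost:4444, the same sk-1234 key, and the ollama_chat/llama3.2 deployment as in the Node.js code above):

from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://localhost:4444/v1")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_delivery_date",
            "description": "Get the delivery date for a customer's order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The customer's order ID.",
                    },
                },
                "required": ["order_id"],
                "additionalProperties": False,
            },
        },
    }
]

messages = [
    {"role": "system", "content": "You are a helpful customer support assistant. Use the supplied tools to assist the user."},
    {"role": "user", "content": "Hi, can you tell me the delivery date for my order? My order ID is 1234567890."},
]

# Stream the response and print any text deltas; the tool-call chunks are
# what trigger the errors reported above.
stream = client.chat.completions.create(
    model="ollama_chat/llama3.2",
    messages=messages,
    tools=tools,
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)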

zengbo commented 3 days ago

I got the same error