@Netzvamp use this to intercept calls - https://docs.litellm.ai/docs/proxy/call_hooks
Your error occurs because the chunk you yield is not in the expected `GenericStreamingChunk` format - https://github.com/BerriAI/litellm/blob/dd2ea72cb4f6106fc32cc7a56a6aa716ee14020e/litellm/types/utils.py#L82
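Roughly, a call hook is a `CustomLogger` subclass attached to the proxy. A minimal sketch based on that docs page (hook names and exact signatures may vary between LiteLLM versions, so check the linked docs; the hook bodies here are placeholders):

```python
from litellm.integrations.custom_logger import CustomLogger
from litellm.proxy.proxy_server import DualCache, UserAPIKeyAuth


class MyCustomHandler(CustomLogger):
    async def async_pre_call_hook(
        self,
        user_api_key_dict: UserAPIKeyAuth,
        cache: DualCache,
        data: dict,
        call_type: str,
    ):
        # Runs before the request is forwarded to the LLM:
        # inspect or rewrite `data` (e.g. the messages) and return it.
        return data

    async def async_post_call_success_hook(
        self,
        data: dict,
        user_api_key_dict: UserAPIKeyAuth,
        response,
    ):
        # Runs after a successful LLM call: inspect or modify
        # the response before it goes back to the client.
        return response


proxy_handler_instance = MyCustomHandler()
```

The instance is then registered in the proxy config (per the docs, via `callbacks: custom_callbacks.proxy_handler_instance` under `litellm_settings`).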
Thank you, call hooks are great, I didn't know they existed!
Coincidentally, I was trying the custom LLM stuff today and ran into exactly this problem. For the record, I got this simple case (returning Unix epoch seconds) working, for both completions and streaming.
Note that I don't know how correct this is (in fact I looked at the tests to see how it was done there). All I know is that it works, and since the documentation doesn't show any clear streaming example, here it is:
```python
import time
from typing import AsyncIterator, Iterator

from litellm import CustomLLM, acompletion, completion
from litellm.types.utils import GenericStreamingChunk, ModelResponse


class UnixTimeLLM(CustomLLM):
    def completion(self, *args, **kwargs) -> ModelResponse:
        return completion(
            model="test/unixtime",
            mock_response=str(int(time.time())),
        )  # type: ignore

    async def acompletion(self, *args, **kwargs) -> ModelResponse:
        return await acompletion(
            model="test/unixtime",
            mock_response=str(int(time.time())),
        )  # type: ignore

    def streaming(self, *args, **kwargs) -> Iterator[GenericStreamingChunk]:
        # A single, already-finished chunk carrying the whole response.
        generic_streaming_chunk: GenericStreamingChunk = {
            "finish_reason": "stop",
            "index": 0,
            "is_finished": True,
            "text": str(int(time.time())),
            "tool_use": None,
            "usage": {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0},
        }
        return generic_streaming_chunk  # type: ignore

    async def astreaming(self, *args, **kwargs) -> AsyncIterator[GenericStreamingChunk]:
        # Same single chunk, but yielded, so this is an async generator.
        generic_streaming_chunk: GenericStreamingChunk = {
            "finish_reason": "stop",
            "index": 0,
            "is_finished": True,
            "text": str(int(time.time())),
            "tool_use": None,
            "usage": {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0},
        }
        yield generic_streaming_chunk  # type: ignore


unixtime = UnixTimeLLM()
```
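To actually use it, the handler still has to be registered with LiteLLM. A minimal sketch following the pattern from the custom_llm_server docs linked below (the provider name `unixtime` and the model suffix are arbitrary choices here):

```python
import litellm
from litellm import completion

# Register the handler under a provider prefix of our choosing.
litellm.custom_provider_map = [
    {"provider": "unixtime", "custom_handler": unixtime}
]

# Any model under that prefix now routes to UnixTimeLLM.
resp = completion(
    model="unixtime/whatever",
    messages=[{"role": "user", "content": "What time is it?"}],
)
print(resp.choices[0].message.content)  # e.g. "1721952000"
```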
Ciao :-)
Hey @Netzvamp, where in our docs would this have been helpful to see?
There is already a link on that page, so all is fine, I'm just blind ;) https://docs.litellm.ai/docs/providers/custom_llm_server
@stronk7 thanks for the code snippet. Added your example to docs + gave you a shoutout - https://docs.litellm.ai/docs/providers/custom_llm_server#add-streaming-support
What happened?
I'm trying to build a CustomLLM proxy so that I can intercept the LLM answers and do stuff with them. Without streaming it runs fine, but with async streaming I can't get it to stream the chunks. I couldn't find anything about this in the docs. Here is some code:
How can I implement this correctly?
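For anyone landing here: the example in the accepted comment above is what got this working. The key point is that `astreaming` must be an async generator that yields `GenericStreamingChunk` dicts, one per delta, with `is_finished`/`finish_reason` set only on the last one. A hypothetical multi-chunk sketch (the class name and reply text are made up; using an empty `finish_reason` on intermediate chunks is an assumption, not confirmed by the thread):

```python
import asyncio
from typing import AsyncIterator

from litellm import CustomLLM
from litellm.types.utils import GenericStreamingChunk


class MyStreamingLLM(CustomLLM):
    async def astreaming(self, *args, **kwargs) -> AsyncIterator[GenericStreamingChunk]:
        # Hypothetical fixed reply, emitted piece by piece.
        pieces = ["Hello", ", ", "world", "!"]
        for i, piece in enumerate(pieces):
            last = i == len(pieces) - 1
            chunk: GenericStreamingChunk = {
                "finish_reason": "stop" if last else "",
                "index": 0,
                "is_finished": last,
                "text": piece,
                "tool_use": None,
                "usage": {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0},
            }
            yield chunk
            await asyncio.sleep(0)  # yield control back to the event loop
```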