SylphAI-Inc / AdalFlow

AdalFlow: The "PyTorch" library to auto-optimize any LLM tasks.
http://adalflow.sylph.ai/
MIT License

Can we stream responses? #149

Open mneedham opened 1 month ago

mneedham commented 1 month ago

Describe the bug

Not sure if this is a bug or if it's not supposed to work this way, but I can't figure out how to stream the response from the LLM.

To Reproduce

from lightrag.core.generator import Generator
from lightrag.components.model_client import OllamaClient

model_client = OllamaClient()
model_kwargs = {"model": "phi3", "stream": True}
generator = Generator(model_client=model_client, model_kwargs=model_kwargs)
generator({"input_str": "What is the capital of France?"})

Returns:

GeneratorOutput(
    data=None,
    error='Error parsing the completion: <generator object Client._stream at 0x11e388480>',
    usage=None,
    raw_response='<generator object Client._stream at 0x11e388480>',
    metadata=None
)

Expected behavior

I want to be able to iterate over the response and render it as it's produced.


Desktop

macOS Sonoma 14.5

liyin2015 commented 1 month ago

@mneedham I need to add streaming support to the model client; let me try to add it.

liyin2015 commented 1 month ago

@mneedham It's updated. If you upgrade the pip package to 0.1.0.b5, you should be able to stream the output:

# with "stream": True in model_kwargs, output.data yields chunks as they arrive
output = generator({"input_str": "What is the capital of France?"})
for chunk in output.data:
    print(chunk)

mneedham commented 1 month ago

Awesome - it works :D Thanks!

mneedham commented 1 month ago

I am testing it out with my usual ridiculous prompt!

model_client = OllamaClient(host="http://localhost:11434")
model_kwargs = {"model": "llama3.1", "stream": True}
generator = Generator(model_client=model_client, model_kwargs=model_kwargs)
output = generator({"input_str": "What would happen if a lion and an elephant met three dogs and four hyenas?"})
for chunk in output.data:
  print(chunk, end='', flush=True)

What an interesting scenario!

If a lion and an elephant met three dogs and four hyenas, I think it's likely that the outcome would be quite dramatic.

Firstly, the lion would probably take charge of the situation, being the apex predator in the savannah. The elephants, being gentle giants, might try to stay calm and avoid any confrontation.

However, the presence of the three dogs could potentially cause a commotion. They might bark excitedly at the sight of the big cats, which could distract the lion and give the elephant an opportunity to intervene.

The four hyenas, on the other hand, would likely be more interested in scavenging for food than engaging in a full-blown battle. They might try to sneak up behind the dogs and steal their scraps, or even attempt to steal some of the elephant's food.

But if all else fails, I imagine the lion would assert its dominance by chasing after one of the smaller animals (perhaps the dogs?) to show who's boss. The elephant, being a gentle giant, might try to calm everyone down by using its size and presence to intimidate the hyenas into backing off.

Of course, this is all just hypothetical – in reality, each animal would behave according to their natural instincts and survival strategies! What do you think?

mneedham commented 1 month ago

It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)
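
For anyone following along, that works because a plain Python string is itself iterable, one character at a time; for example:

# a str is also iterable, yielding one character per iteration, so the same
# streaming-style loop prints a non-streamed response character by character
for chunk in "Paris is the capital of France.":
    print(chunk, end='', flush=True)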

liyin2015 commented 1 month ago

> It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

It's a bug I introduced in 0.1.0.b5. Please upgrade to 0.1.0.b6 and it should work fine! (It wasn't supposed to change the normal non-stream behavior.) 😆

mneedham commented 1 month ago

Do we need some sort of await in parse_stream_response to handle the acall function?

output = await generator.acall({"input_str": "What would happen if a lion and an elephant met three dogs and four hyenas?"})

Error parsing the completion <async_generator object AsyncClient._stream.<locals>.inner at 0x10c9cac20>: argument of type 'async_generator' is not iterable

def parse_stream_response(completion: GeneratorType) -> Any:
    """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
    for chunk in completion:  # synchronous iteration only; fails on the async generator returned by acall
        log.debug(f"Raw chunk: {chunk}")
        yield chunk["response"] if "response" in chunk else None
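
For reference, here is a minimal sketch (not AdalFlow's actual code) of what an async-aware counterpart could look like, assuming each chunk is a dict with a "response" key as in the sync parser above:

from typing import Any, AsyncGenerator

async def parse_async_stream_response(completion: AsyncGenerator) -> Any:
    """Hypothetical async counterpart: consume the stream with `async for`."""
    async for chunk in completion:
        yield chunk["response"] if "response" in chunk else None

The caller would then consume it with `async for chunk in output.data` instead of a plain for loop.
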
mneedham commented 1 month ago

> It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

> It's a bug I introduced in 0.1.0.b5. Please upgrade to 0.1.0.b6 and it should work fine! (It wasn't supposed to change the normal non-stream behavior.) 😆

I am using 0.1.0.b6!

liyin2015 commented 1 month ago

> It's kinda neat that this code also works if I set stream to False because then .data returns a string, which is also iterable :)

> It's a bug I introduced in 0.1.0.b5. Please upgrade to 0.1.0.b6 and it should work fine! (It wasn't supposed to change the normal non-stream behavior.) 😆

> I am using 0.1.0.b6!

Good, then there is no bug.

mneedham commented 1 month ago

@liyin2015 does this function also need to check for AsyncGenerator for it to work with the acall function?

    def parse_chat_completion(
        self, completion: Union[GenerateResponse, GeneratorType]
    ) -> Any:
        """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
        log.debug(f"completion: {completion}, {isinstance(completion, GeneratorType)}")
        if isinstance(completion, GeneratorType):  # streaming
            return parse_stream_response(completion)
        else:
            return parse_generate_response(completion)

At the moment I get this error when using acall with stream:True:

Error parsing the completion <async_generator object AsyncClient._stream.<locals>.inner at 0x124d731c0>: argument of type 'async_generator' is not iterable
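
For what it's worth, here is a hedged sketch of the kind of dispatch being suggested (not the actual change in the PR linked in the next comment): also branch on an async generator and route it to an async-aware parser like the hypothetical parse_async_stream_response sketched above.

from collections.abc import AsyncGenerator
from types import GeneratorType
from typing import Any, Union

def parse_chat_completion(
    self, completion: Union["GenerateResponse", GeneratorType, AsyncGenerator]
) -> Any:
    """Dispatch on non-streaming, sync streaming, and async streaming completions."""
    if isinstance(completion, AsyncGenerator):  # async streaming (acall)
        return parse_async_stream_response(completion)
    if isinstance(completion, GeneratorType):   # sync streaming (call)
        return parse_stream_response(completion)
    return parse_generate_response(completion)  # non-streaming GenerateResponse
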
mneedham commented 1 month ago

@liyin2015 I tried a fix here, but I've only done it for the OllamaClient so far:

https://github.com/SylphAI-Inc/AdalFlow/pull/158