coexplain / llama_index

[Bug]: WandbCallbackHandler causes streaming response to be empty #2

Open williamhub opened 3 months ago

williamhub commented 3 months ago

Bug Description

When a WandbCallbackHandler is attached, streaming a generated response from a QueryEngine doesn't work: the stream comes back empty. Streaming works without the WandbCallbackHandler, and the same query works with the handler when streaming is disabled.
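
Concretely (a minimal sketch reusing `query_engine` from the reproduction script below), the response generator yields nothing while the handler is attached:

# `query_engine` is built with streaming=True as in the script below.
streaming_response = query_engine.query("Foo?")
tokens = list(streaming_response.response_gen)
print(tokens)  # observed: [] -- expected: the streamed answer tokens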

Version

0.10.16

Steps to Reproduce

Set up a VectorStoreIndex that uses a WandbCallbackHandler, convert it to a QueryEngine with streaming=True, and try to stream a response from it as described in the LlamaIndex streaming documentation.

Relevant Logs/Tracebacks

import os

import wandb
from dotenv import load_dotenv
from llama_index.callbacks.wandb import WandbCallbackHandler
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI

load_dotenv()

wandb.login(
    key=os.environ["WANDB_API_KEY"],
    host=os.environ["WANDB_HOST"],
)

wandbhandler = WandbCallbackHandler(run_args={"project": XXX})  # project name elided
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug, wandbhandler])

# Azure OpenAI credentials and deployment settings elided
llm = AzureOpenAI(
    XXX...,
    callback_manager=callback_manager,
)

# Azure OpenAI embedding settings elided
embeddings = AzureOpenAIEmbedding(
    XXX...,
    callback_manager=callback_manager,
)

# Apply shared defaults so every component uses the same callback manager.
Settings.llm = llm
Settings.embed_model = embeddings
Settings.chunk_size = 512
Settings.callback_manager = callback_manager

# Load the source documents (hypothetical placeholder path)
docs = SimpleDirectoryReader("./data").load_data()

index = VectorStoreIndex.from_documents(
    docs, callback_manager=callback_manager
)
query_engine = index.as_query_engine(streaming=True, similarity_top_k=5)

streaming_response = query_engine.query("Foo?")
for response in streaming_response.response_gen:
    print(response)  # bug: with the WandbCallbackHandler attached, nothing is printed
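
A possible temporary workaround, sketched under the assumption that the handler is what swallows the stream: detach the WandbCallbackHandler from the shared CallbackManager around the streaming call and re-attach it afterwards. Since the same manager instance is passed to the LLM, the embedding model, Settings, and the index, removing the handler there affects every component.

# Hedged workaround sketch, reusing names from the script above.
# CallbackManager.remove_handler/add_handler are part of the
# llama_index.core callbacks API.
callback_manager.remove_handler(wandbhandler)
try:
    streaming_response = query_engine.query("Foo?")
    for token in streaming_response.response_gen:
        print(token, end="")
finally:
    callback_manager.add_handler(wandbhandler)  # restore W&B logging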