Bug Description
When a WandbCallbackHandler is attached, streaming generated responses from a QueryEngine doesn't work. The same QueryEngine works without the WandbCallbackHandler, and works with the handler when streaming is disabled.
Version
10.16
Steps to Reproduce
Set up a VectorStoreIndex that uses a WandbCallbackHandler, create a QueryEngine from it, and try to stream a response from it as directed here.
Relevant Logs/Tracebacks
import os

import wandb
from dotenv import load_dotenv
from llama_index.callbacks.wandb import WandbCallbackHandler
from llama_index.core.callbacks import (
    CallbackManager,
    LlamaDebugHandler,
    CBEventType,
)

load_dotenv()

# Authenticate with Weights & Biases and build a CallbackManager that
# includes both the debug handler and the Wandb handler.
wandb.login(
    key=os.environ["WANDB_API_KEY"],
    host=os.environ["WANDB_HOST"],
)
wandbhandler = WandbCallbackHandler(run_args={"project": XXX})
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([
    llama_debug,
    wandbhandler,
])

from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    XXX...,  # deployment/credential arguments redacted
    callback_manager=callback_manager,
)

from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

embeddings = AzureOpenAIEmbedding(
    XXX...,  # deployment/credential arguments redacted
    callback_manager=callback_manager,
)

from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embeddings
Settings.chunk_size = 512
Settings.callback_manager = callback_manager

from llama_index.core import VectorStoreIndex

# `docs` is a list of Documents loaded elsewhere.
index = VectorStoreIndex.from_documents(
    docs, callback_manager=callback_manager
)

# Streaming query: this is where it hangs/fails with the Wandb handler attached.
query_engine = index.as_query_engine(streaming=True, similarity_top_k=5)
streaming_response = query_engine.query("Foo?")
for response in streaming_response.response_gen:
    print(response)
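
For reference, these are the two configurations from the description that do work; a minimal sketch reusing the objects from the script above:

# Works: WandbCallbackHandler attached, but streaming disabled.
query_engine_no_stream = index.as_query_engine(streaming=False, similarity_top_k=5)
print(query_engine_no_stream.query("Foo?"))

# Also works: streaming enabled, but the Wandb handler omitted, i.e. the
# setup above re-run with
#     callback_manager = CallbackManager([llama_debug])
# passed everywhere instead of the manager that includes wandbhandler.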