NVIDIA / ChatRTX

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM

Unable to stream results. TypeError: 'NoneType' object is not iterable #31

Open qbm5 opened 6 months ago

qbm5 commented 6 months ago

I am attempting to build a chatbot using `TrtLlmAPI` as the LLM:

llm = TrtLlmAPI(
    model_path=trt_engine_path,
    engine_name=trt_engine_name,
    tokenizer_dir=tokenizer_dir_path,
    temperature=0.1,
    max_new_tokens=1024,
    context_window=1024 * 4,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=False,
)
...
documents = SimpleDirectoryReader(self.data_dir, recursive=True, required_exts=exts).load_data()
faiss_index = faiss.IndexFlatL2(self.d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True)
index.storage_context.persist(persist_dir=storage_path)
return index

and a query engine to run the query:

return self.index.as_query_engine(
    streaming=True,
    similarity_top_k=4,
)

I can successfully execute the query when waiting for the full response, but once I enable the streaming flag it starts throwing exceptions:

response = query_engine.query(compiledQuery)
for token in response.response_gen:
    print(token)

TypeError: 'NoneType' object is not iterable

I have tried a number of different ways to get streaming to work. From what I can see in the ChatRTX codebase, this is exactly what it does, but for me it fails with the error above.
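As a defensive workaround while debugging, one option is to check whether `response_gen` is actually an iterable before looping, and fall back to the blocking response text when it is `None`. This is a minimal sketch, not part of the ChatRTX code; `FakeResponse` is a stand-in I made up so both branches can be exercised without a real TensorRT-LLM engine:

```python
def stream_or_fallback(response):
    """Yield streamed tokens if the engine produced a generator,
    otherwise fall back to the complete (blocking) response text."""
    gen = getattr(response, "response_gen", None)
    if gen is None:
        # Streaming was not wired up; yield the full text once instead
        # of raising "TypeError: 'NoneType' object is not iterable".
        yield str(getattr(response, "response", ""))
    else:
        yield from gen


# Stand-in for a llama-index streaming response object, used here
# only to demonstrate both branches.
class FakeResponse:
    def __init__(self, response=None, response_gen=None):
        self.response = response
        self.response_gen = response_gen


streamed = list(stream_or_fallback(FakeResponse(response_gen=iter(["Hel", "lo"]))))
blocking = list(stream_or_fallback(FakeResponse(response="Hello")))
print(streamed)  # ['Hel', 'lo']
print(blocking)  # ['Hello']
```

This does not fix the root cause (the LLM never attaching a token generator), but it localizes the failure and keeps the chatbot usable while investigating.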

anujj commented 3 months ago

Yes, you identified the correct code for streaming. Make sure the UI framework is also expecting a yielded (generator) response.
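In other words, the UI-facing handler should itself be a generator that yields each token as it arrives, rather than returning one finished string. A minimal sketch of that shape, where `chat_handler` and `StubEngine` are hypothetical names (the stub stands in for the llama-index query engine so the pattern runs without TensorRT-LLM):

```python
def chat_handler(query_engine, user_message):
    """Generator the UI framework can consume token by token.
    Accumulates the partial answer so the widget can redraw it."""
    response = query_engine.query(user_message)
    partial = ""
    for token in response.response_gen:
        partial += token
        yield partial  # many UI frameworks re-render on each yield


# Minimal stub standing in for the streaming query engine.
class StubEngine:
    def query(self, _msg):
        class R:
            response_gen = iter(["The ", "answer", "."])
        return R()


chunks = list(chat_handler(StubEngine(), "hi"))
print(chunks[-1])  # "The answer."
```

If the framework instead calls the handler once and expects a plain string, the generator is never iterated and streaming appears broken even when the engine side is correct.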