Closed — kael-aiur closed this issue 12 months ago
Can you provide the code? I want to confirm whether the bottleneck is vectorization or the time spent inserting the text data into the vector database.
Yes, it is not vectorization. It is similar to an async API in Python. I have defined it in my fork of the project, but there is no implementation for the LLM client yet; I am only using it with my private LLM server, implemented via my private LLM client as a preview. I can submit a pull request to this project.
Good, I am looking forward to your pull request.
great job
Supported, see StreamOpenAIExample:
@RunnableExample
public class StreamOpenAIExample {

    public static void main(String[] args) {
        var llm = OpenAI.builder()
                .maxTokens(1000)
                .temperature(0)
                .requestTimeout(120)
                .build()
                .init();

        // asyncPredict returns a reactive stream; print each chunk as it arrives
        // and block until the stream completes.
        var result = llm.asyncPredict("Introduce West Lake in Hangzhou, China.");
        result.doOnNext(System.out::print).blockLast();
    }
}
I am using the RetrievalQa chain to build a document-based conversational tool, but every time I ask a question about the document's content, I have to wait for the large language model to complete the entire answer. Sometimes this takes a long time, and I cannot tell whether an error has occurred. Therefore, I am wondering whether this conversational chain could support a streaming response interface.
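The benefit being asked for is token-by-token delivery: the caller sees partial output immediately instead of waiting for the full answer. Below is a minimal, self-contained sketch of that idea using only the JDK's Flow API, with a hypothetical token array standing in for the LLM's streamed response; it is not the project's RetrievalQa API, just an illustration of the consumption pattern.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class StreamingSketch {

    // Streams the given tokens to a subscriber that renders each one as it
    // arrives, then returns the accumulated answer once the stream completes.
    static String streamTokens(String[] tokens) throws InterruptedException {
        StringBuilder answer = new StringBuilder();
        CountDownLatch done = new CountDownLatch(1);

        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1); // pull tokens one at a time
                }

                @Override public void onNext(String token) {
                    System.out.print(token); // partial output is visible immediately
                    answer.append(token);
                    subscription.request(1);
                }

                @Override public void onError(Throwable t) { done.countDown(); }

                @Override public void onComplete() { done.countDown(); }
            });

            for (String t : tokens) {
                publisher.submit(t); // stand-in for chunks arriving from the model
            }
        } // close() signals onComplete after all submitted tokens are delivered

        done.await();
        return answer.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        String[] tokens = {"West ", "Lake ", "is ", "in ", "Hangzhou."};
        String full = streamTokens(tokens);
        System.out.println();
        System.out.println("length: " + full.length());
    }
}
```

A streaming RetrievalQa chain would follow the same shape: instead of returning a completed string, it would expose a publisher of chunks that the UI subscribes to, so slow generations remain visibly alive rather than appearing hung.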