Hi @khaledalarja, the endpoint creation in the examples is an abstraction to create a /chat endpoint for basic use cases. For more complex use cases such as yours, I recommend checking out this example: https://github.com/menloparklab/chatbot-ui/blob/d2d4aa84ebb6351bfacee99429228d091221b60b/backend/app.py#L101-L114
The idea is to use the router instance to manually create a new endpoint. The advantage of using the LangchainRouter over FastAPI's APIRouter is that you get LLM caching built-in.
You will need to create your own pydantic models for the request and response bodies. You can use the above example as a reference.
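For reference, here is a minimal sketch of that pattern. It is not copied from the linked file: the endpoint path, request model, and chain are illustrative assumptions, and it assumes the pre-0.8 lanarky API where LangchainRouter can be instantiated without a chain and StreamingResponse.from_chain is available.

# Minimal sketch of a custom endpoint on a LangchainRouter (pre-0.8 lanarky API).
# The endpoint path, request model, and chain below are illustrative assumptions.
from fastapi import FastAPI
from lanarky import LangchainRouter
from lanarky.responses import StreamingResponse
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI
from pydantic import BaseModel

app = FastAPI()
router = LangchainRouter()  # unlike a plain APIRouter, this wires up LLM caching


class ChatRequest(BaseModel):
    query: str


@router.post("/custom-chat")
async def custom_chat(request: ChatRequest):
    # Build the chain per request and stream its output back to the client.
    chain = ConversationChain(llm=ChatOpenAI(streaming=True))
    return StreamingResponse.from_chain(chain, request.query)


app.include_router(router)

Because the chain is created inside the endpoint handler, its parameters can depend on the request body rather than being fixed at router creation.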
Hi @ajndkr, thanks so much for your reply!
I have managed to get some streaming working using the example you gave me, but I noticed something strange: with gpt-3.5-turbo the streaming is delivered sentence by sentence, but when I choose gpt-4 I lose all streaming and have to wait for the response to complete!
Here is the modified code:
from typing import List, Optional

from lanarky.responses import StreamingResponse
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from pydantic import BaseModel, constr

# `app` (the FastAPI instance) and `langchain_db_client` (the vector store)
# are created elsewhere in my application.


class Prompt(BaseModel):
    query: str
    authorized_codes: Optional[List[constr(regex=r"^\d{2}[A-Z]{3,5}\d{2,5}")]] = None
    temperature: float = 0.0
    max_tokens: int = 1000
    k: int = 6
    model_type: str = "gpt-4"


def create_chain(user_input: Prompt) -> RetrievalQAWithSourcesChain:
    system_template = """Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that "I don't know", don't try to make up an answer.
If the context is empty, just say that "I don't know", but always reply in the same language of the user.
----------------
context:
{summaries}"""
    messages = [
        SystemMessagePromptTemplate.from_template(system_template),
        HumanMessagePromptTemplate.from_template(user_input.query),
    ]
    prompt = ChatPromptTemplate.from_messages(messages)
    llm = ChatOpenAI(
        model_name=user_input.model_type,
        temperature=user_input.temperature,
        max_tokens=user_input.max_tokens,
        streaming=True,
        verbose=True,
    )
    chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=langchain_db_client.as_retriever(),
        return_source_documents=True,
        chain_type_kwargs={"prompt": prompt},
    )
    return chain


@app.post("/query")
async def query(user_input: Prompt):
    chain = create_chain(user_input)
    return StreamingResponse.from_chain(chain, user_input.query)
I believe the callback waits for a newline in the response before streaming a chunk. With gpt-3.5 the response happens to contain a numbered list, while with gpt-4 there are no newline characters, which is why it waits for the whole response to finish.
Any idea how to stream token by token like normal instead of waiting for newline characters?
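To check whether tokens arrive one by one from the model itself, independent of Lanarky's response handling, here is a small sketch using LangChain's AsyncIteratorCallbackHandler; the surrounding setup is an assumption for illustration.

# Hedged sketch: probe token-level streaming directly from the model,
# bypassing Lanarky, to see whether gpt-4 really emits tokens incrementally.
import asyncio

from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chat_models import ChatOpenAI


async def probe_streaming(model_name: str, prompt: str) -> None:
    callback = AsyncIteratorCallbackHandler()
    llm = ChatOpenAI(model_name=model_name, streaming=True, callbacks=[callback])

    task = asyncio.create_task(llm.apredict(prompt))
    async for token in callback.aiter():
        print(repr(token))  # each token should print immediately, newline or not
    await task


asyncio.run(probe_streaming("gpt-4", "Explain streaming in one paragraph."))

If tokens print immediately here, the buffering is happening in the response handling rather than in the OpenAI API.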
Hmm, this is a weird error. Both gpt-3.5 and gpt-4 use the same callback, so the streaming is likely affected by the OpenAI API and not this library. Can you try again with a different chain or a different system prompt?
Closing this issue due to user inactivity.
You can check out the new documentation for v0.8: https://lanarky.ajndkr.com/learn/adapters/langchain/
Please reopen this issue if you'd like to discuss more.
If we take the example that is in the docs:
I want to be able to pass search_kwargs to the retriever in the chain so it does some filtering, but the filter should depend on the input. For example, the input might have another attribute, such as a list of authorized_documents_codes, which gets passed to the retriever so it can filter the documents during the search.
How can I do that with Lanarky?
Here is the original, non-streaming code I used for filtering, to give an idea of what I'm talking about; the goal is to make it streaming while keeping the retriever filtering:
The problem with Lanarky is that the chain has to be passed when the router is created, which makes it static and hard to modify per request.
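One possible way around this, under the same pre-0.8 setup as the /query endpoint earlier in the thread: since that endpoint builds its chain per request, the filter can be derived from the request body and passed to the retriever via search_kwargs. A minimal sketch follows; the metadata key "document_code" and the filter structure are assumptions, as the exact search_kwargs filter syntax depends on the vector store (Chroma, Pinecone, etc.).

# Hedged sketch, reusing the Prompt model and imports from the /query code above.
def build_retriever(user_input: Prompt):
    search_kwargs = {"k": user_input.k}
    if user_input.authorized_codes:
        # Metadata filter; adapt the key and operator syntax to your vector store.
        search_kwargs["filter"] = {"document_code": {"$in": user_input.authorized_codes}}
    return langchain_db_client.as_retriever(search_kwargs=search_kwargs)

create_chain would then call build_retriever(user_input) instead of langchain_db_client.as_retriever(), so the filter is applied per request; because the chain is built inside the endpoint rather than at router creation, it stays dynamic.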