LlamaEdge / rag-api-server

A RAG API server written in Rust following OpenAI specs
https://llamaedge.com/docs/user-guide/server-side-rag/quick-start
Apache License 2.0

Skip server-side RAG when the request already has context #19

Closed juntao closed 1 month ago

juntao commented 2 months ago

I would like to support a new argument, --skip-rag. It instructs the API server to skip the RAG search if either the system prompt or the last user prompt exceeds a certain length.

It defaults to 0, which means the RAG search is ALWAYS performed. If it is set to a value, say --skip-rag 512, then the API server will skip the RAG search when the input prompt exceeds 512 tokens.

The reason for this is that a long input prompt probably already includes its own context, and we should NOT try to replace it. This allows the RAG API server to work with client-side RAG solutions and to supplement the context only when the client-side RAG provides none.
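
A minimal sketch of the proposed check, to make the intended behavior concrete. The function names should_skip_rag and approx_token_count are hypothetical and not part of rag-api-server. Counting whitespace-separated words is only a stand-in for a real tokenizer, which the server would presumably use instead.

```rust
/// Rough stand-in for a tokenizer: counts whitespace-separated words.
/// Assumption: a real implementation would count tokens with the model's
/// own tokenizer rather than splitting on whitespace.
fn approx_token_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Decides whether the RAG search should be skipped for a request.
///
/// `threshold` mirrors the proposed `--skip-rag` value:
/// - 0 means the RAG search is always performed (never skip);
/// - N > 0 means skip when the system prompt or the last user prompt
///   is longer than N tokens.
fn should_skip_rag(
    threshold: usize,
    system_prompt: Option<&str>,
    last_user_prompt: Option<&str>,
) -> bool {
    if threshold == 0 {
        return false;
    }
    // Check whichever of the two prompts is present.
    system_prompt
        .into_iter()
        .chain(last_user_prompt)
        .any(|prompt| approx_token_count(prompt) > threshold)
}
```

With this sketch, should_skip_rag(512, None, Some(prompt)) returns true only when prompt is longer than 512 approximate tokens, and any call with a threshold of 0 returns false, so the RAG search still runs.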