I would like to support a new argument, `--skip-rag`. It instructs the API server to skip the RAG search if either the system prompt or the last user prompt exceeds a certain length.
It defaults to 0, which means the RAG search is ALWAYS performed. If it is set to a value, say `--skip-rag 512`, the API server skips the RAG search when either of those prompts is longer than 512 tokens.
The reason is that a long input prompt probably already includes its own context, and we should NOT try to replace it. This lets the RAG API server work alongside client-side RAG solutions, supplementing the context only when the client-side RAG provides none.
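
To make the requested behavior concrete, here is a minimal sketch of the decision rule in Python. It is only an illustration, not the server's actual code: the names `should_skip_rag` and `approx_token_count` are hypothetical, and the whitespace-based token count is a stand-in for whatever tokenizer the server really uses.

```python
from typing import Callable, Optional


def approx_token_count(text: str) -> int:
    """Hypothetical stand-in for the model tokenizer: counts whitespace-separated words."""
    return len(text.split())


def should_skip_rag(
    system_prompt: Optional[str],
    last_user_prompt: Optional[str],
    skip_rag: int = 0,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> bool:
    """Return True when the RAG search should be skipped for this request.

    skip_rag == 0 (the default) means the RAG search is always performed.
    Otherwise the search is skipped if either prompt is longer than
    skip_rag tokens.
    """
    if skip_rag <= 0:
        return False
    # Ignore prompts that are missing or empty.
    prompts = [p for p in (system_prompt, last_user_prompt) if p]
    return any(count_tokens(p) > skip_rag for p in prompts)
```

With `--skip-rag 512`, a short question with no system prompt would still trigger the RAG search, while a request whose system prompt already carries more than 512 tokens of client-supplied context would bypass it. The comparison is strict here ("above 512"), so a prompt of exactly 512 tokens still goes through RAG; whether the boundary should be inclusive is an open design choice.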