This PR solves several issues with the main branch and brings new features to our guidance agent.
Main features:
Only one model is now necessary for QA. By falling back on BrainChulo's main vector store, we eliminate the need for a second langchain model to retrieve documents. That path remains available under the TEST_MODE option in the .env - note that enabling it also requires populating the TEST_FILE variable.
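For illustration, the relevant .env fragment might look like this (only TEST_MODE and TEST_FILE are named in this PR; the value shown for TEST_FILE is a placeholder):

```shell
# Re-enable the langchain/second-model retrieval path (off by default)
TEST_MODE=true
# Required whenever TEST_MODE is enabled
TEST_FILE=path/to/your_test_document.txt
```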
Restored conversational ability for the QA guidance agent. The guidance program has been updated to distinguish between conversation (phatic) and fact-seeking (referential) user queries. Conversation history is reintroduced when a phatic query is detected to allow for discussion continuity. Only referential queries are grounded within the database.
Improved data retrieval. I'm trying to ground the guidance program within a very constrained yet flexible flow. Using a very basic cognitive psychology approach (hello Dr Calvin haha) helps to anchor the model on detailed yet concise steps.
Some limitations:
This process is, for now, optimized for Guanaco 33B using the ggml q5 quantization method. I know that's not within reach of most users, but I chose to focus first on getting excellent results, then on scaling down toward a "production ready" process.
A consequence of the first limitation above: the process is slow right now. Even running on two RTX 3090s backed by a 7950X, count on roughly a minute per question that requires data retrieval from the context. It is much faster for phatic queries, or for queries flagged as unanswerable when ethics mode is enabled.
Queries interrogating the agent itself (e.g. "What's your favorite color?") may sometimes be flagged as referential. This starts the data retrieval process but invariably leads to "I'm sorry, but I don't have sufficient information to provide an answer to this question." unless your data happens to contain information pertaining to the question.
Data retrieval performance still depends, for now, on the quality and form of the query: "What's the address of the office?" might succeed while "What is the address of the office?" fails. Don't hesitate to reformulate if your first query fails.
Next steps:
Further improvements to the data retrieval process are still a priority. Increasing speed and reliability while lowering hardware requirements to a 13B model is what I hope to bring in the coming days.