Question answering over your dataset cannot reach the next level by embedding all of the data and semantics in a single prompt. The process has to be iterative, fetching only the relevant information from a retriever that accepts natural-language queries for additional context (data, metadata about the SQL schema, design concepts). This should lead to considerably more accurate responses from the system.
It also avoids embedding roughly 2k lines of SQL schema in every prompt, which reduces the need for a large context window, lowers cost, and improves speed.
The current task includes building a Colab notebook for:
[ ] POC for DSP as a "multi-hop" question answering tool. The foundation model (FM) should ask a question when the information provided is incomplete, and should run its SQL queries and inspect the feedback itself before presenting a result as the answer. It should iterate until the solution is correct.
[ ] Connect to a small database
[ ] Implement a scenario where the context provided in the first prompt is not enough to answer the question accurately. The FM should then ask for more context in plain text, and the retriever should reply with the relevant details.
[ ] Test the query at the end of the response. If it is incorrect, feed the result back to the FM so it can look for alternative approaches or for gaps in the information needed to answer the user query.
[ ] Allow context to be embedded through the SQL schema, using SQL doc strings for the specified flavor. The context could include:
[ ] Why a field is named a certain way.
[ ] Any enums for certain fields.
[ ] Contextual data for a table, with design docs as an extension of the SQL schema.
[ ] Implement a multi-shot context retriever that retrieves what is stored in the above context. It should allow small queries against the DB to fetch additional metadata, and intermediate data retrieval for multi-step queries.
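
The loop described in the checklist could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: `SCHEMA_NOTES`, `retrieve`, `run_sql`, `stub_model`, and `answer` are hypothetical names, the `orders` table is invented for the demo, the retriever is a toy keyword match standing in for a real one, and `stub_model` is a hard-coded stand-in for the FM. Only the standard-library `sqlite3` module is used.

```python
import sqlite3

# Hypothetical context store. In the real task these notes would come from
# doc strings in the SQL schema (naming rationale, enums, table-level docs).
SCHEMA_NOTES = {
    "orders.status": "Enum stored as integer: 0 = pending, 1 = shipped, 2 = cancelled.",
    "orders.amt": "Order total in cents.",
}

def retrieve(question):
    """Toy retriever: return notes whose table or column name appears in the question."""
    words = set(question.lower().replace("?", " ").split())
    hits = []
    for key, note in SCHEMA_NOTES.items():
        table, column = key.split(".")
        if table in words or column in words:
            hits.append(f"{key}: {note}")
    return hits

def run_sql(conn, sql):
    """Run a candidate query; return (rows, None) on success or (None, error text)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)

def stub_model(question, context, feedback):
    """Stand-in for the FM (assumption). Returns ('ask', text) or ('sql', text)."""
    if not context:
        # Not enough context yet: ask the retriever in plain text.
        return "ask", "What do the values of orders status mean?"
    if feedback is None:
        # First attempt uses a wrong column name on purpose.
        return "sql", "SELECT COUNT(*) FROM orders WHERE state = 1"
    # Second attempt, after seeing the database error.
    return "sql", "SELECT COUNT(*) FROM orders WHERE status = 1"

def answer(conn, question, model=stub_model, max_steps=5):
    """Iterate: ask for context, try SQL, feed errors back, stop on success."""
    context, feedback, trace = [], None, []
    for _ in range(max_steps):
        kind, text = model(question, context, feedback)
        trace.append((kind, text))
        if kind == "ask":
            context.extend(retrieve(text))
            continue
        rows, error = run_sql(conn, text)
        if error is None:
            return rows, trace
        feedback = error  # the next model call sees why the query failed
    return None, trace

# Small in-memory database for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO orders (status, amt) VALUES (?, ?)",
    [(0, 500), (1, 1200), (1, 800)],
)
rows, trace = answer(conn, "How many orders have shipped?")
print(rows)
for step in trace:
    print(step)
```

Note that this sketch only catches queries that fail to execute. Detecting a query that runs but returns a wrong answer would need an additional check, such as comparing against known results or asking the FM to review the rows.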