Canner / WrenAI

🚀 An open-source SQL AI (Text-to-SQL) Agent that empowers data, product teams to chat with their data. 🤘

https://getwren.ai/oss

GNU Affero General Public License v3.0

chore(wren-ai-service): retrieval improvement #599

Closed cyyeh closed 2 months ago

cyyeh commented 3 months ago

indexing pipeline:

3 collections: db_schema, table_descriptions, view_questions
to work around the LLM token window limit during indexing, there is a new env var, COLUMN_INDEXING_BATCH_SIZE, which lets users decide how many columns are indexed in one document at a time
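
As a rough illustration of the batching idea above, here is a minimal sketch that splits one table's columns into several documents. The function name `batch_columns` and the document shape are hypothetical and not taken from the wren-ai-service code; the batch size is passed in directly and would come from COLUMN_INDEXING_BATCH_SIZE in practice.

```python
from typing import Iterator


def batch_columns(table: str, columns: list[str], batch_size: int) -> Iterator[dict]:
    """Yield one document per group of at most `batch_size` columns.

    Hypothetical helper: the real indexing pipeline may shape its
    documents differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(columns), batch_size):
        # Each document carries the table name plus one slice of its columns.
        yield {"table": table, "columns": columns[start:start + batch_size]}


docs = list(batch_columns("orders", ["id", "customer_id", "total", "created_at", "status"], batch_size=2))
```

With five columns and a batch size of 2, this produces three documents, the last one holding the single leftover column. A smaller batch size keeps each document further below the token limit at the cost of more documents per table.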

retrieval pipeline:

select the top 10 (TABLE_RETRIEVAL_SIZE) tables based on table names and table descriptions (table_descriptions collection)
select the top 1000 (TABLE_COLUMN_RETRIEVAL_SIZE) tables and columns based on the previous results (db_schema collection)
use the LLM to choose which tables and columns are needed to answer the question
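
The three retrieval steps above can be sketched roughly as follows. This is not the service's implementation: `overlap_score` is a toy word-overlap score standing in for whatever similarity search the document store performs, `select_with_llm` is a hypothetical callback standing in for the LLM call, and the dictionary shapes are assumptions made for the example.

```python
from typing import Callable


def overlap_score(question: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words.

    Stand-in for vector similarity; an assumption for this sketch only.
    """
    return len(set(question.lower().split()) & set(text.lower().split()))


def retrieve(
    question: str,
    table_descriptions: list[dict],
    db_schema: list[dict],
    select_with_llm: Callable[[str, list[dict]], list[dict]],
    table_retrieval_size: int = 10,
    table_column_retrieval_size: int = 1000,
) -> list[dict]:
    # Step 1: rank tables by name and description, keep the top few.
    ranked = sorted(
        table_descriptions,
        key=lambda t: overlap_score(question, f"{t['name']} {t['description']}"),
        reverse=True,
    )
    candidate_tables = {t["name"] for t in ranked[:table_retrieval_size]}

    # Step 2: pull schema documents only for the candidate tables, capped in size.
    schema_docs = [d for d in db_schema if d["table"] in candidate_tables]
    schema_docs = schema_docs[:table_column_retrieval_size]

    # Step 3: let the LLM (here a caller-supplied function) pick what is needed.
    return select_with_llm(question, schema_docs)
```

Narrowing by table description first keeps the second lookup and the LLM prompt small, which is presumably why the two sizes are tunable separately.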

we also expose two env vars for table and column selection: TABLE_RETRIEVAL_SIZE and TABLE_COLUMN_RETRIEVAL_SIZE
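
For illustration, a minimal sketch of reading these two settings from the environment. The helper `read_positive_int` is hypothetical, and using 10 and 1000 as fallbacks is an assumption based on the numbers quoted in the retrieval steps, not a confirmed default of the service.

```python
import os
from typing import Mapping, Optional


def read_positive_int(name: str, fallback: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Read a positive integer setting, falling back when it is unset or blank.

    Hypothetical helper; `env` defaults to the process environment and can
    be replaced with a plain dict for testing.
    """
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or raw.strip() == "":
        return fallback
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 1:
        raise ValueError(f"{name} must be a positive integer, got {value}")
    return value


# Fallback values are assumptions taken from the numbers in the description.
table_retrieval_size = read_positive_int("TABLE_RETRIEVAL_SIZE", 10)
table_column_retrieval_size = read_positive_int("TABLE_COLUMN_RETRIEVAL_SIZE", 1000)
```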