Indexing pipeline:
To work around the LLM token window limit during indexing, we add a new env var, COLUMN_INDEXING_BATCH_SIZE, which lets users decide how many columns to index in one document at a time.
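A minimal sketch of the batching idea, assuming a simple chunking helper; the function name, document shape, and default batch size below are illustrative, not the actual implementation:

```python
import os

# Illustrative: split a table's columns into batches so each indexed document
# stays within the token window. The default of 50 is an assumption.
COLUMN_INDEXING_BATCH_SIZE = int(os.getenv("COLUMN_INDEXING_BATCH_SIZE", "50"))

def build_column_documents(table_name: str, columns: list[dict]) -> list[dict]:
    """Group columns into chunks of COLUMN_INDEXING_BATCH_SIZE, one document per chunk."""
    documents = []
    for start in range(0, len(columns), COLUMN_INDEXING_BATCH_SIZE):
        batch = columns[start:start + COLUMN_INDEXING_BATCH_SIZE]
        documents.append({
            "table": table_name,
            "content": "\n".join(f"{c['name']}: {c.get('type', '')}" for c in batch),
        })
    return documents
```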
Retrieval pipeline:
1. Select the top 10 (TABLE_RETRIEVAL_SIZE) tables based on table names and table descriptions (table_descriptions collection).
2. Select the top 1000 (TABLE_COLUMN_RETRIEVAL_SIZE) table and column entries based on the previous results (db_schema collection).
3. Use the LLM to choose which tables and columns are needed to answer the question.
We also expose two env vars for table and column selection, TABLE_RETRIEVAL_SIZE and TABLE_COLUMN_RETRIEVAL_SIZE, as sketched below.
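A rough sketch of how the two retrieval sizes plug into the two-stage lookup; the store and LLM interfaces here are placeholders, not the project's actual APIs:

```python
import os

# Defaults mirror the values mentioned above; both are overridable via env vars.
TABLE_RETRIEVAL_SIZE = int(os.getenv("TABLE_RETRIEVAL_SIZE", "10"))
TABLE_COLUMN_RETRIEVAL_SIZE = int(os.getenv("TABLE_COLUMN_RETRIEVAL_SIZE", "1000"))

def retrieve_schema(question: str, table_store, schema_store, llm) -> str:
    # Stage 1: top tables by name/description from the table_descriptions collection.
    tables = table_store.search(question, top_k=TABLE_RETRIEVAL_SIZE)

    # Stage 2: top table/column documents from db_schema, restricted to those tables.
    columns = schema_store.search(
        question,
        top_k=TABLE_COLUMN_RETRIEVAL_SIZE,
        filters={"table": [t["name"] for t in tables]},
    )

    # Stage 3: let the LLM pick only the tables/columns needed to answer the question.
    prompt = (
        "Question:\n" + question + "\n\n"
        "Candidate schema:\n" + "\n".join(c["content"] for c in columns) + "\n\n"
        "Return only the tables and columns needed to answer the question."
    )
    return llm.generate(prompt)
```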