defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0

Dynamically add table aliases without an LLM + de-duplicate columns in pandas #155

Closed rishsriv closed 1 month ago

rishsriv commented 1 month ago

This PR automatically generates relevant table aliases and appends them to the prompt, which shifts the work of creating table aliases away from the LLM. We may have to retrain our LLM on this kind of prompt so that it expects a more varied set of inputs.
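For anyone reading along, here is a rough sketch of what generating aliases without an LLM could look like. It is not the code in this PR: the function names `generate_aliases` and `format_alias_hint`, the initials-based naming scheme, the small reserved-word list, and the wording of the hint line are all assumptions made for illustration.

```python
import re

# Short SQL keywords that must not be used as bare aliases (illustrative, not exhaustive).
RESERVED = {"as", "by", "in", "is", "on", "or", "to", "do", "if", "no"}


def generate_aliases(table_names):
    """Map each table name to a short, unique alias built from word initials."""
    aliases = {}
    used = set(RESERVED)  # treat reserved words as already taken
    for name in table_names:
        # Drop any schema prefix, then split on underscores:
        # "public.order_items" -> ["order", "items"]
        words = [w for w in re.split(r"_+", name.split(".")[-1].lower()) if w]
        base = "".join(w[0] for w in words) or "t"
        alias, suffix = base, 2
        # Resolve collisions by appending a counter: "oi", "oi2", "oi3", ...
        while alias in used:
            alias = f"{base}{suffix}"
            suffix += 1
        used.add(alias)
        aliases[name] = alias
    return aliases


def format_alias_hint(aliases):
    """Render the aliases as a single line that can be appended to a prompt."""
    pairs = ", ".join(f"{table} AS {alias}" for table, alias in aliases.items())
    return f"-- Use these table aliases: {pairs}"


if __name__ == "__main__":
    tables = ["customers", "order_items", "orders", "other_info"]
    print(format_alias_hint(generate_aliases(tables)))
```

Because the aliases are derived deterministically from the schema, the same tables always get the same aliases, which is the property that lets the prompt carry them instead of leaving the choice to the model.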

Here's an example of how to run this.

python main.py \
-db postgres \
-q "data/questions_gen_postgres.csv" "data/instruct_basic_postgres.csv" "data/instruct_advanced_postgres.csv" "data/idk.csv" \
-o results/classic_new_reprompt.csv results/basic_new_reprompt.csv results/advanced_new_reprompt.csv results/idk_new_reprompt.csv \
-g api \
-b 1 \
-f prompts/prompt_cot.md \
--api_url "YOUR_API_ENDPOINT" \
--api_type "vllm" \
-p 10 \
-c 0 --logprobs --cot_table_alias

rishsriv commented 1 month ago

Fixed! We can now use --cot_table_alias instruct to get the model to use {cot_instructions}, and --cot_table_alias pregen to pre-generate the table aliases.
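To make the two modes concrete, here is a minimal sketch of how a flag like this might be parsed and applied to a prompt template. Only the flag name, its two values, and the `{cot_instructions}` placeholder come from this thread. The helper names, the instruction wording in `INSTRUCT_TEXT`, and the choice to append pre-generated aliases at the end of the prompt are assumptions, not the behaviour of `main.py`.

```python
import argparse

# Hypothetical instruction text; the real wording in the repo may differ.
INSTRUCT_TEXT = "List the table aliases you will use before writing the SQL query."


def build_parser():
    """Parser with a --cot_table_alias flag restricted to the two modes."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--cot_table_alias",
        choices=["instruct", "pregen"],
        default=None,
        help="instruct: ask the model to pick aliases; pregen: supply them up front",
    )
    return parser


def apply_cot_table_alias(prompt_template, mode, alias_hint=""):
    """Fill the {cot_instructions} placeholder according to the chosen mode."""
    if mode == "instruct":
        # The model is told to work out the aliases itself.
        return prompt_template.replace("{cot_instructions}", INSTRUCT_TEXT)
    # In every other case the placeholder is blanked out.
    prompt = prompt_template.replace("{cot_instructions}", "")
    if mode == "pregen" and alias_hint:
        # Assumption: pre-generated aliases are appended to the end of the prompt.
        prompt = prompt.rstrip("\n") + "\n" + alias_hint
    return prompt


if __name__ == "__main__":
    args = build_parser().parse_args(["--cot_table_alias", "pregen"])
    template = "Question: How many orders?\n{cot_instructions}\nSQL:"
    print(apply_cot_table_alias(template, args.cot_table_alias, "-- aliases: orders AS o"))
```

`str.replace` is used instead of `str.format` so that any other braces in the prompt template are left untouched.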