Added support for multiple question files in a single evaluation for all runners. This lets us generate completions for all questions with just one runner setup (instead of n setups, one per call to sql-eval/the individual runners). The checks are now made in main.py, with each runner then iterating through the list of question/prompt/output file combinations. Supported combinations include:
1 question file, 1 prompt file, 1 output file
n question files, 1 prompt file, n output files
1 question file, n prompt files, n output files
n question files, n prompt files, n output files
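The combination rules above can be sketched as follows. This is a hypothetical illustration (the function name and error messages are not from main.py): question and prompt file lists of length 1 are broadcast to length n, and the number of output files must always match n so each run writes to its own file.

```python
def get_file_combinations(question_files, prompt_files, output_files):
    """Hypothetical sketch of the main.py checks: broadcast length-1
    question/prompt lists to n, and require exactly n output files."""
    n = max(len(question_files), len(prompt_files))
    if len(question_files) not in (1, n) or len(prompt_files) not in (1, n):
        raise ValueError("question/prompt file counts must be 1 or match each other")
    if len(output_files) != n:
        raise ValueError("need one output file per question/prompt combination")
    # Broadcast singleton lists so all three lists have length n
    q = question_files * n if len(question_files) == 1 else question_files
    p = prompt_files * n if len(prompt_files) == 1 else prompt_files
    return list(zip(q, p, output_files))
```

For example, two question files with one shared prompt file pair up with two output files, one per question file.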
Increased max_tokens for vllm to 600.
Tested on the vllm and openai runners.

vllm:
Preparing /models/combined/sqlcoder_7b_bf16_r128_ds_002_750_b20/checkpoint-700
2024-04-17 09:37:09,387 INFO worker.py:1724 -- Started a local Ray instance.
INFO 04-17 09:37:10 llm_engine.py:72]
...
Using prompt file prompts/prompt.md
Preparing questions...
Using all question(s) from data/instruct_basic_postgres.csv
Prepared 40 question(s) from data/instruct_basic_postgres.csv
Generating completions
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:47<00:00, 1.19s/it]
Time taken: 47.6s
Correct so far: 30/40 (75.00%): 100%|█████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 53.59it/s]
exact_match correct
query_category
basic_group_order_limit 0.875 0.875
basic_join_date_group_order_limit 0.625 0.625
basic_join_distinct 0.750 0.750
basic_join_group_order_limit 0.500 0.500
basic_left_join 1.000 1.000
Average tokens generated: 65.7
Saved results to results/sqlcoder_7b_bf16_r128_ds_002_750_b20_c700_basic.csv
Using prompt file prompts/prompt.md
Preparing questions...
Using all question(s) from data/instruct_advanced_postgres.csv
Prepared 64 question(s) from data/instruct_advanced_postgres.csv
Generating completions
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [01:53<00:00, 1.77s/it]
Time taken: 113.7s
Correct so far: 33/64 (51.56%): 100%|█████████████████████████████████████████████████████████████████████████████████| 64/64 [00:01<00:00, 51.08it/s]
exact_match correct
query_category
instructions_cte_join 0.375 0.375
instructions_cte_window 0.250 0.250
instructions_date_join 0.625 0.625
instructions_string_matching 0.875 0.875
keywords_aggregate 0.625 0.625
keywords_ratio 0.375 0.375
Average tokens generated: 98.0
Saved results to results/sqlcoder_7b_bf16_r128_ds_002_750_b20_c700_advanced.csv
Using prompt file prompts/prompt.md
Preparing questions...
Using all question(s) from data/questions_gen_postgres.csv
Prepared 200 question(s) from data/questions_gen_postgres.csv
Generating completions
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [03:47<00:00, 1.14s/it]
Time taken: 228.2s
Correct so far: 165/200 (82.50%): 100%|██████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:04<00:00, 43.14it/s]
exact_match correct
query_category
date_functions 0.640000 0.640000
group_by 0.857143 0.857143
instruct 0.857143 0.857143
order_by 0.857143 0.971429
ratio 0.714286 0.828571
table_join 0.685714 0.742857
Average tokens generated: 50.7
Saved results to results/sqlcoder_7b_bf16_r128_ds_002_750_b20_c700_v1.csv
openai: