defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0

Added a Bedrock runner to make it easier to run models directly from Amazon Bedrock #116

Closed: rishsriv closed this pull request 2 months ago

rishsriv commented 2 months ago

To run this, use the following command:

```shell
python -W ignore main.py \
  -db postgres \
  -q "data/questions_gen_postgres.csv" \
  -o results/llama3_70b.csv \
  -g bedrock \
  -f prompts/prompt.md \
  -m meta.llama3-70b-instruct-v1:0 \
  -p 5
```

```text
Correct so far: 147/200 (73.50%): 100%|███████████████████████████████████████████████████| 200/200 [02:42<00:00,  1.23it/s]
                 correct  error_db_exec
query_category                         
date_functions  0.680000       0.200000
group_by        0.828571       0.000000
instruct        0.771429       0.114286
order_by        0.885714       0.000000
ratio           0.342857       0.114286
table_join      0.885714       0.000000
```
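As a sanity check on the per-category table, weighting each category's rate by its number of questions recovers the 147/200 headline. The category sizes below (25 `date_functions` questions, 35 in every other category) are not stated in the PR; they are inferred from the printed fractions (0.68 = 17/25, 0.828571 = 29/35, and so on) and should be treated as an assumption, as should reading `error_db_exec` as a per-question rate. `weighted_totals` is a hypothetical helper, not part of sql-eval.

```python
# Per-category (correct, error_db_exec) rates copied from the output above.
results = {
    "date_functions": (0.680000, 0.200000),
    "group_by": (0.828571, 0.000000),
    "instruct": (0.771429, 0.114286),
    "order_by": (0.885714, 0.000000),
    "ratio": (0.342857, 0.114286),
    "table_join": (0.885714, 0.000000),
}

# ASSUMPTION: question counts per category, inferred from the fractions, not from the PR.
sizes = {
    "date_functions": 25,
    "group_by": 35,
    "instruct": 35,
    "order_by": 35,
    "ratio": 35,
    "table_join": 35,
}


def weighted_totals(results, sizes):
    """Convert per-category rates back into counts and sum them across categories."""
    correct = sum(round(results[cat][0] * n) for cat, n in sizes.items())
    errors = sum(round(results[cat][1] * n) for cat, n in sizes.items())
    total = sum(sizes.values())
    return correct, errors, total


correct, errors, total = weighted_totals(results, sizes)
print(f"{correct}/{total} correct ({correct / total:.2%}), {errors} DB execution errors")
# prints: 147/200 correct (73.50%), 13 DB execution errors
```

The weighting matters here: a plain mean of the six category rates is about 73.24% rather than 73.50%, because `date_functions` (the category with the most execution errors) has fewer questions than the others.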
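For readers curious what a Bedrock runner involves, here is a minimal sketch of sending prompts to the Llama 3 70B model named in the command above through boto3's `bedrock-runtime` client. It is not the runner added in this PR: the helper names (`build_llama3_body`, `generate_sql`, `run_all`, `default_client`), the generation settings, and the default region are hypothetical, and `parallel_threads=5` only mirrors `-p 5` on the assumption that the flag sets the number of concurrent requests. The request fields (`prompt`, `max_gen_len`, `temperature`, `top_p`) and the `generation` field in the response follow Bedrock's documented format for Meta Llama models; the client is passed in so the logic can be exercised without AWS.

```python
import json
from concurrent.futures import ThreadPoolExecutor

MODEL_ID = "meta.llama3-70b-instruct-v1:0"


def build_llama3_body(prompt: str, max_gen_len: int = 600, temperature: float = 0.0) -> str:
    """Serialize a request body in the format Bedrock expects for Meta Llama models."""
    # ASSUMPTION: these generation settings are illustrative, not sql-eval's defaults.
    return json.dumps(
        {"prompt": prompt, "max_gen_len": max_gen_len, "temperature": temperature, "top_p": 1.0}
    )


def generate_sql(client, prompt: str, model_id: str = MODEL_ID) -> str:
    """Send one prompt to Bedrock and return the generated text, stripped of whitespace."""
    response = client.invoke_model(
        modelId=model_id,
        body=build_llama3_body(prompt),
        contentType="application/json",
        accept="application/json",
    )
    # response["body"] is a streaming body; read() returns the JSON payload as bytes.
    payload = json.loads(response["body"].read())
    return payload["generation"].strip()


def run_all(client, prompts, parallel_threads: int = 5):
    """Generate completions for many prompts concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=parallel_threads) as pool:
        return list(pool.map(lambda p: generate_sql(client, p), prompts))


def default_client(region_name: str = "us-east-1"):
    """Create a Bedrock runtime client (requires boto3 and configured AWS credentials)."""
    import boto3  # imported lazily so the helpers above can be used without boto3

    return boto3.client("bedrock-runtime", region_name=region_name)
```

With credentials configured, `run_all(default_client(), prompts)` would return one generation per prompt. `ThreadPoolExecutor.map` is used because Bedrock calls are network-bound, so threads overlap the waiting time while preserving the order of the input questions.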