defog-ai/sql-eval
Evaluate the accuracy of LLM generated outputs
Apache License 2.0
448 stars
47 forks
issues
Newest
Small pesky bugs found while examining dialect exec errors
#154
wendy-aw
closed
1 month ago
2
Add CoT
#153
wongjingping
closed
1 month ago
1
SQLite SERIAL, drop valid/err_msg cols
#152
wendy-aw
closed
1 month ago
0
Expanded correct queries for 3 questions + minor prompt/typo fixes
#151
rishsriv
closed
1 month ago
1
Usability improvements
#150
wongjingping
closed
1 month ago
1
Dialect data files
#149
wendy-aw
closed
1 month ago
0
Dialect translation of eval files
#148
wendy-aw
closed
1 month ago
2
Added an option to add prompt to the logprobs
#147
rishsriv
closed
1 month ago
0
Added idk questions so we can query them more easily
#146
rishsriv
closed
1 month ago
0
Minor fixes to eval code
#145
rishsriv
closed
1 month ago
0
Update run_checkpoints.sh to use the vllm api server instead of offline inference
#144
rishsriv
closed
1 month ago
0
small changes to vllm runner
#143
wongjingping
closed
1 month ago
0
Added ability to get logprobs from vllm API results, and see them visualized in another repo
#142
rishsriv
closed
1 month ago
0
Api server updates
#141
wongjingping
closed
1 month ago
0
Standardize prompt formatting for vllm
#140
wongjingping
closed
1 month ago
0
Improve psycopg dialect string handling
#139
wongjingping
closed
1 month ago
1
Eval script fixes
#138
wongjingping
closed
1 month ago
0
Openai Tokenizer
#137
wongjingping
closed
1 month ago
0
Update requirements
#136
wongjingping
closed
1 month ago
0
any chance to have ms sql to be supported?
#135
ghostinside
opened
1 month ago
1
huggingface evaluation dataset not found
#134
minjunp
closed
1 month ago
1
New nb for automated error analysis
#133
wendy-aw
closed
2 months ago
0
gcs eval for checkpoint weights
#132
wongjingping
closed
2 months ago
1
Update API runner and prompt
#131
rishsriv
closed
2 months ago
0
Query and parsing fixes
#130
wongjingping
closed
2 months ago
0
Sql edits to basic_instruct and classic sql-eval
#129
wendy-aw
closed
2 months ago
0
Evaluation metrics for SQL Query not found
#128
andreped
opened
2 months ago
0
Question fixes
#127
wongjingping
closed
2 months ago
0
Sql edits to instruct_advanced
#126
wendy-aw
closed
2 months ago
1
Added install step to CONTRIBUTING.md; refactoring install step in README
#125
andreped
closed
2 months ago
0
Simplified if statement logic in compare_df in eval.py; minor refactoring
#124
andreped
closed
2 months ago
0
Added custom api server
#123
wongjingping
closed
2 months ago
1
Correct query in broker db
#122
wendy-aw
closed
2 months ago
0
Updated sql-eval questions
#121
wongjingping
closed
2 months ago
0
Fixed a typo in the CSV
#120
rishsriv
closed
2 months ago
0
Provide multiple correct answers for some questions
#119
rishsriv
closed
2 months ago
1
Enable tgi via api type
#118
wongjingping
closed
2 months ago
0
fix temperature/top_p warning
#117
wongjingping
closed
2 months ago
0
Added a bedrock runner to make it easier to run models straight from bedrock
#116
rishsriv
closed
2 months ago
0
Remove dependency on AutoTokenizer in api_runner
#115
rishsriv
closed
2 months ago
0
Update model runners
#114
wongjingping
closed
2 months ago
1
Continuous eval script
#113
wongjingping
closed
2 months ago
0
Add joinable columns as part of the metadata string with `c=0`
#112
rishsriv
closed
2 months ago
1
Added a batch size arg for hf runner
#111
rishsriv
closed
2 months ago
0
Fixed bug where "instructions" were being added to the prompt even if not specified
#110
rishsriv
closed
2 months ago
0
Add the questions file as a required parameter in the README files
#109
rishsriv
closed
2 months ago
1
Enable multiple question files
#108
wongjingping
closed
2 months ago
0
Basic question modifications
#107
wongjingping
closed
2 months ago
0
Enable argument for specifying how much to round floats to
#106
rishsriv
closed
2 months ago
0
Small bugfix for the gemini runner
#105
rishsriv
closed
3 months ago
0