defog-ai / sql-eval

Evaluate the accuracy of LLM generated outputs
Apache License 2.0
448 stars 47 forks

Add sql featurization into auto_error_analysis.ipynb summaries #158

Closed wongjingping closed 1 month ago

wongjingping commented 1 month ago

Now that we have the sql featurization code in defog_utils, we can import it to analyse our past runs! We featurize the expected sql in auto_error_analysis.ipynb and output some summaries. Currently this only adds the univariate sums/correlations with the total correct sum, which isn't particularly useful. The next step is to find groups of features that contribute to a high sum. For example, if queries with sql_date_part + sql_case_condition + sql_joins > 1 fail most often, then we can check whether our training data contains such a mix, and generate and mix in more of that data if there isn't enough.
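A minimal sketch of the "groups of features" step, assuming the featurized results sit in a pandas DataFrame with one row per question, one column per SQL feature, and a correctness flag. The helper `failure_rate_by_feature_combo`, the `correct` column name, and the toy rows are all hypothetical and for illustration only; this is not the defog_utils API or the notebook's actual code.

```python
from itertools import combinations

import pandas as pd


def failure_rate_by_feature_combo(df, feature_cols, correct_col="correct",
                                  size=2, min_count=1):
    """Rank feature combinations by failure rate (hypothetical helper).

    A row counts towards a combination when every feature in it is
    present, i.e. its column value is truthy / non-zero.
    """
    rows = []
    for combo in combinations(feature_cols, size):
        # Rows where all features in this combination are present
        mask = df[list(combo)].astype(bool).all(axis=1)
        n = int(mask.sum())
        if n < min_count:
            continue
        failures = int((~df.loc[mask, correct_col].astype(bool)).sum())
        rows.append({
            "features": combo,
            "n": n,
            "failures": failures,
            "failure_rate": failures / n,
        })
    result = pd.DataFrame(
        rows, columns=["features", "n", "failures", "failure_rate"]
    )
    # Highest failure rate first; larger groups break ties
    return result.sort_values(
        ["failure_rate", "n"], ascending=False
    ).reset_index(drop=True)


# Toy data, made up for illustration (not real eval results)
toy = pd.DataFrame({
    "sql_date_part":      [1, 1, 0, 0, 1, 0],
    "sql_case_condition": [1, 0, 1, 0, 1, 0],
    "sql_joins":          [2, 0, 1, 0, 3, 1],
    "correct":            [0, 1, 1, 1, 0, 1],
})
features = ["sql_date_part", "sql_case_condition", "sql_joins"]

pairs = failure_rate_by_feature_combo(toy, features, size=2)
triples = failure_rate_by_feature_combo(toy, features, size=3)
print(pairs)
print(triples)
```

Setting `min_count` higher filters out combinations that appear too rarely to say anything meaningful, which matters once the number of features grows and most combinations have only a handful of rows.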

rishsriv commented 1 month ago

Ooh very cool! I'll also try and use this for some post-processing for sql-eval. Will be quite helpful to see results broken down by SQL features :D

wendy-aw commented 1 month ago

Late to the game, but this is very cool indeed to break down correctness by features!