Closed wongjingping closed 1 month ago
Ooh very cool! I'll also try and use this for some post-processing for sql-eval. Will be quite helpful to see results broken down by SQL features :D
Late to the game, but this is very cool indeed to break down correctness by features!
Now that the SQL featurization code lives in defog_utils, we can import it to analyse our past runs. We featurize the expected SQL in `auto_error_analysis.ipynb` and output some summaries. Currently this only reports the univariate per-feature sums and their correlations with the total correct sum, which isn't particularly useful. The next step is to find groups of features that together account for the most failures. For example, if queries with `sql_date_part + sql_case_condition + sql_joins > 1` fail most often, we can check whether our training data contains such a mix and, if it doesn't contain enough, generate and mix in more of it.
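As a rough illustration of that next step, here is a minimal sketch that counts failures for every pair of co-occurring features. It is not the notebook's code: the function name `failing_feature_groups`, the row layout (one dict per evaluated query), and the `correct` key are all assumptions made for the example. It also treats a feature as present whenever its count is above zero, which is simpler than a threshold such as `sql_joins > 1`.

```python
from collections import Counter
from itertools import combinations


def failing_feature_groups(rows, feature_names, group_size=2, correct_key="correct"):
    """Rank groups of co-occurring features by how often their queries fail.

    rows: one dict per evaluated query, mapping feature name -> count,
          plus a boolean under `correct_key` (hypothetical layout).
    Returns a list of (feature_group, n_failed, n_queries) tuples,
    sorted with the most-failing groups first.
    """
    totals = Counter()    # queries containing every feature in the group
    failures = Counter()  # of those, how many were incorrect
    for row in rows:
        # A feature counts as present when it appears at least once.
        present = [name for name in feature_names if row.get(name, 0) > 0]
        for group in combinations(present, group_size):
            totals[group] += 1
            if not row[correct_key]:
                failures[group] += 1

    summary = [(group, failures[group], totals[group]) for group in totals]
    # Most failures first, then highest failure rate, then name for stable output.
    summary.sort(key=lambda item: (-item[1], -item[1] / item[2], item[0]))
    return summary


# Made-up rows purely to exercise the function; not real eval results.
rows = [
    {"sql_date_part": 1, "sql_case_condition": 1, "sql_joins": 2, "correct": False},
    {"sql_date_part": 1, "sql_case_condition": 0, "sql_joins": 1, "correct": True},
    {"sql_date_part": 0, "sql_case_condition": 1, "sql_joins": 3, "correct": False},
    {"sql_date_part": 1, "sql_case_condition": 1, "sql_joins": 0, "correct": True},
]
features = ["sql_date_part", "sql_case_condition", "sql_joins"]
result = failing_feature_groups(rows, features)

for group, n_failed, n_queries in result:
    print(" + ".join(group), f"{n_failed}/{n_queries} failed")
```

Setting `group_size=3` would look at triples like the one in the example above. With many features the number of groups grows quickly, so a minimum-support cutoff on `n_queries` would probably be worth adding before trusting the ranking.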