"Measures both SQL complexity (think nested queries and multiple joins) and semantic diversity (uses language in different settings)"
When looking through the code, I fail to see a metric being computed capturing this. I was expecting some kind of Model-Graded Eval using an LLM (or similar) to determine the complexity of the SQL query itself or something like this. Perhaps it is something using this:
https://github.com/defog-ai/sql-eval/blob/main/eval/eval.py#L114
Right now, I only see metrics evaluating based on if the SQL query is valid, the runtime of completion, and what the resulting SQL Records contain (pandas.DataFrame). Am I missing something?
From this blog post: https://defog.ai/blog/open-sourcing-sqleval/
I saw this sentence:
When looking through the code, I fail to see a metric being computed capturing this. I was expecting some kind of Model-Graded Eval using an LLM (or similar) to determine the complexity of the SQL query itself or something like this. Perhaps it is something using this: https://github.com/defog-ai/sql-eval/blob/main/eval/eval.py#L114
Right now, I only see metrics evaluating based on if the SQL query is valid, the runtime of completion, and what the resulting SQL Records contain (
pandas.DataFrame
). Am I missing something?
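For context, here is a minimal sketch of the kind of complexity metric I was expecting to find. This is purely hypothetical; the function name and the join/subquery-counting heuristic are mine, not anything from sql-eval:

```python
import re

def sql_complexity(query: str) -> int:
    """Crude complexity score: count joins plus nested SELECTs.

    Hypothetical heuristic for illustration only -- not part of sql-eval.
    """
    q = query.upper()
    # Count explicit JOIN keywords.
    joins = len(re.findall(r"\bJOIN\b", q))
    # Every SELECT beyond the first is treated as a subquery.
    subqueries = max(q.count("SELECT") - 1, 0)
    return joins + subqueries

print(sql_complexity("SELECT * FROM t"))  # 0
print(sql_complexity(
    "SELECT a FROM t JOIN u ON t.id = u.id "
    "WHERE a IN (SELECT b FROM v)"))  # 2 (one join + one subquery)
```

Even something keyword-based like this, applied per test case, would let the benchmark report results bucketed by query complexity, which is what the quoted sentence seemed to promise.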