Closed · wendy-aw closed this 2 months ago
Woot! Thank you! Generally looks good – but just one point.
Wouldn't it be better for us to upload already-translated question files for the different dialects? That way, users would face much less complexity when using this for non-Postgres/Redshift/Snowflake dialects!
So essentially, maybe we can use the script and upload the files once (saving everyone else compute and LLM tokens :D), while leaving these scripts as they are, so that users who want to modify the queries or add their own can still use them?
Thanks for the extra additions!
- Added `translate_sql_dialect.py`, which takes in a csv file from the `data/` folder and translates it into BigQuery, MySQL, T-SQL or SQLite. This will add one .sql file per data file per dialect into the same folder. The table metadata used for translation comes from the `defog-data` repo.
- As the `questions_gen_postgres` file has multiple correct SQL options in the `query` column, the script accommodates this by translating multiple SQLs per row. However, the translated query is not guaranteed to produce the same dataframe result as in the original dialect.
- The actual .sql files for these different dialects will come in the next PR, after manual verification. Once these are in, evals can be performed in these various dialects.
- Minor change to the original `questions_gen_postgres.csv`: removed the schema prefix.
- Minor change in `eval/eval.py`: fill NA values in the dataframe with -99999 to allow for comparison. Previously it would error out.
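The sentinel fill matters because NaN never compares equal to NaN, so an element-wise comparison of two otherwise identical dataframes fails on NA cells. A minimal illustration (the -99999 sentinel matches the one mentioned above; the sample data is made up for the example):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, np.nan]})
df2 = pd.DataFrame({"a": [1.0, np.nan]})

# Element-wise equality treats NaN != NaN, so the NA row "fails"
naive_match = bool((df1 == df2).all().all())   # False

# Filling NA with a sentinel makes the NA cells comparable
sentinel = -99999
filled_match = bool(
    (df1.fillna(sentinel) == df2.fillna(sentinel)).all().all()
)  # True
print(naive_match, filled_match)
```

One caveat of the sentinel approach: a dataframe that legitimately contains -99999 would collide with a filled NA, which is presumably acceptable for these eval queries.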