This PR updates the text-to-SQL rules to enforce table alias qualification for column names, improving query clarity and preventing ambiguity in generated SQL queries.
Changes
Key updates to the SQL generation rules:
Added rule: "ALWAYS QUALIFY column names with their table name or table alias"
Modified a few examples to follow the new rule
Before:
SELECT SUM(PriceSum) FROM Revenue
WHERE CAST(PurchaseTimestamp AS TIMESTAMP WITH TIME ZONE) >= ...
After:
SELECT SUM(r.PriceSum) FROM Revenue r
WHERE CAST(r.PurchaseTimestamp AS TIMESTAMP WITH TIME ZONE) >= ...
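To illustrate the kind of ambiguity the rule guards against, here is a minimal sketch. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in engine and uses two hypothetical tables, orders and customers, that share a customer_id column; neither the tables nor the data come from this PR, and other engines may word the error differently.

```python
import sqlite3

# Hypothetical schema for illustration only; not part of this PR.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, city TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, price REAL);
    INSERT INTO customers VALUES (1, 'Lisbon'), (2, 'Porto');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# customer_id exists in both tables, so the unqualified reference is ambiguous.
unqualified = """
    SELECT customer_id, SUM(price)
    FROM orders JOIN customers ON orders.customer_id = customers.customer_id
    GROUP BY customer_id
"""

# Every column is qualified with its table alias, as the new rule requires.
qualified = """
    SELECT o.customer_id, SUM(o.price)
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY o.customer_id
    ORDER BY o.customer_id
"""

try:
    conn.execute(unqualified)
    unqualified_error = None
except sqlite3.OperationalError as exc:
    # SQLite rejects the statement when it cannot resolve the column.
    unqualified_error = str(exc)

rows = conn.execute(qualified).fetchall()
print("unqualified:", unqualified_error)
print("qualified:", rows)
```

In this sketch the unqualified query is rejected by the engine, while the alias-qualified version runs and returns one total per customer.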
Summary
After the prompt optimization, generated SQL will qualify every column with its table name or alias to avoid column ambiguity. We can then monitor whether the ambiguity issue occurs again.
SELECT "olist_customers_dataset"."customer_city" AS "city",
       "olist_orders_dataset"."order_id",
       SUM("olist_order_items_dataset"."price") AS "total_value"
FROM "olist_orders_dataset"
JOIN "olist_order_items_dataset" ON "olist_orders_dataset"."order_id" = "olist_order_items_dataset"."order_id"
JOIN "olist_customers_dataset" ON "olist_orders_dataset"."customer_id" = "olist_customers_dataset"."customer_id"
GROUP BY "olist_customers_dataset"."customer_city", "olist_orders_dataset"."order_id"
ORDER BY "city", "total_value" DESC
LIMIT 3