Canner / WrenAI

🚀 An open-source SQL AI (Text-to-SQL) Agent that empowers data, product teams to chat with their data. 🤘
https://getwren.ai/oss
GNU Affero General Public License v3.0
2.08k stars 221 forks source link

feat(wren-ai-service): prompting to avoid column ambiguity #906

Closed paopa closed 2 weeks ago

paopa commented 2 weeks ago

This PR updates the text-to-SQL rules to enforce table alias qualification for column names, improving query clarity and preventing ambiguity in generated SQL queries.

Changes

Key updates to the SQL generation rules:

Before:

SELECT SUM(PriceSum) FROM Revenue 
WHERE CAST(PurchaseTimestamp AS TIMESTAMP WITH TIME ZONE) >= ...

After:

SELECT SUM(r.PriceSum) FROM Revenue r 
WHERE CAST(r.PurchaseTimestamp AS TIMESTAMP WITH TIME ZONE) >= ...

Summary

After the prompt optimization, the sql will use table or alias to avoid column ambiguity, and we could observe the issue whether this issue occurs again.

SELECT "olist_customers_dataset"."customer_city" AS "city", "olist_orders_dataset"."order_id", SUM("olist_order_items_dataset"."price") AS "total_value" FROM "olist_orders_dataset" JOIN "olist_order_items_dataset" ON "olist_orders_dataset"."order_id" = "olist_order_items_dataset"."order_id" JOIN "olist_customers_dataset" ON "olist_orders_dataset"."customer_id" = "olist_customers_dataset"."customer_id" GROUP BY "olist_customers_dataset"."customer_city", "olist_orders_dataset"."order_id" ORDER BY "city", "total_value" DESC LIMIT 3