PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.79k stars 212 forks source link

append doesn't check column, column order matters #4724

Open cottrell opened 1 month ago

cottrell commented 1 month ago

What happened?

I'm not sure if this is intentional or not.

PRQL input

from invoices
group billing_country (
aggregate {
  a = count_distinct billing_city
  }
)
into A

from invoices
aggregate {
a = count_distinct billing_city
}
derive {
billing_country = '*'
}
select {a, billing_country}
into B

from A
append B

SQL output

WITH "A" AS (
  SELECT
    billing_country,
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
  GROUP BY
    billing_country
),
table_0 AS (
  SELECT
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
),
"B" AS (
  SELECT
    a,
    '*' AS billing_country
  FROM
    table_0
)
SELECT
  billing_country,
  a
FROM
  "A"
UNION
ALL
SELECT
  *
FROM
  "B"

-- Generated by PRQL compiler version:0.12.2 (https://prql-lang.org)

Expected SQL output

No response

MVCE confirmation

Anything else?

It works if you manually align the columns with a select. Might be intended not sure.

billing_country a
Germany 3
Norway  1
Belgium 1
Canada  8
USA 12
France  4
Ireland 1
United Kingdom  2
Australia   1
Chile   1
India   2
Brazil  4
Portugal    2
Netherlands 1
Spain   1
Sweden  1
Czech Republic  1
Finland 1
Denmark 1
Italy   1
Poland  1
Austria 1
Hungary 1
Argentina   1
53  *
max-sixty commented 1 month ago

Sorry for being slow, but could you explain in a bit more detail what the issue is?

(An expected SQL output would be great)

cottrell commented 1 month ago

Sorry for being slow, but could you explain in a bit more detail what the issue is?

(An expected SQL output would be great)

If you look at the last row it is "53 " and it shoudl be " 53". I should have chosen a better text than "*" I guess as it doesn't jump out.

If it's possible, prql should probably inspect the columns and align them but if the column names are no available (select * clause) then it is not possible.

I think the simplest solution would be to simply warn in the docs for append that the user must manually align the tables.

WITH "A" AS (
  SELECT
    billing_country,
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
  GROUP BY
    billing_country
),
table_0 AS (
  SELECT
    COUNT(DISTINCT billing_city) AS a
  FROM
    invoices
),
"B" AS (
  SELECT
    '*' AS billing_country, -- Ensure this column matches the first query
    a
  FROM
    table_0
)
SELECT
  billing_country,
  a
FROM
  "A"
UNION ALL
SELECT
  billing_country,
  a
FROM
  "B";