PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.9k stars 217 forks source link

Attempts to sort by column that do not exist after join #3129

Open eitsupi opened 1 year ago

eitsupi commented 1 year ago

What happened?

Sorry the example is not small enough, but the following example seems to fail to execute the query because it tries to sort using column a._expr_0, which does not actually exist.

PRQL input

from a
join side:left b (==col)
sort a.col
select !{a.col}
take 5

SQL output

WITH table_0 AS (
  SELECT
    a.*,
    b.*,
    a.col AS _expr_0
  FROM
    a
    LEFT JOIN b ON a.col = b.col
  ORDER BY
    a._expr_0
  LIMIT
    5
)
SELECT
  *
FROM
  table_0
ORDER BY
  _expr_0

-- Generated by PRQL compiler version:0.9.2 (https://prql-lang.org)

Expected SQL output

WITH table_0 AS (
  SELECT
    a.*,
    b.*,
    a.col AS _expr_0
  FROM
    a
    LEFT JOIN b ON a.col = b.col
  ORDER BY
    a.col
  LIMIT
    5
)
SELECT
  *
FROM
  table_0
ORDER BY
  _expr_0

-- Generated by PRQL compiler version:0.9.2 (https://prql-lang.org/)

MVCE confirmation

Anything else?

No response

aljazerzen commented 1 year ago

Nice catch, the example is small enough!

syko commented 1 year ago

+1 I just ran into this myself

It seems the issue is with the ordering guarantee characteristic of sort:

PRQL pushes the sort down the pipeline, compiling the ORDER BY to the end of the query.

What's more, I would've thought wrapping it in a variable would work but unfortunately no.

Here's another example runnable in the playground:

let a = (
  from invoices
  sort {-invoice_date}
)
let b = (
  from a
  select {customer_id}
)
from b
WITH a AS (
  SELECT
    *
  FROM
    invoices
),
b AS (
  SELECT
    customer_id
  FROM
    a
)
SELECT
  customer_id
FROM
  b
ORDER BY
  invoice_date DESC

-- Generated by PRQL compiler version:0.9.4 (https://prql-lang.org)
fredericp commented 11 months ago

A smaller example, if it helps:

let x = (
  from table
  sort index
  select {fieldA}
)
from x
WITH table_0 AS (
  SELECT
    "fieldA",
    "index"
  FROM
    "table"
),
x AS (
  SELECT
    "fieldA"
  FROM
    table_0
)
SELECT
  "fieldA"
FROM
  x
ORDER BY
  "index"

-- Generated by PRQL compiler version:0.10.0 (https://prql-lang.org)
eitsupi commented 8 months ago

I don't know when, but this seems to be happening again.

aljazerzen commented 8 months ago

I've just verified it works in the playground:

let x = (
  from artists
  sort artist_id
  select {name}
)
from x
WITH table_0 AS (
  SELECT
    name,
    artist_id
  FROM
    artists
),
x AS (
  SELECT
    name,
    artist_id
  FROM
    table_0
)
SELECT
  name
FROM
  x
ORDER BY
  artist_id

-- Generated by PRQL compiler version:0.11.1 (https://prql-lang.org)

What prqlc version were you using?

eitsupi commented 8 months ago

I've just verified it works in the playground:

I saw:

from a
join side:left b (==col)
sort a.col
select !{a.col}
take 5

to

WITH table_0 AS (
  SELECT
    a.*,
    b.*,
    a.col AS _expr_0
  FROM
    a
    LEFT JOIN b ON a.col = b.col
  ORDER BY
    a._expr_0
  LIMIT
    5
)
SELECT
  *
FROM
  table_0
ORDER BY
  _expr_0

-- Generated by PRQL compiler version:0.11.1 (https://prql-lang.org)

It is the same of the original query of this issue.

aljazerzen commented 8 months ago

Oh, I see, it's the naming. Damn, this ORDER-BY-pushdown feature is producing a lot of weird bugs. Maybe I should re-think it from the beginning.