PRQL / prql-query

Query and transform data with PRQL
Apache License 2.0
126 stars 7 forks source link

Backtick-quoted column name in SQL output from query with `count_distinct` #26

Open RalfNorthman opened 1 year ago

RalfNorthman commented 1 year ago

Cool project!

I used the prql-query CLI-tool v0.0.14 in the following examples.

The problem

When compiling:

prql target:sql.postgres
from developers
group team (
    aggregate [
        skill_width = count_distinct specialty,
    ]
)

to sql I get:

SELECT
  team,
  COUNT(DISTINCT `specialty`) AS skill_width
FROM
  developers
GROUP BY
  team

I did not expect the output to have a column name quoted with backticks (and neither did the database I sent it to, roapi).

In contrast

If we have a weird column name and backtick-quote it in the select statement of the prql:

from developers
select `weird name`

we get a column name with double-quotes as expected:

SELECT
  "weird name"
FROM
  developers

Possible culprit

Searching the codebase I found this line in prql-compiler/src/sql/std_impl.prql:

func count_distinct <scalar|column> column -> s"COUNT(DISTINCT `{column}`)"

which contains backticks. Is there a good reason for this? None of the other functions in the file use backticks.

aljazerzen commented 1 year ago

Nice report, very readable.

You've probably found the correct culprit and I don't know why backticks are here. Probably historical reason.

Will investigate and remove if it turns out that they are not needed.

snth commented 1 year ago

Hi @RalfNorthman ,

Thank you for the report. This is probably a bug in prql-query which is quite out of date with the main prql-compiler. I will look at updating that asap.

I will transfer the issue there in the meantime.

snth commented 1 year ago

Thanks for fixing this in prql-compiler @aljazerzen . I'll still need to integrate this into prql-query so reopening the issue.