Multiple uses of same variable

mlesin commented 4 years ago

Right now, if I use variable multiple times, each time I use it creates a new variable in SQL query. If I want to debug resulting query and copy-paste it to external program like, OmniDB, it's not very convenient. I don't know if it was made on purpose, but if it's not, maybe would be better to have one value per variable.

For example, I have query like:

SELECT date_trunc('day', date):: date as date,
    (SELECT COUNT(*) FROM tasks
        WHERE date>="canceled" AND "canceled">(:start_date)::date) "canceled",
    (SELECT COUNT(*) FROM tasks
        WHERE date>="done" AND (date<"canceled" or "canceled" is null) AND "done">(:start_date)::date) "done",
FROM generate_series((:start_date)::date, (:end_date)::date, '1 day'::interval) AS date
ORDER BY date

And, when I source it with AyeSQL, I'm getting $1 for first invocation of :start_date, $2 for second invocation of :start_date, $3 for third, and $4 for :end_date. I think it would be much more readable and more convenient for text replace in external programs (for debugging), if there would be only two variables, $1 and $2 for :start_date and :end_date respectively, and $1 should be used 3 times for each invocation of :start_date

alexdesousa commented 4 years ago

@mlesin Thanks for the feedback!

It is done on purpose due to the way Postgrex sends the information to the database (I actually think this is the way PostgreSQL works underneath as well). When running a raw query, Postgrex doesn't use variables, but values e.g:

Postgrex.query("SELECT * FROM users WHERE id = $1 AND age= $2", [42, 42])

I must confess I had the same problems you're experiencing while debuging, so I think it would be a good idea to revise AyeSQL.Query struct in order to keep the information of the (immutable) variable.

Right now, AyeSQL's lexer and parser are too simple. They don't store the context while converting the file to an elixir module. One feature I haven't had the time to implement is tracking lines and columns for relevant part of the query (query name and docs, and variables). This might be useful to have a better insight into the actual query and track the variables.

I'll try to come up with a solution for this.

mlesin commented 4 years ago

I meant, would query like this also work? (Please ignore that fact that the query itself is meaningless):

Postgrex.query("SELECT * FROM users WHERE id = $1 AND age= $1", [42])

alexdesousa commented 4 years ago

@mlesin Yes, that would work. Yet, for AyeSQL is a bit more complex. It would need to track every variable (not values) when composing queries.

What I mean with tracking variables is about values that are the same, but, semantically, they represent different data e.g. the following might work, by id and age are two different types of data. So in this sense, they should be different.

Postgrex.query(conn, "SELECT * FROM users WHERE id = $1 AND age= $1", [42])

In the query you proposed at the beginning, the correct treatment of :start_date should be the same for the whole query. For simple queries, this is easy to implement. For composable queries, it will require a bit more of context.

So, yeah, in the end we need an update of AyeSQL core to be able to handle queries efficiently. This would be a nice enhancement :)

mlesin commented 4 years ago

I didn't mean we need to combine variables with the same value just because value is the same, of course. That's why I told to ignore the fact that query is meaningless. Let's return to my previous example, it would be great if it will be expanded to something like:

Postgrex.query("SELECT date_trunc('day', date):: date as date,
    (SELECT COUNT(*) FROM tasks
        WHERE date>="canceled" AND "canceled">($1)::date) "canceled",
    (SELECT COUNT(*) FROM tasks
        WHERE date>="done" AND (date<"canceled" or "canceled" is null) AND "done">($1)::date) "done",
FROM generate_series(($1)::date, ($2)::date, '1 day'::interval) AS date
ORDER BY date", [~D[2020-01-01],~D[2020-02-15])

instead of

Postgrex.query("SELECT date_trunc('day', date):: date as date,
    (SELECT COUNT(*) FROM tasks
        WHERE date>="canceled" AND "canceled">($1)::date) "canceled",
    (SELECT COUNT(*) FROM tasks
        WHERE date>="done" AND (date<"canceled" or "canceled" is null) AND "done">($2)::date) "done",
FROM generate_series(($3)::date, ($4)::date, '1 day'::interval) AS date
ORDER BY date", [~D[2020-01-01],~D[2020-01-01],~D[2020-01-01],~D[2020-02-15])

as it happens now.

fcarlislehg commented 3 years ago

This change would break support for MySQL and ODBC based drivers. They expect ? and duplicate params based on position.

MyXQL.query("SELECT * FROM users WHERE id = ? AND age= ?", [42, 42])

If you do this for Postgres please make it optional.

alexdesousa commented 3 years ago

Considering what @fcarlislehg said, I think the best approach would be to create a debugging functionality that let's us get the plain SQL query, but keeps the way AyeSQL currently handles SQL.

alexdesousa / ayesql

Multiple uses of same variable #16