JuliaDatabases / LibPQ.jl

A Julia wrapper for libpq
MIT License

Tool for capturing query parameters as if they were string interpolations #213

Open c42f opened 3 years ago

c42f commented 3 years ago

I've recently written a macro tool for conveniently capturing query parameters as part of a private project. I've found it handy, so I wondered whether people would be interested in having this as part of LibPQ. Or if not, whether you have suggestions for a better home for it.

The general idea is that we should be able to build up queries by pasting together fragments of SQL and interpolated parameters, as one might do with strings or Cmd backtick interpolation, but with all interpolations turned into SQL parameters for safety and for consistency in converting them to SQL types.

To just paste the readme in here:

The readme

The main thing provided here is the @query macro to allow queries to be constructed by normal-looking string interpolation but without danger of SQL injection attacks.

Note that @query does not parse or understand the SQL source text as this would be a lot of work. Instead, it keeps any literal SQL text you write as-is and only treats the Julia-level string interpolations specially.

Use runquery to execute queries generated by @query.
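
As a rough illustration of the capture step described above (not the code from the private project), here is a minimal sketch. The names `Literal`, `Sketch` and `@sketch_query` are hypothetical. It relies only on the fact that Julia parses a string with interpolations as `Expr(:string, pieces...)`, and it does not add the comma separators that splatting needs.

```julia
# Hypothetical names throughout; an illustrative sketch only.
struct Literal           # SQL text written literally in the source
    text::String
end

struct Sketch
    parts::Vector{Any}   # Literal fragments interleaved with captured values
end

macro sketch_query(ex)
    # A string with interpolations parses as Expr(:string, pieces...);
    # a string without any parses as a plain String.
    pieces = ex isa String ? Any[ex] :
             Meta.isexpr(ex, :string) ? ex.args :
             error("@sketch_query expects a string literal")
    parts = map(pieces) do p
        # Literal text is wrapped and kept as-is; everything else is
        # evaluated in the caller's scope and stored as a value.
        p isa String ? Literal(p) : esc(p)
    end
    :(Sketch(Any[$(parts...)]))
end

email, id = "admin@example.com", 1
q = @sketch_query "insert into foo values ($email, $id)"
```

Here `q.parts` holds three `Literal` fragments with the two captured values between them; the values are never pasted into the SQL text.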

Simple usage

Creating a table and inserting some values

conn = LibPQ.Connection(your_connection_string)

runquery(conn, @query "create table foo (email text, userid integer)")

for (email,id) in [ ("admin@example.com", 1)
                    ("foo@example.com",   2)]
    runquery(conn, @query "insert into foo values ($email, $id)")
end

Thence:

julia> runquery(conn, @query "select * from foo") |> DataFrame
2×2 DataFrame
 Row │ email              userid
     │ String?            Int32?
─────┼───────────────────────────
   1 │ admin@example.com       1
   2 │ foo@example.com         2

Howto: Inserting values from a Julia array into a row

In some circumstances it can be useful to use splatting syntax to interpolate a Julia collection into a comma-separated list of values. Generally simple scalar parameters should be preferred for simplicity, but splatting can be useful on occasion:

email_and_id = ("bar@example.com", 3)
runquery(conn, @query "insert into foo values ($(email_and_id...))")

Howto: Using the in operator with a Julia collection

There are two ways to do this. First, using in and splatting syntax:

julia> ids = (1,2)
       runquery(conn, @query "select * from foo where userid in ($(ids...))") |> DataFrame
       2×2 DataFrame
        Row │ email              userid
            │ String?            Int32?
       ─────┼───────────────────────────
          1 │ admin@example.com       1
          2 │ foo@example.com         2

Second, using the SQL any operator and simply passing a single SQL array parameter:

julia> ids = [1,2]
       runquery(conn, @query "select * from foo where userid = any($ids)") |> DataFrame
       2×2 DataFrame
        Row │ email              userid
            │ String?            Int32?
       ─────┼───────────────────────────
          1 │ admin@example.com       1
          2 │ foo@example.com         2

Howto: Building up a query from fragments

conn = LibPQ.Connection(your_connection_string)

some_condition = true

x = 100
y = 20
# Example of an optional clause - use empty @query() to disable it.
and_clause = some_condition ? @query("and y=$y") : @query()

# Interpolation of values produces SQL parameters; interpolating @query
# fragments adds them to the query.
q = @query "select * from table where x=$x $and_clause"
runquery(conn, q)

iamed2 commented 3 years ago

Okay this is pretty great! And best of all it's optional so if someone's using a database which uses the PostgreSQL wire protocol but not the parameter syntax (i.e., $1, $2) they can just not use the macro.

I'm presuming this functions something like this:

julia> @query "insert into foo values ($(email_and_id...))"
Query("insert into foo values (\$1, \$2)", ("bar@example.com", 3))

Would this make sense as a string macro?

c42f commented 3 years ago

> I'm presuming this functions something like this

Very similar, though the interpolation syntax is constructed lazily. This allows multiple such strings to be pasted together dynamically, as in the "Building up a query from fragments" section. (It would also allow non-PostgreSQL parameter syntax to be supported for other connection types.)

julia> email_and_id = ("bar@example.com", 3)
("bar@example.com", 3)

julia> @query "insert into foo values ($(email_and_id...))"
insert into foo values ($1,$2)
  $1 = "bar@example.com"
  $2 = 3

julia> dump(@query "insert into foo values ($(email_and_id...))")
JHubDbUtils.SafeQuery
  args: Array{Any}((5,))
    1: QueryFragment
      fragment: String "insert into foo values ("
    2: String "bar@example.com"
    3: QueryFragment
      fragment: String ","
    4: Int64 3
    5: QueryFragment
      fragment: String ")"
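
To illustrate how a structure like the one dumped above could be turned into PostgreSQL's `$1, $2` syntax only at execution time, here is a minimal sketch. `Literal`, `Sketch`, `render!` and `render` are hypothetical names, and the real internals may differ. Nested queries are spliced in recursively, which is what makes pasting fragments together possible.

```julia
# Hypothetical types; an illustrative sketch only.
struct Literal
    text::String
end

struct Sketch
    parts::Vector{Any}
end

# Walk the parts, appending literal SQL and replacing each value with a
# numbered placeholder. Nested Sketch values are spliced in recursively,
# so placeholder numbering continues across fragments.
function render!(io::IO, params::Vector{Any}, q::Sketch)
    for p in q.parts
        if p isa Literal
            print(io, p.text)
        elseif p isa Sketch
            render!(io, params, p)
        else
            push!(params, p)
            print(io, '$', length(params))
        end
    end
    return params
end

function render(q::Sketch)
    io = IOBuffer()
    params = render!(io, Any[], q)
    return String(take!(io)), params
end

q = Sketch(Any[Literal("insert into foo values ("), "bar@example.com",
               Literal(","), 3, Literal(")")])
sql, params = render(q)
```

Because the placeholder text is produced only in `render!`, a different backend could emit another parameter style (for example `?`) from the same captured parts.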

> Would this make sense as a string macro?

Yes I tried this, but I found it has a big downside: interpolations aren't syntax-highlighted. In practice I felt this quite outweighed the benefits of a string macro for visually parsing complex queries. (The benefits of a string macro being a separate namespace and no need for parentheses in larger expressions with trailing parts.)

c42f commented 3 years ago

By the way, I found it helpful to add a runquery function which was separate from LibPQ.execute() and only accepted SafeQuery in an effort to discourage use of string interpolation in constructing queries. I'm not sure how this would make sense if we had @query in LibPQ itself; maybe it's something application authors could decide to do separately from LibPQ.
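
A minimal sketch of that idea, assuming a hypothetical `SafeSketch` type standing in for the macro's output: because `runquery` has no method for plain strings, passing an interpolated string fails with a `MethodError` instead of reaching the database. The executor is passed in as an argument so the sketch runs without a connection; with LibPQ one would pass `LibPQ.execute`, which accepts `(conn, sql, params)`.

```julia
# Hypothetical stand-in for a query built by a macro such as @query.
struct SafeSketch
    sql::String
    params::Vector{Any}
end

# `exec` is whatever actually talks to the database.
runquery(exec, conn, q::SafeSketch) = exec(conn, q.sql, q.params)

# A fake executor so the sketch runs without a database.
fake_exec(conn, sql, params) = (sql, params)

q = SafeSketch("select * from foo where userid = \$1", Any[1])
result = runquery(fake_exec, nothing, q)

# No method accepts a plain String, so a hand-built query string is rejected.
rejected = try
    runquery(fake_exec, nothing, "select * from foo where userid = 1")
    false
catch err
    err isa MethodError
end
```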

Regarding naming, I'm not married to using @query; we could call this @sql or @psql, or something else if it made sense. Some people might find it annoying if this clashed with, e.g., Query.@query, though that's not an issue for me personally.

In the end, @query is really just a way of capturing interpolations for processing later. Actually it's somewhat similar to an old PR in Julia Base https://github.com/JuliaLang/julia/pull/33711/files#diff-281780f4de49ab611364d45135bc3d40067c69924b7548e5e73cfec655fb356a. So another option might be to start a new package — LazyStrings, or some such.

iamed2 commented 3 years ago

> Yes I tried this, but I found it has a big downside: interpolations aren't syntax-highlighted. In practice I felt this quite outweighed the benefits of a string macro for visually parsing complex queries. (The benefits of a string macro being a separate namespace and no need for parentheses in larger expressions with trailing parts.)

If it was a string macro you could choose to use the @macro_str form if you wanted, whereas it doesn't work the other way around.

The laziness actually presents a nice argument for this being a generic separate package, since the same syntax could be used for many different database systems.

c42f commented 3 years ago

> The laziness actually presents a nice argument for this being a generic separate package, since the same syntax could be used for many different database systems.

Yes that's a good point. I've taken a stab at refactoring this a bit and put the code into a new package SqlStrings.jl. I'll probably register a first version of this shortly:

https://github.com/JuliaComputing/SQLStrings.jl

> If it was a string macro you could choose to use the @macro_str form if you wanted, whereas it doesn't work the other way around.

In SqlStrings, I've made it @sql_cmd because:

At the moment it requires a small amount of glue code to integrate with LibPQ, but it would be great to make that unnecessary.

I'm not 100% sure of the best way forward for integration. We could make SqlStrings a dependency of LibPQ and make LibPQ.execute optionally take an SqlString?

A longer-term goal might be to make the publicly exported execute() not take a normal string, which would heavily discourage people from using plain string interpolation to construct queries. But obviously that would be quite breaking. Thoughts?