DerekStride / tree-sitter-sql

SQL grammar for tree-sitter
http://derek.stride.host/tree-sitter-sql/
MIT License
155 stars 52 forks source link

Errors when embedding Presto inside of Python strings #278

Open tomspeak opened 2 weeks ago

tomspeak commented 2 weeks ago

Almost all of the SQL I write is inside of python data pipelines, using Spark and Presto flavoured syntax.

So my queries are written like

SqlQuery(
  select="""
    WITH users_in_threads AS (
          SELECT
              id,
          FROM {THREAD_TABLE}
          CROSS JOIN UNNEST(userid_array) AS t (userid)
      )
    SELECT
        userid,
        MAP_AGG(id, name) AS names,
        COUNT() AS cnt
    FROM {USERS_TABLE}
    GROUP BY
        1
  """
)

When using tree-sitter-sql via neovim, I extended it to also capture the content inside of these strings via an injection.

; extends

(string 
  (string_content) @injection.content
    (#vim-match? @injection.content "^\w*SELECT|FROM|INNER|JOIN|UNION|WHERE|CREATE|DROP|INSERT|UPDATE|ALTER|WITH.*$")
    (#set! injection.language "sql"))

This "works", using :InspectTree I can see it correctly captures SELECT/WITH statement, but once it gets to the python string interpolation FROM {THREAD_TABLE}, the parser errors.

This is quite a crazy use-case, but wondered if there is anything that can be done to make the parser looser in this context so it at least does not error out?

DerekStride commented 2 weeks ago

I do a similar thing but with Ruby e.g.

sql = <<~SQL
  SELECT *
  FROM table
  WHERE id = %{id}
SQL

unfortunately there isn't a good way to make the parser recover gracefully since it's not actually valid SQL syntax. Tree-sitter should recover most of the time and not make things look completely broken but I have the occasional query where that's not the case.

tomspeak commented 1 week ago

Makes sense!

Due to the compilation steps of tree-sitter-sql, I take it there's no easy way for me to locally extend the definitions? As this is the only way I ever interact with SQL, I am OK (in my local version) introducing the concept of {PYTHON_VARIABLE} to the SQL syntax.

DerekStride commented 1 week ago

I think you'd have to fork the repo to extend the parser in that way. It's like not too hard, you could just add the curly brace ({}) characters to the identifier node and that might be enough.

https://github.com/DerekStride/tree-sitter-sql/blob/c649cb98ca91c660ba8a015aa1fcfc53b859b951/grammar.js#L3484

matthias-Q commented 1 week ago

I was wondering if it would not be possible to have recursive injections. I mean nvim-treesitter knows that the embedded code is SQL, so why not apply the query for injections there as well.