DerekStride / tree-sitter-sql

SQL grammar for tree-sitter
http://derek.stride.host/tree-sitter-sql/
MIT License
149 stars 47 forks source link

Parse issues with SQLite generated columns #193

Closed ciarand closed 1 year ago

ciarand commented 1 year ago

Hello! This is a wonderful parser that I benefit from daily. Thanks for your hard work!

I noticed an issue when I was experimenting with generated columns in my SQLite schema. Here's a quick cut-down example:

CREATE TABLE IF NOT EXISTS Items (
  item_type INTEGER NOT NULL,
  item_type_name TEXT GENERATED ALWAYS AS (
    CASE item_type
      WHEN 1 THEN 'foo'
      WHEN 2 THEN 'bar'
      ELSE 'UNKNOWN'
    END
  ) VIRTUAL
);

https://sqlime.org/#deta:pak48dg2wu8z

The AST appears to break after the AS keyword:

(statement) ; [1:1 - 5:22]
 (create_table) ; [1:1 - 5:22]
  (keyword_create) ; [1:1 - 6]
  (keyword_table) ; [1:8 - 12]
  (keyword_if) ; [1:14 - 15]
  (keyword_not) ; [1:17 - 19]
  (keyword_exists) ; [1:21 - 26]
  (object_reference) ; [1:28 - 32]
   name: (identifier) ; [1:28 - 32]
  (column_definitions) ; [1:34 - 5:22]
   (column_definition) ; [2:3 - 28]
    name: (identifier) ; [2:3 - 11]
    type: (int) ; [2:13 - 19]
     (keyword_int) ; [2:13 - 19]
    (keyword_not) ; [2:21 - 23]
    (keyword_null) ; [2:25 - 28]
   (column_definition) ; [3:3 - 5:22]
    name: (identifier) ; [3:3 - 16]
    type: (keyword_text) ; [3:18 - 21]
    (keyword_generated) ; [3:23 - 31]
    (keyword_always) ; [3:33 - 38]
    (keyword_as) ; [3:40 - 41]
    (ERROR) ; [3:43 - 5:19]
     (keyword_case) ; [4:5 - 8]
     (identifier) ; [4:10 - 18]
     (keyword_then) ; [5:14 - 17]
    (identifier) ; [5:20 - 22]
(ERROR) ; [5:23 - 11:0]

Since this is valid SQLite, it would be cool if it didn't break. I haven't checked other DB engines exhaustively, but it appears MySQL and Postgres offer similar constructs.

MySQL:

col_name data_type [GENERATED ALWAYS] AS (expr)
  [VIRTUAL | STORED] [NOT NULL | NULL]
  [UNIQUE [KEY]] [[PRIMARY] KEY]
  [COMMENT 'string']

Postgres:

CREATE TABLE people (
    ...,
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) STORED
);
matthias-Q commented 1 year ago

Thanks for reporting, yes column specification might miss some corner cases. I am a bit short on time for the next weeks, but I can have a look.

matthias-Q commented 1 year ago

As of now, the parser can only use indentifiers there after this as: https://github.com/DerekStride/tree-sitter-sql/blob/9ed90cc742a396423610d0523db0ea5751fa795e/grammar.js#L2206

Should be at least an _expression