DerekStride / tree-sitter-sql

SQL grammar for tree-sitter
http://derek.stride.host/tree-sitter-sql/
MIT License
147 stars 47 forks source link

Syntax highlighting appears to give up in Postgres function bodies #236

Open jaidetree opened 7 months ago

jaidetree commented 7 months ago

Given the following Postgres SQL function:

CREATE OR REPLACE FUNCTION start_task(p_task_name VARCHAR)
RETURNS sessions AS $$
DECLARE
  task_id INTEGER;
  task_row RECORD;
  session_record sessions;
BEGIN
    -- Look for any ongoing sessions for an active task
    SELECT *
      INTO task_row
    FROM tasks t
    INNER JOIN sessions s ON t.id = s.task_id
    WHERE s.end_time IS NULL
    AND s.is_active = TRUE
    AND t.is_active = TRUE
    LIMIT 1;

    -- If an ongoing session was found, throw an error with hint
    IF task_row.id IS NOT NULL THEN
      RAISE EXCEPTION 'Task % "%" is still in progress. Action refused.',
        task_row.id, task_row.task_name
    ELSE
      -- Look up the task_id from the name
      SELECT id INTO task_id
      FROM tasks
      WHERE task_name = p_task_name;

      -- Start the session
      INSERT INTO sessions (task_id)
      VALUES (task_id)
      RETURNING * INTO session_record;
    END IF;

    RETURN session_record;
END;
$$ LANGUAGE plpgsql;

Syntax highlighting completely stops for the rest of the file if I keep that semicolon after the first query that ends with LIMIT 1;

image

If I remove that semicolon after the first query the syntax highlighter continues highlighting the file but the syntax is invalid and running the migration fails.

Any ideas what's causing that? Happy to help with a PR to fix but will need some guidance. Though as far as I know, I'm just using this wrong 😅

dmfay commented 7 months ago

The short version is that this library parses SQL across dialects, but does not parse specific procedural elaborations, like Postgres' PL/pgSQL or Oracle's PL/SQL. CREATE FUNCTION is standard SQL, but up until recently the standard said that function bodies are strings, and not even necessarily SQL strings at that (viz. pl/Python). BEGIN ATOMIC changed that for SQL functions but not procedural languages -- those dollar quotes delimit a string. We parse the function body because it's probably SQLish, but as you see the parser doesn't know what to do with IF or RAISE. It should know what to do with RETURN, and if you had a few more standard statements after your IF block the parser might recover since tree-sitter tries to be generous, but there aren't any guarantees. It bugs me too; there's been a little discussion, but what you see is where we got. I think there's probably a case for giving the basics like IF a shot, honestly; we can always decline if it turns out to really balloon the library size.