JakeWheat / simple-sql-parser

SQL parser in Haskell
http://jakewheat.github.io/simple-sql-parser/latest

The dilemma of optional terminating semicolon and empty statement. #30

Closed. kindaro closed this issue 3 years ago

kindaro commented 3 years ago

There is an inherent contradiction in making the statement parser accept input whose statements are not terminated by semi (the semicolon parser). Consider this example:

% cat example.sql
-- MariaDB dump 10.18  Distrib 10.5.8-MariaDB, for Linux (x86_64)
--
-- Host: localhost    Database: prestashop
-- ------------------------------------------------------
-- Server version       10.5.8-MariaDB

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8mb4 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;

See the error:

λ import Language.SQL.SimpleSQL.Parse 
λ import Language.SQL.SimpleSQL.Dialect
λ readFile "example.sql" >>= pure . parseStatements mysql "" Nothing
Left (ParseError {peErrorString = "(line 1, column 1):\nunexpected Symbol \";\"\nexpecting create, alter, drop, delete from, truncate table, insert into, update, start transaction, savepoint, release savepoint, commit, rollback, grant, revoke, with, values, table or select", peFilename = "", pePosition = (1,1), peFormattedError = ":1:1:\n-- MariaDB dump 10.18  Distrib 10.5.8-MariaDB, for Linux (x86_64)\n^\n(line 1, column 1):\nunexpected Symbol \";\"\nexpecting create, alter, drop, delete from, truncate table, insert into, update, start transaction, savepoint, release savepoint, commit, rollback, grant, revoke, with, values, table or select"})

Consider a simpler case:

λ parseStatements mysql "" Nothing ";"
Left (ParseError {peErrorString = "(line 1, column 1):\nunexpected Symbol \";\"\nexpecting create, alter, drop, delete from, truncate table, insert into, update, start transaction, savepoint, release savepoint, commit, rollback, grant, revoke, with, values, table or select", peFilename = "", pePosition = (1,1), peFormattedError = ":1:1:\n;\n^\n(line 1, column 1):\nunexpected Symbol \";\"\nexpecting create, alter, drop, delete from, truncate table, insert into, update, start transaction, savepoint, release savepoint, commit, rollback, grant, revoke, with, values, table or select"})

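Same error. The reason is clear: empty statements are not supported. Everything before the first ; in example.sql is a comment to the parser (the /*!40101 ... */ lines included), so the first token it sees is a bare ;, which is an empty statement. Support for empty statements is desirable: the example is taken straight from mysqldump output!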

But how can we parse an empty statement? Without a semi, it is a parser that succeeds without consuming any input, so statements may loop forever, parsing an infinite number of empty statements!

I propose that statements without a terminating semi should not be supported, and that a lone semi should be parsed as an empty statement, represented by its own nullary data constructor.
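A minimal sketch of this proposal follows. It is written against Parsec rather than simple-sql-parser's own modules. The Statement type, the constructor names Stmt and EmptyStatement, runStatements, and the one-word toy grammar standing in for a real SQL statement are all hypothetical. Every statement, empty or not, must consume its ';', so each iteration of many makes progress and the loop terminates.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical toy AST: Stmt stands in for any real SQL statement.
data Statement
  = Stmt String    -- a non-empty statement (here just one word)
  | EmptyStatement -- a lone ';'
  deriving (Eq, Show)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

semi :: Parser Char
semi = lexeme (char ';')

-- Toy stand-in for the real statement grammar: one or more alphanumerics.
nonEmptyStatement :: Parser Statement
nonEmptyStatement = Stmt <$> lexeme (many1 alphaNum)

-- The terminating ';' is mandatory, so every alternative consumes input
-- and 'many' below cannot loop on an empty match.
statement :: Parser Statement
statement = (EmptyStatement <$ semi) <|> (nonEmptyStatement <* semi)

statements :: Parser [Statement]
statements = spaces *> many statement <* eof

-- Hypothetical helper: Nothing on a parse error, Just the statements otherwise.
runStatements :: String -> Maybe [Statement]
runStatements = either (const Nothing) Just . parse statements ""
```

With this parser, ";" yields [EmptyStatement] and "a;;" yields [Stmt "a", EmptyStatement]. An unterminated "a" is rejected.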

JakeWheat commented 3 years ago

Thanks for the report. I think there must be a way to support 'empty statements' consisting of a semicolon on its own, while also keeping the semicolon optional at the end of non-empty statements.

kindaro commented 3 years ago

But how?

We use an expression like sepEndBy statement semi to parse multiple statements.
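The sketch below makes the failure concrete. It implements that approach with Parsec's sepEndBy and uses a hypothetical one-word statement parser in place of the real grammar, together with a hypothetical runStatements helper. The statement parser cannot match empty input, so a lone ';' or a doubled ';;' is left unconsumed and the final eof fails. This is the same kind of failure as the one reported above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

semi :: Parser Char
semi = lexeme (char ';')

-- Toy stand-in for the SQL statement parser: one or more alphanumerics.
-- It never succeeds on empty input.
statement :: Parser String
statement = lexeme (many1 alphaNum)

-- Statements separated by ';', with an optional final ';'.
statements :: Parser [String]
statements = spaces *> sepEndBy statement semi <* eof

runStatements :: String -> Maybe [String]
runStatements = either (const Nothing) Just . parse statements ""
```

Here "a; b" and "a; b;" both yield ["a", "b"], but ";" and "a;;b" are rejected.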

We may allow statements like /* */; to be parsed without consuming the semicolon: since the whitespace and comments are consumed, the loop is avoided. That would be sufficient to parse the actual mysqldump output above. Still, I think it is better to drop support for statements without a semicolon and be consistent. But I am on board with whatever solution you choose.

What matters to me is that we decide on something and move on to the other fascinating parse failures that await in the rest of my example. I have already put a temporary fix for the semicolon problem into my fork, and I have seen that there are many.

kindaro commented 3 years ago

Maybe we can use a more complicated parser for statements, along the lines of many statementWithSemicolon >> optional statementWithoutSemicolon, where statementWithSemicolon may parse an empty statement, but statementWithoutSemicolon cannot. I shall try this.
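The following is one way this could look in Parsec. It is a sketch that reuses the hypothetical toy grammar, the constructor names Stmt and EmptyStatement, and the runStatements helper. The try lets statementWithoutSemicolon re-read the last, unterminated statement once statementWithSemicolon fails to find its ';'.

```haskell
import Data.Maybe (maybeToList)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical toy AST.
data Statement = Stmt String | EmptyStatement
  deriving (Eq, Show)

lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

semi :: Parser Char
semi = lexeme (char ';')

-- Toy stand-in for a real, non-empty SQL statement.
nonEmptyStatement :: Parser Statement
nonEmptyStatement = Stmt <$> lexeme (many1 alphaNum)

-- May be empty, but always consumes a ';', so 'many' makes progress.
-- 'try' backtracks when a statement is not followed by ';'.
statementWithSemicolon :: Parser Statement
statementWithSemicolon =
  (EmptyStatement <$ semi) <|> try (nonEmptyStatement <* semi)

-- Cannot be empty: only the final statement may omit its ';'.
statementWithoutSemicolon :: Parser Statement
statementWithoutSemicolon = nonEmptyStatement

statements :: Parser [Statement]
statements = do
  spaces
  terminated <- many statementWithSemicolon
  final <- optionMaybe statementWithoutSemicolon
  eof
  pure (terminated ++ maybeToList final)

runStatements :: String -> Maybe [Statement]
runStatements = either (const Nothing) Just . parse statements ""
```

With this parser, "a;;b;" yields [Stmt "a", EmptyStatement, Stmt "b"] and "a; b" yields [Stmt "a", Stmt "b"]. Two statements with no ';' between them, such as "a b", are still rejected.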

JakeWheat commented 3 years ago

What if you make the optional semicolon part of the statement, and parse a semicolon on its own as the empty statement? Something like:

statement = (nonemptystatement >> optional semicolon) | semicolononly
statements = many statement
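Here is a minimal Parsec sketch of this suggestion. It uses the same hypothetical one-word statement, constructors and runStatements helper. Each alternative of statement consumes at least one character, either a non-empty statement or a ';', so many statement terminates without needing try.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical toy AST.
data Statement = Stmt String | EmptyStatement
  deriving (Eq, Show)

lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

semi :: Parser Char
semi = lexeme (char ';')

-- Toy stand-in for a real, non-empty SQL statement.
nonEmptyStatement :: Parser Statement
nonEmptyStatement = Stmt <$> lexeme (many1 alphaNum)

-- statement = (nonemptystatement >> optional semicolon) | semicolononly
-- Both alternatives consume input, so 'many statement' always terminates.
statement :: Parser Statement
statement =
  (nonEmptyStatement <* optional semi)
    <|> (EmptyStatement <$ semi)

statements :: Parser [Statement]
statements = spaces *> many statement <* eof

runStatements :: String -> Maybe [Statement]
runStatements = either (const Nothing) Just . parse statements ""
```

This toy version shows one consequence of the design. The ';' is optional after every statement, not just the last one, so input such as "a b" is accepted as two statements. Whether that matters for real SQL likely depends on whether adjacent statements can always be told apart without a separator.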

JakeWheat commented 3 years ago

Thanks for the contributions!

kindaro commented 2 years ago

Thank you Jake.