Several fundamental problems

Given the following SQL:

select distinct
  next_account(5) as `rick and morty`,
  count (*) /* WENEEDReplaceThis select */
from table_with t
where x>10 and x<100 or x == 0 and code = 'stat\'ic'
cluster by col1

The sql-beautify extension produces the following:

SELECT  distinct next_acCOUNT(5) AS `rick AND morty`,
        count (*) /* WE'stat\'This
SELECT  */
FROM table_
WITH t
WHERE x > 10
AND x < 100 or x =  = 0
AND code = undefinedic' cluster by col1

Which demonstrates the following issues:

block-comments /*..*/ are not supported
quoted identifiers `blah` are not supported
escaping inside strings "foo\"bar" is not supported
identifiers containing certain reserved words get messed up
using the special text NEEDReplace anywhere in the SQL will mess up the formatting
some operators (e.g. ==) get broken up with whitespace
not all keywords get properly capitalized (e.g. AND operator is capitalized but or is not).
count(*) gets capitalized (and aggressively so), but count (*) does not.
some Hive-specific language constructs like CLUSTER BY aren't recognized at all.

The implementation of this extension consist basically of a series of string-replace operations. This approach is unfortunately fundamentally flawed. No amount of additional fixes will save it.

At the very minimum one should tokenize the SQL (this is the approach that sql-formatter uses). Better yet would be to use a parser (an approach used by prettier-plugin-sql-cst and SQLFluff).

clarkyu2016 / sql-beautify

Several fundamental problems #62