select distinct
next_account(5) as `rick and morty`,
count (*) /* WENEEDReplaceThis select */
from table_with t
where x>10 and x<100 or x == 0 and code = 'stat\'ic'
cluster by col1
The sql-beautify extension produces the following:
SELECT distinct next_acCOUNT(5) AS `rick AND morty`,
count (*) /* WE'stat\'This
SELECT */
FROM table_
WITH t
WHERE x > 10
AND x < 100 or x = = 0
AND code = undefinedic' cluster by col1
Which demonstrates the following issues:
block-comments /*..*/ are not supported
quoted identifiers `blah` are not supported
escaping inside strings "foo\"bar" is not supported
identifiers containing certain reserved words get messed up
using the special text NEEDReplace anywhere in the SQL will mess up the formatting
some operators (e.g. ==) get broken up with whitespace
not all keywords get properly capitalized (e.g. AND operator is capitalized but or is not).
count(*) gets capitalized (and aggressively so), but count (*) does not.
some Hive-specific language constructs like CLUSTER BY aren't recognized at all.
The implementation of this extension consist basically of a series of string-replace operations. This approach is unfortunately fundamentally flawed. No amount of additional fixes will save it.
At the very minimum one should tokenize the SQL (this is the approach that sql-formatter uses). Better yet would be to use a parser (an approach used by prettier-plugin-sql-cst and SQLFluff).
Given the following SQL:
The sql-beautify extension produces the following:
Which demonstrates the following issues:
/*..*/
are not supported`blah`
are not supported"foo\"bar"
is not supportedNEEDReplace
anywhere in the SQL will mess up the formatting==
) get broken up with whitespaceAND
operator is capitalized butor
is not).count(*)
gets capitalized (and aggressively so), butcount (*)
does not.CLUSTER BY
aren't recognized at all.The implementation of this extension consist basically of a series of string-replace operations. This approach is unfortunately fundamentally flawed. No amount of additional fixes will save it.
At the very minimum one should tokenize the SQL (this is the approach that sql-formatter uses). Better yet would be to use a parser (an approach used by prettier-plugin-sql-cst and SQLFluff).