atom / language-sql

SQL package for Atom
Other
38 stars 50 forks source link

Correct time/timestamp grammar matches #60

Closed ZincBehemoth closed 7 years ago

ZincBehemoth commented 7 years ago

Description of the Change

The keywords for time and timestamp do not appear to work as intended.

This issue is explained in textmate/sql.tmbundle#11 and has been duplicated when the code was converted for Atom.

These strings are highlighted:

times(123) withoutstimeszone timestamps(123) withoutstimeszone

But it appears that the regular expressions were intended to match "without time zone", intended to make the 's' on timestamp optional, and to make the numeric constant in parentheses optional. As it is, it only works "withoutstimeszone" and the numeric constant must be present to get the time zone string.

This change fixes the issue.

The fix in the Textmate project replaces |\b(times)(?:\((\d+)\))(\swithoutstimeszone\b)? with |\b(times?)\b(?:\((\d+)\))?(\swith(?:out)?\stime\szone\b)?

and

|\b(timestamp)(?:(s)\((\d+)\)(\swithoutstimeszone\b)?)? with |\b(timestamp)(?:(s|tz))?\b(?:\((\d+)\))?(\s(with|without)\stime\szone\b)?

However I could find no mention of times and timestamps being types in any dialect of SQL and assume this must have been caused by missing escapes for whitespace characters \s. Additionally timestamptz is shorthand for timestamp with time zone and so should not capture any with time zone or without time zone modifier. When implementing I have added greater whitespace tolerance allowing multiple characters whereas before only a single character or none was allowed.

I have replaced the grammar for types without the tz suffix to become: |\\b(time(?:stamp)?)\\b(?:\\s*\\(\\s*(\\d+)\\s*\\))?(?:\\s*(with(?:out)?\\s+time\\s+zone\\b))? which combines the unnecessarily separated time and timestamp patterns.

Now timestamptz fits in with similar types like varchar that have an optional suffix. I have also added the similar type shorthand timetz: |\\b(char|number|n?varchar\\d?|time(?:stamp)?tz)\\b(?:\\s*\\(\\s*(\\d+)\\s*\\))?

Alternate Designs

It is possible make with time zone and without time zone match on their own as part of a type definition, therefore allowing this part to be recognised across a new line. However, CTEs use the keyword with and I am not certain how the two would interact.

The timetz and timestamptz are grouped together with other optional numeric arguments. However these could all be moved to the matching group for types without such arguments since numeric constants are already matched on their own.

Benefits

The grammar parser will now recognise timestamp (1) without time zone phrases and (similar with time) with the precision specifying component and the time zone clause optional, whereas previously it only recognised the timestamp portion before incorrect matching.

The timetz and timestamptz can now be included in the matches for types with an optional numeric suffix such as varchar, and will not incorrectly match a subsequent without time zone clause.

Possible Drawbacks

If times or timestamps are types in any dialect of SQL, these would be removed.

Correctly recognising key words could cause conflicts with entities such as columns named in dialects that lack these keywords and so share names. This does not seem like a likely occurrence though.

Applicable Issues

Apart from timetz everything is just making the grammar work as intended. The type shorthand timetz does not appear to be listed in the documentation for PostgreSQL, but it is recognised by it nonetheless.

winstliu commented 7 years ago

Awesome! Thanks for this, as well as the super-detailed writeup and specs! Just waiting for Travis before merging.