GoogleCodeExporter closed this issue 8 years ago
Thanks for the bug report.
From http://www.mozilla.org/js/language/js20/rationale/syntax.html
"To support error recovery, JavaScript 2.0's lexical grammar must be
made independent of its syntactic grammar. To make the lexical grammar
independent of the syntactic grammar, JavaScript 2.0 determines whether
a / starts a regular expression or is a division (or /=) operator solely
based on the previous token."
That page then lists the tokens that can precede a Regex literal, and says:
"Regardless of the previous token, // is interpreted as the beginning
of a comment."
Original comment by mikesamuel@gmail.com
on 8 May 2007 at 6:19
I'm going to assume that "the previous token" means the previous token that is
neither a comment nor whitespace.
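A minimal sketch of that assumption, not the actual prettify code. The token
shape (`type` and `text` fields) and the type names 'comment' and 'whitespace'
are hypothetical, chosen only for illustration:

```javascript
// Hypothetical token shape: { type: string, text: string }.
// Walk backwards from `index` and return the nearest token that is
// neither a comment nor whitespace, or null at the start of input.
function previousSignificantToken(tokens, index) {
  for (let i = index - 1; i >= 0; i--) {
    const type = tokens[i].type;
    if (type !== 'comment' && type !== 'whitespace') {
      return tokens[i];
    }
  }
  return null;
}
```

With this, the `/` in `x = /* note */ /foo/` would be judged against `=`, not
against the comment in between.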
I'm further going to assume that the list of preceding tokens is:
"!", "!=", "!==", "#", "%", "%=", "&", "&&", "&&=", "&=", "(", "*",
"*=", "+", "+=", ",", "-", "-=", "->", ".", "..", "...", "/", "/=", ":",
"::", ";", "<", "<<", "<<=", "<=", "=", "==", "===", ">", ">=", ">>",
">>=", ">>>", ">>>=", "?", "@", "[", "^", "^=", "^^", "^^=", "{", "|",
"|=", "||", "||=", "~", "abstract", "break", "case", "catch", "class",
"const", "continue", "debugger", "default", "delete", "do", "else",
"enum", "export", "extends", "field", "final", "finally", "for",
"function", "goto", "if", "implements", "import", "in", "instanceof",
"is", "namespace", "native", "new", "package", "return", "static",
"switch", "synchronized", "throw", "throws", "transient", "try",
"typeof", "use", "var", "volatile", "while", "with",
The list includes ".", so I'll need to check that the "." is not the tail of a
number such as `1.`, where a following / is a division.
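One way that check might look, as a rough sketch. The function name and the
idea of testing the raw text before the slash are assumptions for
illustration; it only handles plain decimal literals like `1.` or `12.`:

```javascript
// Returns true when `precedingText` ends in a "." that closes a decimal
// literal such as "1." (so a following "/" should be read as division).
// The digits must not be glued to an identifier character or another ".",
// so "a1." and "foo." are treated as member access, not numbers.
function endsWithNumberDot(precedingText) {
  return /(?:^|[^\w$.])\d+\.$/.test(precedingText);
}
```

For example, the text before the slash in `x = 1./2` ends in a number, while
the text before the slash in `foo./bar/` does not.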
Also, since I'm trying to come up with a lexical scheme that supports reasonably
readable code in a variety of languages, I think I'll skip the keywords in this
list that are not keywords in most languages. `debugger`, `function`, and
`field` come to mind, and `in` and `with` might cause problems as well. `with`
in JavaScript has to be followed by an open paren, but `in` might present
problems.
Removing the keywords that, according to the grammar, cannot legally be followed
by a regexp literal in JavaScript yields:
"!", "!=", "!==", "#", "%", "%=", "&", "&&", "&&=", "&=", "(", "*",
"*=", "+", "+=", ",", "-", "-=", "->", ".", "..", "...", "/", "/=", ":",
"::", ";", "<", "<<", "<<=", "<=", "=", "==", "===", ">", ">=", ">>",
">>=", ">>>", ">>>=", "?", "@", "[", "^", "^=", "^^", "^^=", "{", "|",
"|=", "||", "||=", "~", "break", "case", "catch",
"continue", "delete", "do", "else", "finally",
"in", "instanceof", "is", "return", "throw", "try", "typeof",
"is" and "in" are problematic, and are recently introduced language features.
I'm inclined to skip them too.
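Putting the reduced list to work might look roughly like this. It is a sketch
under assumptions, not the committed implementation: the names
`REGEX_PRECEDERS` and `slashStartsRegex` are made up here, the set is the list
above with "is" and "in" dropped, and treating the start of input as a regexp
position is my own assumption:

```javascript
// Tokens after which a "/" is taken to begin a regexp literal.
// This is the reduced list above, minus "is" and "in".
const REGEX_PRECEDERS = new Set([
  '!', '!=', '!==', '#', '%', '%=', '&', '&&', '&&=', '&=', '(', '*',
  '*=', '+', '+=', ',', '-', '-=', '->', '.', '..', '...', '/', '/=', ':',
  '::', ';', '<', '<<', '<<=', '<=', '=', '==', '===', '>', '>=', '>>',
  '>>=', '>>>', '>>>=', '?', '@', '[', '^', '^=', '^^', '^^=', '{', '|',
  '|=', '||', '||=', '~', 'break', 'case', 'catch',
  'continue', 'delete', 'do', 'else', 'finally',
  'instanceof', 'return', 'throw', 'try', 'typeof',
]);

// `previousTokenText` is the text of the last non-comment, non-whitespace
// token, or null when the "/" is the first token in the input.
function slashStartsRegex(previousTokenText) {
  if (previousTokenText === null) {
    return true; // assumption: nothing precedes it, so it cannot be a division
  }
  return REGEX_PRECEDERS.has(previousTokenText);
}
```

So `slashStartsRegex('=')` and `slashStartsRegex('return')` are true, while an
identifier, a closing paren, or `in` gives false and the slash is read as
division.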
Original comment by mikesamuel@gmail.com
on 8 May 2007 at 6:35
Fixed: implemented lexing of regular expression literals using
an approach based on JavaScript's lexical grammar to decide when a /
begins a regexp literal.
See the test case at
http://google-code-prettify.googlecode.com/svn/trunk/tests/prettify_test.html#issue12
This is more conservative than JavaScript itself, since I don't attempt to
handle lexically valid but syntactically invalid JavaScript.
There is one case where a regexp literal in syntactically valid JavaScript
will not be recognized:
for (var fieldName in /foo/) {
...
}
I have never seen this in practice. Someone might iterate over a regexp to
pull out parenthesized matches, but they would have to assign the regexp
to a variable first, since JavaScript does not allow pooling of regexp
literals.
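For illustration, the assign-first form would look something like the
following (the variable names are made up). Here the / follows "=", which is
in the list of preceding tokens, so the literal is recognized:

```javascript
// "=" precedes the "/", so this lexes as a regexp literal.
var pattern = /foo/;
var names = [];
// The loop now iterates over an identifier rather than a literal.
for (var fieldName in pattern) {
  names.push(fieldName);
}
```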
Original comment by mikesamuel@gmail.com
on 23 May 2007 at 4:09
Original issue reported on code.google.com by
phunl...@gmail.com
on 22 Apr 2007 at 3:30