amohanta / google-caja

Automatically exported from code.google.com/p/google-caja

"override" and "namespace" not allowed as identifiers #93

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
for example, x.is() gives an error. It shouldn't.

Original issue reported on code.google.com by ben@links.org on 10 Mar 2008 at 8:14

GoogleCodeExporter commented 9 years ago
I based the keyword list on
http://web.archive.org/web/20050502190140/http://www.mozilla.org/js/language/es4/formal/parser-grammar.html
and it and http://javascript.about.com/library/blreserved.htm say otherwise.

Sections 7.5.2 and 7.5.3 of ECMA-262 list the current and future reserved words:
    7.5.2 Keywords
    The following tokens are ECMAScript keywords and may not be used
    as identifiers in ECMAScript programs.
    Syntax
    Keyword :: one of
      break    else       new    var
      case     finally    return void
      catch    for        switch while
      continue function   this   with
      default  if         throw
      delete   in         try
      do       instanceof typeof

    ECMAScript Language Specification Edition 3 24-Mar-2000
    7.5.3 Future Reserved Words
    The following words are used as keywords in proposed extensions and
    are therefore reserved to allow for the possibility of future adoption
    of those extensions.
    Syntax
    FutureReservedWord :: one of
      abstract enum       int       short
      boolean  export     interface static
      byte     extends    long      super
      char     final      native    synchronized
      class    float      package   throws
      const    goto       private   transient
      debugger implements protected volatile
      double   import     public

Section 1.8 of http://www.ecmascript.org/es4/spec/incompatibilities.pdf lists
incompatibilities in the keyword list between ES4 and ES3.  What should we do
about such things?

Original comment by mikesamuel@gmail.com on 10 Mar 2008 at 8:42

GoogleCodeExporter commented 9 years ago
MarkM responded by email:

As I just discussed verbally with Mike, in order for Caja to be as
close as possible to a fail-stop subset of particular [proposed]
versions of JavaScript/EcmaScript, we should reserve all those
keywords which are reserved in any of the [proposed] versions we care
about. I suggest that this set should be:

ES3, ES3.1, proposed ES4, IE6 or later, Firefox 1.5 or later, Opera 9
or later, Safari 3 or later.

Section 2.2 of
<http://wiki.ecmascript.org/lib/exe/fetch.php?id=resources%3Aresources&cache=cache&media=resources:jscriptdeviationsfromes3.pdf>
compares the behavior of the various browsers against ES3. It reports
that they honor different but overlapping subsets of the keyword
reservations specified by ES3. At the last ES3.1 phone meeting, we agreed
that ES3.1 should continue to reserve all the keywords reserved by ES3
and either reserved or used as keywords in the current ES4 proposal.
When there's ambiguity, we should err on the side of reserving too
many keywords. It's always upwards compatible for us to decide to
unreserve some of these later.

None of these reserve "is".

Original comment by mikesamuel@gmail.com on 10 Mar 2008 at 10:27

GoogleCodeExporter commented 9 years ago
The current keyword set is defined at
http://code.google.com/p/google-caja/source/browse/trunk/src/java/com/google/caja/lexer/Keyword.java

Original comment by mikesamuel@gmail.com on 10 Mar 2008 at 10:45

GoogleCodeExporter commented 9 years ago
The union of ES4 1.8 and Mark's pointer is:
abstract  boolean      break   byte     case
catch     char         class   const    continue
debugger  default      delete  do       double
else      enum         export  extends  final
finally   float        for     function goto
if        implements   import  in       instanceof
int       interface    let     long     native
new       override     package private  protected
public    return       short   static   super
switch    synchronized this    throw    throws
transient try          typeof  var      void
volatile  while        with    yield

Original comment by mikesamuel@gmail.com on 10 Mar 2008 at 10:58
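
The union above can be cross-checked against the two ES3 lists quoted earlier in this thread. The following is only a sketch: the variable and function names are made up for illustration and are not taken from Caja's source. It computes which words in the union are not in either ES3 list, and whether any ES3 word was dropped.

```javascript
// Split a whitespace-separated block of words into an array.
const words = (text) => text.trim().split(/\s+/);

// ES3 section 7.5.2 (Keywords), as quoted in this thread.
const es3Keywords = words(`
  break else new var case finally return void catch for switch while
  continue function this with default if throw delete in try do
  instanceof typeof`);

// ES3 section 7.5.3 (Future Reserved Words), as quoted in this thread.
const es3FutureReserved = words(`
  abstract enum int short boolean export interface static byte extends
  long super char final native synchronized class float package throws
  const goto private transient debugger implements protected volatile
  double import public`);

// The union list posted in the comment above.
const unionList = words(`
  abstract boolean break byte case catch char class const continue
  debugger default delete do double else enum export extends final
  finally float for function goto if implements import in instanceof
  int interface let long native new override package private protected
  public return short static super switch synchronized this throw throws
  transient try typeof var void volatile while with yield`);

// Set union of any number of word lists.
const union = (...lists) => new Set(lists.flat());

// Words of `a` that do not occur in `b`.
const difference = (a, b) => {
  const exclude = new Set(b);
  return [...a].filter((word) => !exclude.has(word));
};

const es3All = union(es3Keywords, es3FutureReserved);
const addedBeyondEs3 = difference(unionList, es3All).sort();
const missingFromUnion = difference(es3All, unionList);

console.log("added beyond ES3:", addedBeyondEs3.join(", "));
console.log("ES3 words missing from the union:", missingFromUnion.length);
```

Erring on the side of reserving too much, as suggested above, corresponds to always taking the union rather than the intersection of the candidate lists.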

GoogleCodeExporter commented 9 years ago

Original comment by mikesamuel@gmail.com on 10 Mar 2008 at 11:25

GoogleCodeExporter commented 9 years ago
David-Sarah Hopwood said:

Proposed ES4 does reserve "is", according to
<http://www.ecmascript.org/es4/spec/grammar.pdf>.

Original comment by mikesamuel@gmail.com on 11 Mar 2008 at 12:50

GoogleCodeExporter commented 9 years ago
Mark, Mike, and I went over the list of ES4 contextually reserved words that
are not also reserved words in ES3.1.

We decided to include each, let, namespace, override, use, and yield.  We
decided not to include type and xml and a number of other ones because those
are frequently used in a large codebase so are likely to break existing code.

Keyword set:
============
abstract     boolean    break   byte       case
catch        char       class   const      continue
debugger     default    delete  do         double
each         else       enum    export     extends
false        final      finally float      for
function     goto       if      implements import
in           instanceof int     interface  let
long         namespace  native  new        null
override     package    private protected  public
return       short      static  super      switch
synchronized this       throw   throws     transient
true         try        typeof  use        var
void         volatile   while   with       yield

Original comment by mikesamuel@gmail.com on 11 Mar 2008 at 2:47
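
For illustration, a lookup over the keyword set above might look like the sketch below. `KEYWORD_SET` and `isReservedWord` are hypothetical names chosen here; this is not the API of Caja's Keyword.java, only a way to see which of the identifiers discussed in this issue the set would reject.

```javascript
// The keyword set listed in the comment above, copied word for word.
const KEYWORD_SET = new Set(`
  abstract     boolean    break   byte       case
  catch        char       class   const      continue
  debugger     default    delete  do         double
  each         else       enum    export     extends
  false        final      finally float      for
  function     goto       if      implements import
  in           instanceof int     interface  let
  long         namespace  native  new        null
  override     package    private protected  public
  return       short      static  super      switch
  synchronized this       throw   throws     transient
  true         try        typeof  use        var
  void         volatile   while   with       yield
`.trim().split(/\s+/));

// Hypothetical helper: true if `name` may not be used as an identifier
// under the keyword set above.
function isReservedWord(name) {
  return KEYWORD_SET.has(name);
}

for (const name of ["is", "type", "xml", "namespace", "override"]) {
  console.log(name, isReservedWord(name) ? "reserved" : "allowed");
}
```

Under this set `is`, `type`, and `xml` are allowed, while `namespace` and `override` are rejected.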

GoogleCodeExporter commented 9 years ago

Original comment by mikesamuel@gmail.com on 11 Mar 2008 at 4:26

GoogleCodeExporter commented 9 years ago
Felix8a reports:

Reserved word override used as an identifier
Reserved word namespace used as an identifier

those are the two that turn up most often for me.
they're used in the YUI library.
override could be eliminated, I think it's only used in private contexts.
namespace is a problem, it's a method in the public api.

not sure what's a good solution to that.

Original comment by mikesamuel@gmail.com on 3 Apr 2008 at 7:57

GoogleCodeExporter commented 9 years ago
I believe the ES4 committee went through a paring down phase recently.

namespace and override both show up in the contextually reserved section
according to the 30 Mar version of
http://www.ecmascript.org/es4/spec/grammar.pdf

If that means they are significant where var is significant, then we might be
able to allow them as identifiers.

Original comment by mikesamuel@gmail.com on 3 Apr 2008 at 8:01

GoogleCodeExporter commented 9 years ago

Original comment by mikesamuel@gmail.com on 7 Apr 2008 at 8:48

GoogleCodeExporter commented 9 years ago
David-Sarah Hopwood <david.hopwood@industrial-designers.co.uk>  11 April 2008 14:36
Reply-To: google-caja-discuss@googlegroups.com
To: google-caja-discuss@googlegroups.com

Mike Samuel wrote:
> On 11/04/2008, David-Sarah Hopwood <david.hopwood@industrial-designers.co.uk>
> wrote:
>>
>> I seriously don't think the ES4 spec writers have a clue as to whether
>> and how this "contextually reserved identifier" thing is going to be
>> implementable. There seems to have been no analysis of how much lookahead
>> is necessary, and the grammar is full of mistakes that discouraged me from
>> putting much effort into working it out. (Trivial example: "like" and
>> "internal" should be reserved words.)
>
> Ah.  Thanks.  Yeah, I haven't taken the time to look through the ES4
> grammar.
>
>> For Jacaranda, I'm only going to allow any of the contextually reserved
>> words as identifiers on the basis of a local analysis of the parts of the
>> ES4 grammar that seem to be fairly stable (for example, some of these
>> words only have a special meaning when they occur in a pragma after
>> the word "use", so we can allow them if we reserve "use" unconditionally).
>> No such local analysis is possible for "namespace", and a global analysis
>> is not worth the effort just in order to allow a few more variable names,
>> especially since it might be invalidated by future changes.
>
> Are there any tools that can tell whether a string, " var namespace ", is a
> substring of any string in a language defined by a CF grammar?

That's a very good question, but I'm not an expert on parsing theory.

With Google and citeseer, anyone can fake being an expert, though ;-) --
apparently the problem is called "substring parsing":

<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.9949>

# The problem of recognizing substrings of context-free languages has
# emerged in several interesting applications and can be described as
# follows. Given a string y and a grammar G = (V; Sigma; P; S), we wish
# to know whether there exist two additional strings x and z such that
# xyz is a sentence of G. An important application for a corresponding
# substring parser is a method for detecting syntax errors [...]
[...]
# In this article, we develop a substring parser that can be used with
# SLR(k), LALR(k) and canonical LR(k) grammars.

What kind of grammar do we have for ES3 and/or ES4? The grammar given in
both specs seems to be easily expressible as a PEG, but I don't know how
to transform it into any of the forms above, so I'm hoping someone else
has done the work.

This would also answer a question I have about whether it is possible in
ES3 or ES4 for two nonreserved identifiers to occur next to each other just
separated by whitespace. Does anyone happen to know the answer to that,
preferably with a convincing argument?

> Such a thing might also help the ES4 people nail down the meaning of
> "contextually reserved."

Yes.

--
David-Sarah Hopwood

David-Sarah Hopwood <david.hopwood@industrial-designers.co.uk>  11 April 2008 16:54
Reply-To: google-caja-discuss@googlegroups.com
To: google-caja-discuss@googlegroups.com

David-Sarah Hopwood wrote:
> With Google and citeseer, anyone can fake being an expert, though ;-) --
> apparently the problem is called "substring parsing":
>
> <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.9949>
>
> # The problem of recognizing substrings of context-free languages has
> # emerged in several interesting applications and can be described as
> # follows. Given a string y and a grammar G = (V; Sigma; P; S), we wish
> # to know whether there exist two additional strings x and z such that
> # xyz is a sentence of G. An important application for a corresponding
> # substring parser is a method for detecting syntax errors [...]
> [...]
> # In this article, we develop a substring parser that can be used with
> # SLR(k), LALR(k) and canonical LR(k) grammars.

Aha. Chapter 12 of "Parsing Techniques -- A Practical Guide, 2nd edition"
<http://www.cs.vu.nl/~dick/PT2Ed.html>, treats this subject in detail.

The basic idea is quite simple:

# We define the suffix language of a language L as the set of strings
# obtained by removing one or more tokens from the front of sentences
# in L. [snip]
# The suffix language of a context-free language is again a context-free
# language, and a grammar GS can be constructed given the grammar G of
# the original language. Such a grammar is called a suffix grammar, and
# it can be constructed as follows: for every non-terminal A in G, we
# introduce a new non-terminal A' which derives a suffix of a sentence
# generated by A. If G contains a rule
#
#   A -> X1 X2 ... Xn
#
# the suffix grammar GS will also contain this rule and in addition it
# will contain the following series of rules deriving suffixes of what
# A can derive:
#
#   A' -> X1' X2 ... Xn
#   A' -> X2' ... Xn
#   ...
#   A' -> Xn'
#
# A' can be thought of as a "damaged" A, where the damage has been
# restricted to the front. So if Xi is a terminal symbol, Xi' is the
# empty string, since that is the result of damaging a terminal symbol.
# No rule is generated for A -> \epsilon; see Problem 12.2 for a small
# complication
[when A -> \epsilon is the only rule for A, which I think never occurs in
the ES3/4 grammars].
# If S is the start symbol of the original grammar, the suffix grammar
# has start symbol S'.

For example, the ES3 production

  ObjectLiteral :
    { }
    { PropertyNameAndValueList }

becomes

  ObjectLiteral' :
    { }
    }
    { PropertyNameAndValueList }
    PropertyNameAndValueList' }

So iff "var namespace", for example, can be parsed as an initial string
of the suffix grammar of ES4, then it can occur as a substring in ES4.
But:

# More generally it turns out that suffix grammars almost never have any
# redeeming properties, and that they can only be handled by general CF
# parsing. In principle this does solve the suffix parsing problem:
# construct the suffix grammar and use any general CF parsing algorithm.
# But the solution is unsatisfying: it is inefficient and does not give
# us any insight in the nature of suffix parsing.

The book goes on to describe some more efficient (and much more
complicated) algorithms. OTOH, to answer one-off questions, who cares
if it is inefficient?

It appears that if the input grammar is a PEG, the suffix grammar can
be expressed as a PEG, so we could use a packrat parser, I think.
Alternatively, we could use a general CF parser such as
<http://accent.compilertools.net/>.

--
David-Sarah Hopwood
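
The suffix-grammar construction quoted in the message above can be sketched in a few lines. This is an illustration under stated assumptions, not a tool anyone in the thread used: the grammar representation (an object mapping each non-terminal to its alternatives) and the name `suffixGrammar` are invented here, and `PropertyNameAndValueList` is simplified by treating `PropertyName` and `AssignmentExpression` as terminals.

```javascript
// A grammar maps each non-terminal to a list of alternatives; each
// alternative is a list of symbols. Any symbol that is not a key of the
// grammar is treated as a terminal.
function suffixGrammar(grammar) {
  const isNonTerminal = (symbol) =>
    Object.prototype.hasOwnProperty.call(grammar, symbol);
  const result = {};
  for (const [name, alternatives] of Object.entries(grammar)) {
    // The suffix grammar keeps every original rule.
    result[name] = alternatives.map((alt) => [...alt]);
    const primed = [];
    const seen = new Set();
    for (const alt of alternatives) {
      // An empty alternative (A -> epsilon) contributes no primed rule.
      for (let i = 0; i < alt.length; i++) {
        const rest = alt.slice(i + 1);
        // A' -> Xi' X(i+1) ... Xn: a damaged terminal is the empty
        // string, a damaged non-terminal is its primed counterpart.
        const rule = isNonTerminal(alt[i]) ? [alt[i] + "'", ...rest] : rest;
        const key = JSON.stringify(rule);
        if (!seen.has(key)) {
          seen.add(key);
          primed.push(rule);
        }
      }
    }
    result[name + "'"] = primed;
  }
  return result;
}

// Simplified stand-in for the ES3 productions (assumption: PropertyName
// and AssignmentExpression are treated as single tokens).
const es3Fragment = {
  ObjectLiteral: [
    ["{", "}"],
    ["{", "PropertyNameAndValueList", "}"],
  ],
  PropertyNameAndValueList: [
    ["PropertyName", ":", "AssignmentExpression"],
    ["PropertyNameAndValueList", ",", "PropertyName", ":", "AssignmentExpression"],
  ],
};

const gs = suffixGrammar(es3Fragment);
for (const alt of gs["ObjectLiteral'"]) {
  console.log("ObjectLiteral' :", alt.length ? alt.join(" ") : "(empty)");
}
```

Applied mechanically, the quoted rules give `}`, the empty string, `PropertyNameAndValueList }`, and `PropertyNameAndValueList' }` as the alternatives for `ObjectLiteral'`, which differs slightly from the hand-expanded example above. As the book notes, the result generally needs a general context-free parser.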

Felix <felix8a@gmail.com>   12 April 2008 18:13
Reply-To: google-caja-discuss@googlegroups.com
To: google-caja-discuss@googlegroups.com

David-Sarah Hopwood wrote:
> I seriously don't think the ES4 spec writers have a clue as to whether
> and how this "contextually reserved identifier" thing is going to be
> implementable. There seems to have been no analysis of how much lookahead

it's already implemented

$ ./es4
ECMAScript Edition 4 RI v0.0M2 (Fri Feb 15 13:37:13 2008)
 >> var namespace = 3
 >> namespace
**ERROR** ParseError: expecting 'identifier' before 'eof' (near <no filename>:1:1-1.9)
 >> (namespace)
3
 >> var a = {namespace: 4}
 >>
 >> a.namespace
4
 >>

Mike Samuel <mikesamuel@gmail.com>  13 April 2008 14:30
Reply-To: mikesamuel@gmail.com
To: google-caja-discuss@googlegroups.com
On 12/04/2008, Felix <felix8a@gmail.com> wrote:
>
>  David-Sarah Hopwood wrote:
>  > I seriously don't think the ES4 spec writers have a clue as to whether
>  > and how this "contextually reserved identifier" thing is going to be
>  > implementable. There seems to have been no analysis of how much lookahead
>
>
> it's already implemented
>
>  $ ./es4
>  ECMAScript Edition 4 RI v0.0M2 (Fri Feb 15 13:37:13 2008)
>   >> var namespace = 3
>   >> namespace
>  **ERROR** ParseError: expecting 'identifier' before 'eof' (near <no
>  filename>:1:1-1.9)
>   >> (namespace)
>  3
>   >> var a = {namespace: 4}
>   >>
>   >> a.namespace
>  4
>   >>

Cool.  I'd forgotten they already have a reference implementation of
the interpreter.

To figure out if something is usable as an identifier, the cases that
come to mind are
 var ident;
 function ident() {}
 ident = 0;
 (ident);
 ({ ident: 0 });
 ident: while (0);
 ({}).ident;
But semicolon insertion complicates things
 ident
 --
 foo
has a different meaning than
 debugger
 --
 foo
so can we assume it doesn't also for the contextually reserved keywords?
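
The cases listed above can be turned into a small probe. The sketch below is an assumption-laden illustration: it asks whatever JavaScript engine runs it (not the ES4 reference implementation) whether each snippet parses, using the `Function` constructor, which compiles its argument without running it. `CONTEXTS`, `parses`, and `probeIdentifier` are names invented here, and the semicolon-insertion cases are not covered.

```javascript
// One template per syntactic position from the list above.
const CONTEXTS = {
  "var declaration": (word) => `var ${word};`,
  "function name": (word) => `function ${word}() {}`,
  "assignment target": (word) => `${word} = 0;`,
  "parenthesized expression": (word) => `(${word});`,
  "object literal key": (word) => `({ ${word}: 0 });`,
  "statement label": (word) => `${word}: while (0);`,
  "member name": (word) => `({}).${word};`,
};

// True if the host engine accepts `source` as a function body.
// The body is only compiled, never executed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// Report, per context, whether `word` is accepted there.
function probeIdentifier(word) {
  const report = {};
  for (const [label, template] of Object.entries(CONTEXTS)) {
    report[label] = parses(template(word));
  }
  return report;
}

console.log("namespace:", probeIdentifier("namespace"));
console.log("debugger:", probeIdentifier("debugger"));
```

The results describe only the engine the probe runs on, so they say nothing about what the ES4 grammar allows; the same templates would have to be fed to the ES4 reference implementation to answer that.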

David-Sarah Hopwood <david.hopwood@industrial-designers.co.uk>  13 April 2008 17:57
Reply-To: google-caja-discuss@googlegroups.com
To: google-caja-discuss@googlegroups.com

Felix wrote:
> David-Sarah Hopwood wrote:
>> I seriously don't think the ES4 spec writers have a clue as to whether
>> and how this "contextually reserved identifier" thing is going to be
>> implementable. [...]
>
> it's already implemented

I stand corrected.

> $ ./es4
> ECMAScript Edition 4 RI v0.0M2 (Fri Feb 15 13:37:13 2008)
>  >> var namespace = 3
>  >> namespace
> **ERROR** ParseError: expecting 'identifier' before 'eof' (near <no
> filename>:1:1-1.9)

What is the explanation for this program not being valid? In particular,
why does semicolon insertion not occur, resulting in the valid
ExpressionStatement "namespace;"? Or is it a bug in the reference
implementation?

(One possibility, if this behaviour is intended, would be to add
"namespace" to the tokens "}" and "function" that currently cannot
start an ExpressionStatement. But that seems very ad-hoc -- and there
may be many other similar cases.)

Anyway, it seems as though Caja must disallow 'namespace' as an identifier
precisely because of this example.

--
David-Sarah Hopwood

Felix <felix8a@gmail.com>   13 April 2008 23:47
Reply-To: google-caja-discuss@googlegroups.com
To: google-caja-discuss@googlegroups.com

David-Sarah Hopwood wrote:
>> $ ./es4
>> ECMAScript Edition 4 RI v0.0M2 (Fri Feb 15 13:37:13 2008)
>>  >> var namespace = 3
>>  >> namespace
>> **ERROR** ParseError: expecting 'identifier' before 'eof' (near <no
>> filename>:1:1-1.9)
>
> What is the explanation for this program not being valid? In particular,
> why does semicolon insertion not occur, resulting in the valid
> ExpressionStatement "namespace;"? Or is it a bug in the reference
> implementation?

"namespace;" also throws the same error.  it's not clear to me if this
is a bug in the implementation or a bug in the grammar.

if I'm reading this right, the implementation diverges from the proposed
grammar in that some contextual keywords like "namespace" are matched in
the production for "Directive", and those rules have precedence over the
rule for "Statement".

Mike Samuel <mikesamuel@gmail.com>  14 April 2008 10:26
Reply-To: mikesamuel@gmail.com
To: google-caja-discuss@googlegroups.com
On 13/04/2008, Felix <felix8a@gmail.com> wrote:
>
>  David-Sarah Hopwood wrote:
>  >> $ ./es4
>  >> ECMAScript Edition 4 RI v0.0M2 (Fri Feb 15 13:37:13 2008)
>  >>  >> var namespace = 3
>  >>  >> namespace
>  >> **ERROR** ParseError: expecting 'identifier' before 'eof' (near <no
>  >> filename>:1:1-1.9)
>  >
>  > What is the explanation for this program not being valid? In particular,
>  > why does semicolon insertion not occur, resulting in the valid
>  > ExpressionStatement "namespace;"? Or is it a bug in the reference
>  > implementation?
>
>
> "namespace;" also throws the same error.  it's not clear to me if this
>  is a bug in the implementation or a bug in the grammar.
>
>  if I'm reading this right, the implementation diverges from the proposed
>  grammar in that some contextual keywords like "namespace" are matched in
>  the production for "Directive", and those rules have precedence over the
>  rule for "Statement".
>
>  so the reason disallowing "namespace" is a problem is that YUI has
>  extensive use of YAHOO.namespace('foo'), and I don't know what's a good
>  solution for that.

Some of these things like namespace, are only significant at the top
level, so maybe once the grammar settles down,
 (function () {
   namespace(4);
 })()
will work, even if
 namespace(4);
won't.

If that's the case, then getting YUI to work with namespace shouldn't
be a problem.

David-Sarah Hopwood <david.hopwood@industrial-designers.co.uk>  14 April 2008 11:57
Reply-To: google-caja-discuss@googlegroups.com
To: google-caja-discuss@googlegroups.com

Felix wrote:
> David-Sarah Hopwood wrote:
>>> $ ./es4
>>> ECMAScript Edition 4 RI v0.0M2 (Fri Feb 15 13:37:13 2008)
>>>  >> var namespace = 3
>>>  >> namespace
>>> **ERROR** ParseError: expecting 'identifier' before 'eof' (near <no
>>> filename>:1:1-1.9)
>> What is the explanation for this program not being valid? In particular,
>> why does semicolon insertion not occur, resulting in the valid
>> ExpressionStatement "namespace;"? Or is it a bug in the reference
>> implementation?
>
> "namespace;" also throws the same error.  it's not clear to me if this
> is a bug in the implementation or a bug in the grammar.
>
> if I'm reading this right, the implementation diverges from the proposed
> grammar in that some contextual keywords like "namespace" are matched in
> the production for "Directive", and those rules have precedence over the
> rule for "Statement".

Ah. The parser in the reference implementation is ad hoc (and 7468 lines,
which seems excessive for a parser written in SML), so I'm not going to
try to understand it.

parser.sml from es4-pre-release.M2.source.tar.gz:
(*
   The parser is modeled after the design by Peter Sestoft describe here:
       www.itu.dk/courses/FDP/E2002/parsernotes.pdf
*)

> so the reason disallowing "namespace" is a problem is that YUI has
> extensive use of YAHOO.namespace('foo'), and I don't know what's a good
> solution for that.

<http://developer.yahoo.com/yui/docs/YAHOO.html#method_namespace>

That creates a namespace within the YAHOO "package", so it seems to be
intended for use by other libraries that are part of YUI (even though it
is public). Won't those libraries have to be modified to work in Caja
anyway?

--
David-Sarah Hopwood

Mike Samuel <mikesamuel@gmail.com>  15 April 2008 19:25
Reply-To: mikesamuel@gmail.com
To: google-caja-discuss@googlegroups.com
On 15/04/2008, Felix <felix8a@gmail.com> wrote:
>
>  David-Sarah Hopwood wrote:
>  > <http://developer.yahoo.com/yui/docs/YAHOO.html#method_namespace>
>  >
>  > That creates a namespace within the YAHOO "package", so it seems to be
>  > intended for use by other libraries that are part of YUI (even though it
>  > is public). Won't those libraries have to be modified to work in Caja
>  > anyway?
>
>
> YAHOO.namespace() is used extensively within YUI itself, so if caja
>  doesn't allow namespace, then in order to cajole YUI, namespace would
>  have to be renamed to something else.  the annoying bit is that it is a
>  public method, so it is used by code outside YUI.  so this would add an
>  incompatibility between cajolable YUI and normal YUI.
>
>  I can try poking the YUI people, see what they think.

It looks like
    var foo = {};
    foo.namespace
works in the ES4 reference impl regardless of strict mode.

I think we can probably allow namespace following a '.'.  We'll
probably render it as YAHOO['namespace'] to be safe.

Would that help us avoid forking YUI?
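
As a rough sketch of the rendering idea above, the output side could fall back to bracket notation whenever the member name is one of the words being treated as reserved. `MEMBER_NAMES_TO_QUOTE` and `renderMemberAccess` are hypothetical names for illustration; this is not how Caja's renderer is actually written, and it assumes the member name is a plain identifier with no quotes or escapes in it.

```javascript
// Words that are accepted after '.' in input but rendered in bracket
// form in output. Only 'namespace' is discussed here; the set is an
// assumption and could be extended.
const MEMBER_NAMES_TO_QUOTE = new Set(["namespace"]);

// Render a member access on `objectSource` (already-rendered source
// text for the object expression).
function renderMemberAccess(objectSource, memberName) {
  if (MEMBER_NAMES_TO_QUOTE.has(memberName)) {
    return `${objectSource}['${memberName}']`;
  }
  return `${objectSource}.${memberName}`;
}

console.log(renderMemberAccess("YAHOO", "namespace") + "('foo')");
console.log(renderMemberAccess("YAHOO", "util"));
```

Because the bracket form is plain ES3, the output does not depend on how any particular engine treats `namespace` after a dot.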

Mike Samuel <mikesamuel@gmail.com>  16 April 2008 11:26
Reply-To: mikesamuel@gmail.com
To: google-caja-discuss@googlegroups.com
On 16/04/2008, Felix <felix8a@gmail.com> wrote:
>
>  Mike Samuel wrote:
>  > I think we can probably allow namespace following a '.'.  We'll
>  > probably render it as YAHOO['namespace'] to be safe.
>  >
>  > Would that help us avoid forking YUI?
>
>
> yes, that should be enough.
>  btw, I was surprised to find that firefox allows 'x.if = 3'
>  safari and opera barf on that.

How is override used?

Felix <felix8a@gmail.com>   16 April 2008 11:51
Reply-To: google-caja-discuss@googlegroups.com
To: google-caja-discuss@googlegroups.com

Mike Samuel wrote:
> How is override used?

override is just used as a local var and as an arg var, it can be
renamed easily without affecting

Mike Samuel <mikesamuel@gmail.com>  16 April 2008 12:10
Reply-To: mikesamuel@gmail.com
To: google-caja-discuss@googlegroups.com
On 16/04/2008, Felix <felix8a@gmail.com> wrote:

Cool.  I'm going to close bug 93 then and open one to allow namespace
as a member name in input, but not in output.

Original comment by mikesamuel@gmail.com on 16 Apr 2008 at 7:16

GoogleCodeExporter commented 9 years ago
It looks like 'namespace' is out of Harmony's contextually reserved keyword
list.  One of Jas's pending changes removes it from Keyword.java.

Original comment by mikesamuel@gmail.com on 21 Oct 2008 at 12:06