What are you trying to investigate

bd82 commented 7 years ago

Hello.

I saw this commit: https://github.com/FlamingTempura/zam/commit/77cf7ceb0c0179cc2387f841ec879db38388368e

I am not sure what you are exploring. What are the requirements for your parser?

FlamingTempura commented 7 years ago

Thanks for your interest!

I was looking for a JavaScript expression parser which can produce an abstract syntax tree (AST). The current parser is generated from a PEG.js grammar and works well, but the generated parser is large and parsing errors are handled poorly. Potentially this could be swapped out for an alternative parser which finds a better compromise between speed and size:

acorn - which performs well and generates an AST, but includes unneeded language features (such as import/export), so is quite large
jsep - which is very small and generates an AST, but does not include needed language features like object literals
chevrotain - which may perform better than PEG.js and result in less code, though it may not perform as well as purpose-built JS parsers like acorn

Chevrotain could be the best way forward, but it could require a substantial amount of work to develop a parser which produces an AST, unless somebody is already working on such a thing.

bd82 commented 7 years ago

Interesting.

I understand you are talking about pure expressions, not the full syntax of ECMAScript?

bd82 commented 7 years ago

And when you say "error handling", is automatic error recovery / fault tolerance also relevant?

FlamingTempura commented 7 years ago

Yes, for the purpose of this project it is only necessary to parse expressions, such as:

 1+1
 1 + x.y
 1 + arr.reduce((m,x) => m+ x * 2, 0)
 { a: 1 }.a * 2

This is used to parse expressions within the DOM. E.g.:

<div>{{ 1 + 1 }}</div> <!-- compiles to <div>2</div> -->

So the full ECMAScript specification does not need to be implemented.

In terms of error handling, the PEG.js grammar results in fairly complex errors that are hard for a human to read. E.g. parse('1a'); results in the error Expected "!=", "!==", "%=", "&&", "(", "*=", "++", "+=", "--", "-=", ".", "/=", "<", "<=", "=", "==", "===", ">", ">=", "?", "[", "||", [*/%], [+\-], [\t ], or end of input but "a" found.

Conversely, typing 1a into the Chrome dev console results in SyntaxError: Invalid or unexpected token, which is a bit more readable.

I'm not sure how well PEG.js can be tamed into giving more meaningful error messages like Chrome does, but a JS parser (such as acorn) might already provide such functionality, or chevrotain may give a nice abstraction for defining meaningful errors.
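One possible way to tame the message, as a rough sketch: syntax errors thrown by PEG.js-generated parsers carry `found` and `location` properties alongside the long `expected` list, so a wrapper could ignore the list and report only the offending token and its position. The `describeSyntaxError` helper is hypothetical, and the error object below is hand-built to mimic that shape rather than produced by a real parser.

```javascript
// Hypothetical helper: turn a PEG.js-style syntax error into a short message.
// Assumes the error exposes `found` (string or null) and `location.start`
// with `line` and `column` fields.
function describeSyntaxError(err) {
    const what = err.found === null || err.found === undefined
        ? 'end of input'
        : `token "${err.found}"`;
    const start = err.location && err.location.start;
    const where = start ? ` at line ${start.line}, column ${start.column}` : '';
    return `SyntaxError: Unexpected ${what}${where}`;
}

// Hand-built stand-in for the error thrown by parse('1a').
const sampleError = {
    found: 'a',
    expected: [/* long list omitted */],
    location: { start: { offset: 1, line: 1, column: 2 } }
};

console.log(describeSyntaxError(sampleError));
// SyntaxError: Unexpected token "a" at line 1, column 2
```

This only shortens the message; it does not make the grammar itself any better at recovering from errors.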

bd82 commented 7 years ago

OK, thanks for the explanation. I would investigate existing hand-crafted JavaScript parser libraries such as Acorn/Esprima/Babylon/Cherow/...

You may find one that provides good enough error messages or hooks to customize the error messages.

The problem is that, as you said, those hand-crafted parsers support a superset of the syntax you require, so in the simplest case they could output ASTs containing nodes which you do not support. That may be resolved by scanning the output AST and creating "pseudo" syntax errors for unsupported syntax.
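A minimal sketch of such a scan, assuming the parser emits ESTree-style nodes (objects with a string `type` field). The allowlist and the `assertSupported` name are illustrative only, and the ASTs are written by hand so the example does not depend on any particular parser.

```javascript
// Node types the expression language is assumed to support (illustrative list).
const SUPPORTED = new Set([
    'Literal', 'Identifier', 'BinaryExpression', 'MemberExpression',
    'CallExpression', 'ArrowFunctionExpression', 'ObjectExpression', 'Property'
]);

// Walk an ESTree-style AST and throw a "pseudo" syntax error on the first
// node whose type is not in the allowlist.
function assertSupported(node) {
    if (Array.isArray(node)) {
        node.forEach(assertSupported);
        return;
    }
    if (node === null || typeof node !== 'object') return;
    if (typeof node.type === 'string' && !SUPPORTED.has(node.type)) {
        throw new SyntaxError(`Unsupported syntax: ${node.type}`);
    }
    Object.keys(node).forEach(key => assertSupported(node[key]));
}

// Hand-written AST for `1 + x.y` (all supported types).
const supportedAst = {
    type: 'BinaryExpression', operator: '+',
    left: { type: 'Literal', value: 1 },
    right: {
        type: 'MemberExpression', computed: false,
        object: { type: 'Identifier', name: 'x' },
        property: { type: 'Identifier', name: 'y' }
    }
};
// Hand-written AST for `new Foo()`, which is outside the allowlist.
const unsupportedAst = {
    type: 'NewExpression',
    callee: { type: 'Identifier', name: 'Foo' },
    arguments: []
};

assertSupported(supportedAst); // passes silently
try {
    assertSupported(unsupportedAst);
} catch (err) {
    console.log(err.message); // Unsupported syntax: NewExpression
}
```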

I would only suggest building something yourself (either hand-crafted or using a parsing library such as Chevrotain/PEG.js) if you find no alternative.

BTW, could you just use eval to get the value of the expression? Or do you really need the AST?

FlamingTempura commented 7 years ago

Eval might be possible, but I had abandoned use of eval because of a couple of major issues:

The parser must implement block scoping. While this may be possible in eval, the techniques to achieve it are generally considered bad form. For example, with blocks could be used:

let value;
with (scopeA) {
    with (scopeB) {
        eval(`value = ${expression}`)
    }   
}

However, with blocks are forbidden within strict mode. eval'd variable definitions could be used instead:

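let value,
    pre = '',
    post = '';
// Open one block per variable: outer scope (scopeA) first, then inner (scopeB).
Object.keys(scopeA).forEach(k => {
    pre += `{ let ${k} = scopeA["${k}"];`;
    // Prepend so that blocks are closed in the reverse order of opening.
    post = `scopeA["${k}"] = ${k}; }` + post;
});
Object.keys(scopeB).forEach(k => {
    pre += `{ let ${k} = scopeB["${k}"];`;
    post = `scopeB["${k}"] = ${k}; }` + post;
});
eval(`
    ${pre}
    value = ${expression};
    ${post}
`);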

As you can see, this is an ugly solution.

eval throws on encountering an undeclared variable, whereas the desired behaviour is to just treat it as undefined. E.g.
```
eval('"hello " + a') // throws a ReferenceError because a is not declared
```
The desirable behaviour would be to return 'hello '.

For these reasons, it makes sense to work with an AST instead, so that I can manually define how variables should be retrieved from block scopes.
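As a rough illustration of that idea (not the project's actual evaluator), an AST walker can resolve identifiers against an explicit chain of scopes, innermost first, and fall back to undefined for unknown names. The `lookup` and `evaluate` names are hypothetical, and only a handful of ESTree-style node types are handled.

```javascript
// Look up a name in a chain of scopes, innermost first.
// Unknown names yield undefined instead of throwing a ReferenceError.
function lookup(name, scopes) {
    for (const scope of scopes) {
        if (Object.prototype.hasOwnProperty.call(scope, name)) return scope[name];
    }
    return undefined;
}

// Evaluate a small subset of ESTree-style expression nodes.
function evaluate(node, scopes) {
    switch (node.type) {
        case 'Literal':
            return node.value;
        case 'Identifier':
            return lookup(node.name, scopes);
        case 'MemberExpression': {
            const object = evaluate(node.object, scopes);
            const key = node.computed ? evaluate(node.property, scopes) : node.property.name;
            return object == null ? undefined : object[key];
        }
        case 'BinaryExpression': {
            const left = evaluate(node.left, scopes);
            const right = evaluate(node.right, scopes);
            switch (node.operator) {
                case '+': return left + right;
                case '-': return left - right;
                case '*': return left * right;
                case '/': return left / right;
                default: throw new Error(`Unsupported operator: ${node.operator}`);
            }
        }
        default:
            throw new Error(`Unsupported node type: ${node.type}`);
    }
}

// Hand-written AST for `1 + x.y`.
const ast = {
    type: 'BinaryExpression', operator: '+',
    left: { type: 'Literal', value: 1 },
    right: {
        type: 'MemberExpression', computed: false,
        object: { type: 'Identifier', name: 'x' },
        property: { type: 'Identifier', name: 'y' }
    }
};
const scopeA = { x: { y: 2 } };   // outer scope
const scopeB = { x: { y: 10 } };  // inner scope, shadows x

console.log(evaluate(ast, [scopeA]));          // 3
console.log(evaluate(ast, [scopeB, scopeA]));  // 11
console.log(evaluate({ type: 'Identifier', name: 'a' }, [scopeB, scopeA])); // undefined
```

How an undefined value is then rendered (for example as an empty string) is a separate decision that can be made where the result is inserted into the DOM.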

I think you're right that a hand-crafted parser would be the way to go, though I would be inclined to find ways of reducing it to the required subset of the language, for the sake of file size.

One other reason I wrote a custom PEG.js grammar is because I can define how expressions are parsed in HTML. i.e. the parser parses the following:

Name: {{ firstName }} {{{ lastName }}}

The AST will be:

[  
  "Name: ",
  {  
    "html":false,
    "expression":{  
      "type":"Identifier",
      "name":"firstName"
    }
  },
  " ",
  {  
    "html":true,
    "expression":{  
      "type":"Identifier",
      "name":"lastName"
    }
  }
]
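To show how such a structure might be consumed, here is a small sketch that renders the parts to a string. It rests on assumptions not stated above: that html:false means the value is HTML-escaped, that html:true means it is inserted as-is, and that undefined values render as an empty string. The `render` and `escapeHtml` helpers are hypothetical, and expression evaluation is delegated to a callback.

```javascript
// Escape characters that are significant in HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Render template parts: strings pass through unchanged, expression parts
// are evaluated by the supplied callback.
// Assumption: html:false escapes the value, html:true inserts it as-is.
function render(parts, evaluateExpression) {
    return parts.map(part => {
        if (typeof part === 'string') return part;
        const value = evaluateExpression(part.expression);
        const text = value === undefined || value === null ? '' : String(value);
        return part.html ? text : escapeHtml(text);
    }).join('');
}

const parts = [
    'Name: ',
    { html: false, expression: { type: 'Identifier', name: 'firstName' } },
    ' ',
    { html: true, expression: { type: 'Identifier', name: 'lastName' } }
];
const scope = { firstName: 'Ada <b>', lastName: '<i>Lovelace</i>' };

// Trivial evaluator for the example: identifiers only.
const output = render(parts, expr => scope[expr.name]);
console.log(output);
// Name: Ada &lt;b&gt; <i>Lovelace</i>
```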

bd82 commented 7 years ago

A hand-crafted parser is the most versatile in theory. But because you are not the one writing it, it may not be as versatile as you need...

Acorn has a plugin system which can be used to add support for additional syntax, so some of the parsing rules can be overridden. Perhaps you could use the same plugin system to reduce the amount of syntax supported, and maybe combine this with a tree-shaking bundler to package only the syntax rules relevant to you, thereby decreasing the size of the module.
