TomFrost / Jexl

Javascript Expression Language: Powerful context-based expression parser and evaluator
MIT License
573 stars 94 forks source link

Multiline Expressions, Lambda Functions, and Context Assignment #3

Closed will-weiss closed 4 years ago

will-weiss commented 9 years ago

Let me preface this pull request with a note of my sincere admiration for this library. The code is gorgeous and I believe there are countless use cases for an expression language built on JS. The notion that synchronous and asynchronous transforms both read from left to right is very intuitive and expressive.

My friend and coworker Arthur Burkhart and I have pair-coded on this branch which would add multiline expressions, lambda functions, and context assignment to the library.

Incoming strings are split by the \n character with lines containing only whitespace ignored. A parser is created for each line and promises are generated for each line evaluating independently. When all lines have resolved, the result of the last line is returned.

Lambda functions are created with the -> character. The syntax is borrowed more or less from coffeescript in that variable names appear to the left in parentheses and the expression appears to the right of the -> character. The lambda function can then be used as an argument in a transform. For example,

jexl.addTransform('map', function(val, lambda) {
  return val.map(lambda);
});

jexl.eval('[1,2,3] | map((n) -> n + 2)').then(function(res) {
  console.log(res); // Output: [3,4,5]
});

Context assignment is handled with the = character. After the first identifier token on a line, the = character transitions to a state handling assignment of the identified attribute. Evaluation of the whole assignment expression results in the evaluator receiving a new context identical its prior context, but with the named attribute included. The named attribute resolves with the expression to the right up to a line break. For example,

jexl.eval('foo = 5 \n bar = 6 \n foo + bar').then(function(res) {
  console.log(res); // Output: 11
});

Arthur and I would love to know what you think.

Best, Will Weiss

TomFrost commented 9 years ago

Let me preface this pull request response by thanking you guys for the care you took to match the code style and level of documentation with your PR! That's rare to see.

This is some really fantastic work. At first I felt it made Jexl a little over-powerful; approaching the point where it has so much functionality of a programming language that its use cases might better be served by JS itself. Then I realized you could fully implement a database-driven strategy pattern with safe execution using this, and I was sold.

I want a bit more time to dig into the code, but overall I really like what you've done. The only suggestion I might make at this point would be to introduce the semicolon to split statements either instead of the line break. Splitting on each semicolon and allowing trim() to get rid of newline chars, now Jexl expressions can be stored multi-line, or, if you're working in a database/file/application where the expression must be kept to one line, the same functionality is now accessible and readable.

Will definitely comment again once I've chewed on your approach more!

will-weiss commented 9 years ago

Fantastic news! Arthur and I are happy to have a dialog about how best to proceed and hopefully integrate this functionality with an already excellent library.

will-weiss commented 9 years ago

Hi Tom,

I took your advice such that within this branch semicolons are used to split multiline expressions. To achieve this, the ; character is added to the grammar and the splitting of lines occurs within Lexer.prototype.getTokenizedLines. There, any created tokens with type semicolon are not added to the array of tokens, but instead result in that array of tokens being pushed onto an array of lines. A new array of tokens is then created and subsequent created tokens are pushed onto it.

Another change made to this branch is that the Parser constructor can take state as an argument which will override the default state 'expectFirstOperand'. This was added so that a subParser created for a subexpression can begin in the 'expectOperand' state. This is necessary because without it, context assignment could occur within a subexpression as in:

jexl.eval('3 + (bar = 5; 7)').then(function(res) {
  console.log(res); // Output: 10
});

A test has been added to ensure that expressions like those above reject. Lastly, the Lexer.prototype._isWhitespace helper function has been modified on this branch such that it returns true if the incoming string consists of only whitespace characters, not just space characters. This allows for \n characters appearing after a semicolon to be considered part of the preceding token.

I am happy to discuss any of these or earlier changes once you have had time to look them over!

Best, Will

TomFrost commented 9 years ago

Alrighty guys, my thoughts:

I've identified a few bugs as well. Allowing for a rightArrow during the expectOperand (and expectFirstOperand) state causes this to happen without error:

> jexl.eval('1 + -> 1', function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': [Function] }

> jexl.eval('-> 1', function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': [Function] }

In this case, an error is produced (which is good!) but it's the wrong error -- this should be caught earlier:

> jexl.eval('->', function(err, res) { console.log(arguments); });
{}
> { '0': [TypeError: Cannot read property 'bind' of undefined] }

And the way the Lexer handles negating elements with the new multi-line system, it allows -; to be tokenized:

> jexl.eval('1 + -;', function(err, res) { console.log(arguments); });
{}
> { '0': [Error: Invalid expression token: -;] }

Finally, the expectIdentifier and argName states seem like they may be incorrect; once one of these states has been entered, it's only ever possible to switch between them. It allows for nonsensical expressions like this to be evaluated:

> jexl.eval('x, y', {x:1, y:2}, function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': 1 }

Let me know your thoughts!

will-weiss commented 9 years ago

Great feedback Tom, this is very helpful! I agree with virtually all of your points. Implementation of most of the ideas you discussed and fixes for the bugs you pointed out are implemented on this experimental assignOp branch. I'll outline below my reactions and notes about what is implemented on that branch.

Will

will-weiss commented 9 years ago

An argNames state with a subhandler has been added to the experimental branch to handle the collection of argument names for a lambda expression. This allows for the elimination of a lambdaExpressionStart state and subhandler. When non-identifiers are used as argument names, the handlers throw errors, which is unique among the parser/handlers, but ensures that argument names are valid without overcrowding the states.

mythmon commented 8 years ago

Is this going anywhere? It would be nice to have a definite yes or no on this PR so we can properly evaluate Jexl.

TomFrost commented 8 years ago

I'm strongly considering this for a 2.0 release of Jexl (a branch for this was recently opened). My hangup is that Jexl works with a dumb lexer and smart parser. The lexer maps strings to parts of speech on a 1:1 basis, and the Parser intelligently interprets those parts of speech.

This introduces some annoyances like - not being able to be used as both a unary (negative sign) and binary (minus symbol), so that double-use-case had to be hardcoded into the lexer's logic. This impacts this situation far more greatly because the lexer cannot conceivably interpret { as anything other than the opening character of an object declaration, especially on the condition that it follows (arg1) => as the first part of that would be interpreted as a single identifier inside of a grouped expression.

Will and Arthur did a fantastic job of navigating this shortcoming by hacking the parser to understand different meanings from the predefined parts of speech, but the side-effect is that I haven't been able to think of an implementation of that solution that would properly report errors when those combination of parts of speech are used inappropriately, in every conceivable way to use it inappropriately.

While we could keep identifying cases where that's possible and coding around them, that would start building a house of cards for future language additions. The better approach is to make the lexer intelligent; perhaps delivering one token at a time to the parser, based on the current state of the parser's FSM. That way Will and Arthur's syntax can be implemented at the lexer level, and the parser won't need to have any conditionals added to support special cases.

The initial focus of the 2.0 branch will be on modernizing the codebase to ES6 and TechnologyAdvice's established code standards/eslint; round two is the intelligent lexer.

jtenner commented 7 years ago

I have found a workaround for this need for arrow functions.

  let { zipObject } = require('lodash');
  i.addBinaryOp("=>", Infinity, (varNames, func) => async function(...args) {
    varNames = Array.isArray(varNames) ? varNames : [varNames];
    let previousInternalContext = Object.assign({}, internalContext);
    internalContext =  Object.assign({}, internalContext, _.zipObject(varNames, args));
    let val = await i.eval(func, internalContext);
    internalContext = previousInternalContext;
    return val;
  });

Yes. This means the '=>' operator is overridden.

I have a working game with players who can store their own lists. The syntax looks like this.

//the list transform gets the "a" list from the player's cache
j.eval(`"a"|list|map("x" => "x + 1")`);
//[2,3,4,5]

Seems to work great, but it doesn't do input sanitation yet to assert the varnames are good.

Loceka commented 5 years ago

Is it still planned to be merged, since the 2.0 is now live ?

TomFrost commented 5 years ago

This one opened up years ago and still chews me up to this day.

I love the functionality, I get the use case, and I have enormous respect for Will. My lingering hesitancy is because Jexl is, per its name, a Javascript EXpression Language, and this functionality moves Jexl from parsing singular expressions to encapsulating a nearly Turing-complete programming language. Which -- don't get me wrong -- is bad ass. But I don't like that for two reasons:

  1. Jexl goes from being a high powered expression language to a horribly underpowered programming language, and that snowball will shoot down the mountain at record-breaking speeds.
  2. If what's necessary is a full programming language with a safe execution environment, Javascript in a sandbox is a much more powerful solution.

But, evidenced by the fact that I haven't closed this PR, after 4 years I'm still on the fence. Mostly because I've personally run into cases where this would have been helpful, and as I'm writing a custom transform to do what I need to do, I'm thinking "Well, there is a PR that would make this custom code unnecessary..." and I get re-torn all over again.

So, I'm still happy to hear use cases, or ideas on how to limit scope in a way that the extra functionality doesn't make it feel like Jexl is missing about 30 features that weren't a problem when it was still just an expression language!

Loceka commented 5 years ago

I totally get your point but since I myself decided to use Jexl in order to "emulate eval" (and since it is the only project I found that was that close to doing it), I believe it wouldn't be a bad thing at all to extend the project out of it's original scope, even if it would become an underpowered programming language instead of the powerful expression evaluator it is now.

But I do see what you mean and what tears you about changing the original aim of this project.

mythmon commented 5 years ago

For my use cases, I'm mostly opposed to this change. Multiline expressions are nice, but lambdas and context assignment strays too close to Turing-complete for my tastes. It would have been helpful a few times, but we chose Jexl specifically because it was a fairly weak language that didn't have the ability to inherently take an arbitrary long amount of time to evaluate.

TomFrost commented 4 years ago

That's great input-- thank you! @mythmon you raise a great point I haven't considered before on this topic.

I'm going to go ahead and close this, not because I'm any less torn, but because even if this turns out to be the future for Jexl, the codebase has strayed so far from this PR that a fresh write might be in the cards.