Multiline Expressions, Lambda Functions, and Context Assignment

TomFrost / Jexl

Javascript Expression Language: Powerful context-based expression parser and evaluator

MIT License

573 stars 94 forks source link

Multiline Expressions, Lambda Functions, and Context Assignment #3

Closed will-weiss closed 4 years ago

will-weiss commented 9 years ago

Let me preface this pull request with a note of my sincere admiration for this library. The code is gorgeous and I believe there are countless use cases for an expression language built on JS. The notion that synchronous and asynchronous transforms both read from left to right is very intuitive and expressive.

My friend and coworker Arthur Burkhart and I have pair-coded on this branch which would add multiline expressions, lambda functions, and context assignment to the library.

Incoming strings are split by the \n character with lines containing only whitespace ignored. A parser is created for each line and promises are generated for each line evaluating independently. When all lines have resolved, the result of the last line is returned.

Lambda functions are created with the -> character. The syntax is borrowed more or less from coffeescript in that variable names appear to the left in parentheses and the expression appears to the right of the -> character. The lambda function can then be used as an argument in a transform. For example,

jexl.addTransform('map', function(val, lambda) {
  return val.map(lambda);
});

jexl.eval('[1,2,3] | map((n) -> n + 2)').then(function(res) {
  console.log(res); // Output: [3,4,5]
});

Context assignment is handled with the = character. After the first identifier token on a line, the = character transitions to a state handling assignment of the identified attribute. Evaluation of the whole assignment expression results in the evaluator receiving a new context identical its prior context, but with the named attribute included. The named attribute resolves with the expression to the right up to a line break. For example,

jexl.eval('foo = 5 \n bar = 6 \n foo + bar').then(function(res) {
  console.log(res); // Output: 11
});

Arthur and I would love to know what you think.

Best, Will Weiss

TomFrost commented 9 years ago

Let me preface this pull request response by thanking you guys for the care you took to match the code style and level of documentation with your PR! That's rare to see.

This is some really fantastic work. At first I felt it made Jexl a little over-powerful; approaching the point where it has so much functionality of a programming language that its use cases might better be served by JS itself. Then I realized you could fully implement a database-driven strategy pattern with safe execution using this, and I was sold.

I want a bit more time to dig into the code, but overall I really like what you've done. The only suggestion I might make at this point would be to introduce the semicolon to split statements either instead of the line break. Splitting on each semicolon and allowing trim() to get rid of newline chars, now Jexl expressions can be stored multi-line, or, if you're working in a database/file/application where the expression must be kept to one line, the same functionality is now accessible and readable.

Will definitely comment again once I've chewed on your approach more!

will-weiss commented 9 years ago

Fantastic news! Arthur and I are happy to have a dialog about how best to proceed and hopefully integrate this functionality with an already excellent library.

will-weiss commented 9 years ago

Hi Tom,

I took your advice such that within this branch semicolons are used to split multiline expressions. To achieve this, the ; character is added to the grammar and the splitting of lines occurs within Lexer.prototype.getTokenizedLines. There, any created tokens with type semicolon are not added to the array of tokens, but instead result in that array of tokens being pushed onto an array of lines. A new array of tokens is then created and subsequent created tokens are pushed onto it.

Another change made to this branch is that the Parser constructor can take state as an argument which will override the default state 'expectFirstOperand'. This was added so that a subParser created for a subexpression can begin in the 'expectOperand' state. This is necessary because without it, context assignment could occur within a subexpression as in:

jexl.eval('3 + (bar = 5; 7)').then(function(res) {
  console.log(res); // Output: 10
});

A test has been added to ensure that expressions like those above reject. Lastly, the Lexer.prototype._isWhitespace helper function has been modified on this branch such that it returns true if the incoming string consists of only whitespace characters, not just space characters. This allows for \n characters appearing after a semicolon to be considered part of the preceding token.

I am happy to discuss any of these or earlier changes once you have had time to look them over!

Best, Will

TomFrost commented 9 years ago

Alrighty guys, my thoughts:

Awesome implementation of the semicolon -- I like how that works
Assignments could evolve quickly. Right now it's just =, but in the near future it'll probably make sense to have +=, -=, /=, the whole nine yards. It might be good to follow the same pattern as "binaryOp" in the grammar, here. All of these could be of type "binaryAssignOp", and the grammar eval function could take the left side as the identifier name and the right side as the actual value. The function could then access the context through this and do whatever it needs to do.
Assignments should evaluate to the value of the assignment. For example, pop open node or the console and type x = 4, and you'll see that not only is x set to 4, 4 is also the "result" of that statement. If you do that, that gives you the ability to support statements like a = b = c = 4. c=4 would be evaluated first, resolve to 4, and now b is set to 4, which evaluates to 4, and now a is set to 4. This should also allow you to remove the distinctions for firstIdentifier and expectFirstOperand from the states, simplifying the code a lot.
The next evolution of assignments would support a++ and ++a... which gets difficult. Definitely not a reason to hold back on this feature, but it may mean a bit of a redesign to support grammar elements that are two different types. ++ and --, for example, would both need to exist as a unaryLeftAssignOp and unaryRightAssignOp. Lots of complication. Just noting it here to keep that in mind, as my primary goal in these comments is to make sure Jexl stays lean and mean.
Super minor: the LambdaExpressionEnd state makes more sense (I think) being called LambdaExpression, since it's not specifically the end of the expression. I think that you were following the convention set by TernaryEnd here, but in that case I meant "The ending block of the ternary expression" as opposed to "the middle block of the ternary expression" ;-)

I've identified a few bugs as well. Allowing for a rightArrow during the expectOperand (and expectFirstOperand) state causes this to happen without error:

> jexl.eval('1 + -> 1', function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': [Function] }

> jexl.eval('-> 1', function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': [Function] }

In this case, an error is produced (which is good!) but it's the wrong error -- this should be caught earlier:

> jexl.eval('->', function(err, res) { console.log(arguments); });
{}
> { '0': [TypeError: Cannot read property 'bind' of undefined] }

And the way the Lexer handles negating elements with the new multi-line system, it allows -; to be tokenized:

> jexl.eval('1 + -;', function(err, res) { console.log(arguments); });
{}
> { '0': [Error: Invalid expression token: -;] }

Finally, the expectIdentifier and argName states seem like they may be incorrect; once one of these states has been entered, it's only ever possible to switch between them. It allows for nonsensical expressions like this to be evaluated:

> jexl.eval('x, y', {x:1, y:2}, function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': 1 }

Let me know your thoughts!

will-weiss commented 9 years ago

Great feedback Tom, this is very helpful! I agree with virtually all of your points. Implementation of most of the ideas you discussed and fixes for the bugs you pointed out are implemented on this experimental assignOp branch. I'll outline below my reactions and notes about what is implemented on that branch.

I wholeheartedly agree that assignment expressions should resolve with the value of the assignment as in JS. This eliminates the need to distinguish between first and subsequent identifiers and operands so those states have been removed from the assignOp branch, simplifying the states.
I agree that variable assignments are a type of operator and added the assignOp type to the grammar which handles all assignment expressions. The eval function for assign operations take the original value of the named attribute and the result of the expression to the right and return the value that the named attribute should now hold in the context.
I don't think there is much need for unary assignment operations such as ++ and -- for a few reasons. First, given that Jexl has no looping constructs, the need for this shorthand is reduced substantially. Second, Jexl's evaluation strategies generally make objects immutable because objects are never themselves manipulated unless someone adds a transform which does so. In these regards, Jexl is somewhat like Python which lacks a ++ character, has immutable integers, and favors list comprehensions over index-based for loops. Code which reads
```
foo += 1;
```
might better communicate that a new foo has been added to the context equal to the value of the previous foo plus one. However if we want to mimic JS more closely, I could see the desire to have ++ and --.
There is no real need to have lambda functions declared in an expectOperand state, so this has been eliminated. Right now, a lambda function not used as a transform argument can only use the attributes of the existing context, which is not very useful.
The expectIdentifier and argName are unnecessary and have been eliminated.
Over the course of working on the assignOp branch I realized that there needs to be an order of operations for dealing with context assignment such that they are handled in the expected way. In my initial commit, each line was evaluating independently which may have worked if the only time assignment to the context could occur was at the beginning of the expression. However, if the result of chained expressions can be added to the context, an expression such as
```
foo = bar = 5; foo
```
might return undefined as the second line may finish evaluating before foo has been added to the context in the first line. I have handled this problem such that it works, but I am not married to the solution and want your input. Effectively the solution in the branch entails keeping track of the line number while tokenizing the lines of the expression and adding a lineNo attribute to each token. An Evaluator.prototype.evalLine is used to evaluate an individual line and keep track of when that line has resolved/rejected. The evaluator/handlers for AssignmentExpression and Identifier each compare the line number of the ast they are working on to the minimum extant line number. If there is an extant line number lower than that of the ast, a timeout is set which will resolve the promise later. This seems to be necessary while we allow the same variable to be assigned to the context in multiple lines.

Will

will-weiss commented 9 years ago

An argNames state with a subhandler has been added to the experimental branch to handle the collection of argument names for a lambda expression. This allows for the elimination of a lambdaExpressionStart state and subhandler. When non-identifiers are used as argument names, the handlers throw errors, which is unique among the parser/handlers, but ensures that argument names are valid without overcrowding the states.

mythmon commented 8 years ago

Is this going anywhere? It would be nice to have a definite yes or no on this PR so we can properly evaluate Jexl.

TomFrost commented 8 years ago

I'm strongly considering this for a 2.0 release of Jexl (a branch for this was recently opened). My hangup is that Jexl works with a dumb lexer and smart parser. The lexer maps strings to parts of speech on a 1:1 basis, and the Parser intelligently interprets those parts of speech.

This introduces some annoyances like - not being able to be used as both a unary (negative sign) and binary (minus symbol), so that double-use-case had to be hardcoded into the lexer's logic. This impacts this situation far more greatly because the lexer cannot conceivably interpret { as anything other than the opening character of an object declaration, especially on the condition that it follows (arg1) => as the first part of that would be interpreted as a single identifier inside of a grouped expression.

Will and Arthur did a fantastic job of navigating this shortcoming by hacking the parser to understand different meanings from the predefined parts of speech, but the side-effect is that I haven't been able to think of an implementation of that solution that would properly report errors when those combination of parts of speech are used inappropriately, in every conceivable way to use it inappropriately.

While we could keep identifying cases where that's possible and coding around them, that would start building a house of cards for future language additions. The better approach is to make the lexer intelligent; perhaps delivering one token at a time to the parser, based on the current state of the parser's FSM. That way Will and Arthur's syntax can be implemented at the lexer level, and the parser won't need to have any conditionals added to support special cases.

The initial focus of the 2.0 branch will be on modernizing the codebase to ES6 and TechnologyAdvice's established code standards/eslint; round two is the intelligent lexer.

jtenner commented 7 years ago

I have found a workaround for this need for arrow functions.

  let { zipObject } = require('lodash');
  i.addBinaryOp("=>", Infinity, (varNames, func) => async function(...args) {
    varNames = Array.isArray(varNames) ? varNames : [varNames];
    let previousInternalContext = Object.assign({}, internalContext);
    internalContext =  Object.assign({}, internalContext, _.zipObject(varNames, args));
    let val = await i.eval(func, internalContext);
    internalContext = previousInternalContext;
    return val;
  });

Yes. This means the '=>' operator is overridden.

I have a working game with players who can store their own lists. The syntax looks like this.

//the list transform gets the "a" list from the player's cache
j.eval(`"a"|list|map("x" => "x + 1")`);
//[2,3,4,5]

Seems to work great, but it doesn't do input sanitation yet to assert the varnames are good.

Loceka commented 5 years ago

Is it still planned to be merged, since the 2.0 is now live ?

TomFrost commented 5 years ago

This one opened up years ago and still chews me up to this day.

I love the functionality, I get the use case, and I have enormous respect for Will. My lingering hesitancy is because Jexl is, per its name, a Javascript EXpression Language, and this functionality moves Jexl from parsing singular expressions to encapsulating a nearly Turing-complete programming language. Which -- don't get me wrong -- is bad ass. But I don't like that for two reasons:

Jexl goes from being a high powered expression language to a horribly underpowered programming language, and that snowball will shoot down the mountain at record-breaking speeds.
If what's necessary is a full programming language with a safe execution environment, Javascript in a sandbox is a much more powerful solution.

But, evidenced by the fact that I haven't closed this PR, after 4 years I'm still on the fence. Mostly because I've personally run into cases where this would have been helpful, and as I'm writing a custom transform to do what I need to do, I'm thinking "Well, there is a PR that would make this custom code unnecessary..." and I get re-torn all over again.

So, I'm still happy to hear use cases, or ideas on how to limit scope in a way that the extra functionality doesn't make it feel like Jexl is missing about 30 features that weren't a problem when it was still just an expression language!

Loceka commented 5 years ago

I totally get your point but since I myself decided to use Jexl in order to "emulate eval" (and since it is the only project I found that was that close to doing it), I believe it wouldn't be a bad thing at all to extend the project out of it's original scope, even if it would become an underpowered programming language instead of the powerful expression evaluator it is now.

But I do see what you mean and what tears you about changing the original aim of this project.

mythmon commented 5 years ago

For my use cases, I'm mostly opposed to this change. Multiline expressions are nice, but lambdas and context assignment strays too close to Turing-complete for my tastes. It would have been helpful a few times, but we chose Jexl specifically because it was a fairly weak language that didn't have the ability to inherently take an arbitrary long amount of time to evaluate.

TomFrost commented 4 years ago

That's great input-- thank you! @mythmon you raise a great point I haven't considered before on this topic.

I'm going to go ahead and close this, not because I'm any less torn, but because even if this turns out to be the future for Jexl, the codebase has strayed so far from this PR that a fresh write might be in the cards.