Closed will-weiss closed 4 years ago
Let me preface this pull request response by thanking you guys for the care you took to match the code style and level of documentation with your PR! That's rare to see.
This is some really fantastic work. At first I felt it made Jexl a little over-powerful; approaching the point where it has so much functionality of a programming language that its use cases might better be served by JS itself. Then I realized you could fully implement a database-driven strategy pattern with safe execution using this, and I was sold.
I want a bit more time to dig into the code, but overall I really like what you've done. The only suggestion I might make at this point would be to introduce the semicolon to split statements either instead of the line break. Splitting on each semicolon and allowing trim() to get rid of newline chars, now Jexl expressions can be stored multi-line, or, if you're working in a database/file/application where the expression must be kept to one line, the same functionality is now accessible and readable.
Will definitely comment again once I've chewed on your approach more!
Fantastic news! Arthur and I are happy to have a dialog about how best to proceed and hopefully integrate this functionality with an already excellent library.
Hi Tom,
I took your advice such that within this branch semicolons are used to split multiline expressions. To achieve this, the ;
character is added to the grammar and the splitting of lines occurs within Lexer.prototype.getTokenizedLines
. There, any created tokens with type semicolon are not added to the array of tokens, but instead result in that array of tokens being pushed onto an array of lines. A new array of tokens is then created and subsequent created tokens are pushed onto it.
Another change made to this branch is that the Parser
constructor can take state
as an argument which will override the default state 'expectFirstOperand'
. This was added so that a subParser created for a subexpression can begin in the 'expectOperand'
state. This is necessary because without it, context assignment could occur within a subexpression as in:
jexl.eval('3 + (bar = 5; 7)').then(function(res) {
console.log(res); // Output: 10
});
A test has been added to ensure that expressions like those above reject. Lastly, the Lexer.prototype._isWhitespace
helper function has been modified on this branch such that it returns true if the incoming string consists of only whitespace characters, not just space characters. This allows for \n
characters appearing after a semicolon to be considered part of the preceding token.
I am happy to discuss any of these or earlier changes once you have had time to look them over!
Best, Will
Alrighty guys, my thoughts:
=
, but in the near future it'll probably make sense to have +=
, -=
, /=
, the whole nine yards. It might be good to follow the same pattern as "binaryOp" in the grammar, here. All of these could be of type "binaryAssignOp", and the grammar eval
function could take the left side as the identifier name and the right side as the actual value. The function could then access the context through this
and do whatever it needs to do.x = 4
, and you'll see that not only is x
set to 4
, 4
is also the "result" of that statement. If you do that, that gives you the ability to support statements like a = b = c = 4
. c=4 would be evaluated first, resolve to 4, and now b
is set to 4, which evaluates to 4, and now a
is set to 4. This should also allow you to remove the distinctions for firstIdentifier
and expectFirstOperand
from the states, simplifying the code a lot.a++
and ++a
... which gets difficult. Definitely not a reason to hold back on this feature, but it may mean a bit of a redesign to support grammar elements that are two different types. ++
and --
, for example, would both need to exist as a unaryLeftAssignOp and unaryRightAssignOp. Lots of complication. Just noting it here to keep that in mind, as my primary goal in these comments is to make sure Jexl stays lean and mean.LambdaExpressionEnd
state makes more sense (I think) being called LambdaExpression
, since it's not specifically the end of the expression. I think that you were following the convention set by TernaryEnd
here, but in that case I meant "The ending block of the ternary expression" as opposed to "the middle block of the ternary expression" ;-)I've identified a few bugs as well. Allowing for a rightArrow during the expectOperand (and expectFirstOperand) state causes this to happen without error:
> jexl.eval('1 + -> 1', function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': [Function] }
> jexl.eval('-> 1', function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': [Function] }
In this case, an error is produced (which is good!) but it's the wrong error -- this should be caught earlier:
> jexl.eval('->', function(err, res) { console.log(arguments); });
{}
> { '0': [TypeError: Cannot read property 'bind' of undefined] }
And the way the Lexer handles negating elements with the new multi-line system, it allows -;
to be tokenized:
> jexl.eval('1 + -;', function(err, res) { console.log(arguments); });
{}
> { '0': [Error: Invalid expression token: -;] }
Finally, the expectIdentifier
and argName
states seem like they may be incorrect; once one of these states has been entered, it's only ever possible to switch between them. It allows for nonsensical expressions like this to be evaluated:
> jexl.eval('x, y', {x:1, y:2}, function(err, res) { console.log(arguments); });
{}
> { '0': null, '1': 1 }
Let me know your thoughts!
Great feedback Tom, this is very helpful! I agree with virtually all of your points. Implementation of most of the ideas you discussed and fixes for the bugs you pointed out are implemented on this experimental assignOp branch. I'll outline below my reactions and notes about what is implemented on that branch.
++
and --
for a few reasons. First, given that Jexl has no looping constructs, the need for this shorthand is reduced substantially. Second, Jexl's evaluation strategies generally make objects immutable because objects are never themselves manipulated unless someone adds a transform which does so. In these regards, Jexl is somewhat like Python which lacks a ++
character, has immutable integers, and favors list comprehensions over index-based for loops. Code which reads foo += 1;
might better communicate that a new foo
has been added to the context equal to the value of the previous foo plus one. However if we want to mimic JS more closely, I could see the desire to have ++
and --
.foo = bar = 5; foo
might return undefined as the second line may finish evaluating before foo has been added to the context in the first line. I have handled this problem such that it works, but I am not married to the solution and want your input. Effectively the solution in the branch entails keeping track of the line number while tokenizing the lines of the expression and adding a lineNo
attribute to each token. An Evaluator.prototype.evalLine
is used to evaluate an individual line and keep track of when that line has resolved/rejected. The evaluator/handlers for AssignmentExpression
and Identifier
each compare the line number of the ast they are working on to the minimum extant line number. If there is an extant line number lower than that of the ast, a timeout is set which will resolve the promise later. This seems to be necessary while we allow the same variable to be assigned to the context in multiple lines. Will
An argNames
state with a subhandler has been added to the experimental branch to handle the collection of argument names for a lambda expression. This allows for the elimination of a lambdaExpressionStart
state and subhandler. When non-identifiers are used as argument names, the handlers throw errors, which is unique among the parser/handlers, but ensures that argument names are valid without overcrowding the states.
Is this going anywhere? It would be nice to have a definite yes or no on this PR so we can properly evaluate Jexl.
I'm strongly considering this for a 2.0 release of Jexl (a branch for this was recently opened). My hangup is that Jexl works with a dumb lexer and smart parser. The lexer maps strings to parts of speech on a 1:1 basis, and the Parser intelligently interprets those parts of speech.
This introduces some annoyances like -
not being able to be used as both a unary (negative sign) and binary (minus symbol), so that double-use-case had to be hardcoded into the lexer's logic. This impacts this situation far more greatly because the lexer cannot conceivably interpret { as anything other than the opening character of an object declaration, especially on the condition that it follows (arg1) =>
as the first part of that would be interpreted as a single identifier inside of a grouped expression.
Will and Arthur did a fantastic job of navigating this shortcoming by hacking the parser to understand different meanings from the predefined parts of speech, but the side-effect is that I haven't been able to think of an implementation of that solution that would properly report errors when those combination of parts of speech are used inappropriately, in every conceivable way to use it inappropriately.
While we could keep identifying cases where that's possible and coding around them, that would start building a house of cards for future language additions. The better approach is to make the lexer intelligent; perhaps delivering one token at a time to the parser, based on the current state of the parser's FSM. That way Will and Arthur's syntax can be implemented at the lexer level, and the parser won't need to have any conditionals added to support special cases.
The initial focus of the 2.0 branch will be on modernizing the codebase to ES6 and TechnologyAdvice's established code standards/eslint; round two is the intelligent lexer.
I have found a workaround for this need for arrow functions.
let { zipObject } = require('lodash');
i.addBinaryOp("=>", Infinity, (varNames, func) => async function(...args) {
varNames = Array.isArray(varNames) ? varNames : [varNames];
let previousInternalContext = Object.assign({}, internalContext);
internalContext = Object.assign({}, internalContext, _.zipObject(varNames, args));
let val = await i.eval(func, internalContext);
internalContext = previousInternalContext;
return val;
});
Yes. This means the '=>' operator is overridden.
I have a working game with players who can store their own lists. The syntax looks like this.
//the list transform gets the "a" list from the player's cache
j.eval(`"a"|list|map("x" => "x + 1")`);
//[2,3,4,5]
Seems to work great, but it doesn't do input sanitation yet to assert the varnames are good.
Is it still planned to be merged, since the 2.0 is now live ?
This one opened up years ago and still chews me up to this day.
I love the functionality, I get the use case, and I have enormous respect for Will. My lingering hesitancy is because Jexl is, per its name, a Javascript EXpression Language, and this functionality moves Jexl from parsing singular expressions to encapsulating a nearly Turing-complete programming language. Which -- don't get me wrong -- is bad ass. But I don't like that for two reasons:
But, evidenced by the fact that I haven't closed this PR, after 4 years I'm still on the fence. Mostly because I've personally run into cases where this would have been helpful, and as I'm writing a custom transform to do what I need to do, I'm thinking "Well, there is a PR that would make this custom code unnecessary..." and I get re-torn all over again.
So, I'm still happy to hear use cases, or ideas on how to limit scope in a way that the extra functionality doesn't make it feel like Jexl is missing about 30 features that weren't a problem when it was still just an expression language!
I totally get your point but since I myself decided to use Jexl in order to "emulate eval" (and since it is the only project I found that was that close to doing it), I believe it wouldn't be a bad thing at all to extend the project out of it's original scope, even if it would become an underpowered programming language instead of the powerful expression evaluator it is now.
But I do see what you mean and what tears you about changing the original aim of this project.
For my use cases, I'm mostly opposed to this change. Multiline expressions are nice, but lambdas and context assignment strays too close to Turing-complete for my tastes. It would have been helpful a few times, but we chose Jexl specifically because it was a fairly weak language that didn't have the ability to inherently take an arbitrary long amount of time to evaluate.
That's great input-- thank you! @mythmon you raise a great point I haven't considered before on this topic.
I'm going to go ahead and close this, not because I'm any less torn, but because even if this turns out to be the future for Jexl, the codebase has strayed so far from this PR that a fresh write might be in the cards.
Let me preface this pull request with a note of my sincere admiration for this library. The code is gorgeous and I believe there are countless use cases for an expression language built on JS. The notion that synchronous and asynchronous transforms both read from left to right is very intuitive and expressive.
My friend and coworker Arthur Burkhart and I have pair-coded on this branch which would add multiline expressions, lambda functions, and context assignment to the library.
Incoming strings are split by the
\n
character with lines containing only whitespace ignored. A parser is created for each line and promises are generated for each line evaluating independently. When all lines have resolved, the result of the last line is returned.Lambda functions are created with the
->
character. The syntax is borrowed more or less from coffeescript in that variable names appear to the left in parentheses and the expression appears to the right of the->
character. The lambda function can then be used as an argument in a transform. For example,Context assignment is handled with the
=
character. After the first identifier token on a line, the=
character transitions to a state handling assignment of the identified attribute. Evaluation of the whole assignment expression results in the evaluator receiving a new context identical its prior context, but with the named attribute included. The named attribute resolves with the expression to the right up to a line break. For example,Arthur and I would love to know what you think.
Best, Will Weiss