dolphinsmalltalk / Dolphin

Dolphin Smalltalk Core Image
MIT License
301 stars 58 forks source link

StOptimizedNodes should know their methodClass when created by parsing a method #1110

Closed daniels220 closed 3 years ago

daniels220 commented 3 years ago

It seems reasonable to me that, if I ask a method for its parseTree and it contains optimized nodes, said nodes be immediately ready to answer to #value, for which they need to know the methodClass they should be evaluated in. The obvious thing is to just do node := self optimizedNodeClass new methodClass: methodClass. in SmalltalkParser>>parseOptimizedExpression. Or, there's a part of me that thinks optimized nodes should just make a best-effort attempt to produce a value no matter what, rather than always answering nil if a methodClass isn't set. Or, a more precise option, scan the body for references to self and complain if there are any and the methodClass isn't set, but happily evaluate e.g. ##(60 * 60) even if the methodClass is missing.

blairmcg commented 3 years ago

It is by-design that the parse tree is not in any sense evaluable as emitted from the parser. None of the variable references will be bound, for example. There are a number of reasons for this, including clean architectural layering (the parser only does lexical and syntax analysis). Most importantly just parsing the code should not evaluate, or risk accidentally evaluating, any code. Starting up an old copy of Dolphin, e.g. X6.1, and entering an expression like ##(Object become: Object new) into the text of a method will quickly demonstrate how evaluating the static expressions might damage the system, either due to malicious intent, or (more likely) accident, so the first level of code analysis and semantic analysis in the browsers no longer does this and should never have done so.
It would be possible to do some analysis to detect safe expressions that have no side effects. Such analysis would have some uses, but I'm not sure that providing a tree with static expressions in it that sometimes evaluate to a value and sometimes don't is necessarily that useful. It would certainly be inconsistent. You don't mention your scenario, but in general if you want to "bind" the code to its environment, then you must run the semantic analyser over it by sending the method node the analyze message. There would be an argument for setting the methodClass of the static expression nodes as part of this pass, but out of caution it has been deliberately left out in order to ensure evaluation is deferred until it is definitely needed, e.g. at code generation time in a future compiler back end. It wouldn't be difficult to add this in a subclass of StSemanticAnalyser though, or even write a special case parse node visitor. There are a number of examples in the image already. BTW: There are some issues with the parse tree construction not being completely linked, mainly in the area of FFI type descriptors. These are (or will be shortly) fixed in master, and I may back port these to 7.1 since this causes some refactoring and parse-tree search operations to not work correctly. For example renaming a structure class does not rename any references from FFI call tags.

daniels220 commented 3 years ago

Interesting, that makes sense. I do remember when you fixed the bug you mention, where just typing a dangerous static expression would evaluate it, with whatever consequences that entailed. Typing ##(10^10 <displayIt> was an amusing way to hang my image once, but not desirable behavior overall, I admit 😄 . So...I wish there were a clever solution to this that would handle all cases reasonably well, but, I guess I'll probably just have to work around it myself? Some further thoughts in case they're helpful:

My use-case is a pattern wrapper block in a Code Rewriter expression that wants to examine the value of a static expression rather than its parse tree, like ##(`@exp) `{:stat | stat value isNumber}. This is logically operating more at the level of the semantic analyzer, but it's not dependent on the SA so there's no (other) reason for it to actually run the SA. Having the node always return nil is a particularly nasty failure mode because some inquiries I might make, nil will happily answer to (rather than giving a DNU), but it's the wrong answer. If I were writing things from scratch I might have it throw an error instead, but given existing uses I'm guessing that would result in, well, throwing a lot of errors.

In this particular case, of parsing a method that has already been compiled, there shouldn't be any danger with evaluating static expressions, since they have already been evaluated to compile the method. But I admit it's ugly to communicate that to the parser in all the places it might come up—it's additional information to thread through a bunch of method calls—and, like with your speculation about using static analysis to decide whether a particular expression is "safe", it may create an inconsistency, which I agree is problematic.

This ship has probably sailed, but: It seems like a big part of the problem here is the massively overloaded nature of #value, such that it is difficult to quickly review exactly where an optimized node may be asked for its value. There would be less need for this kind of defensiveness if a simple Browse References could show exactly where it might come up, and having a different name (along with a comment in the method itself) might also clue in anyone using it that they should consider carefully whether/when to do so.

blairmcg commented 3 years ago

I agree with what you say, although it may be that the overall mistake was in inserting StOptimizedNode into the hierarchy as a kind of ~literal~ value node, and thereby implicitly promising it can serve up a value at essentially zero-cost/side-effect, which in the general case it cannot. With that in mind a reasonable solution might be to add another public method on StOptimizedNode, with an appropriate intention-revealing selector, that does what you want and evaluates the expression to get its value. A little refactoring is needed so that the guard conditions for lazy evaluation in value and the guard against inadvertent evaluation in evaluateStatements are in the correct place to avoid code repetition, and so that the methodClass can be retrieved from the outer StMethodNode and passed as an argument to evaluateStatements:. That should work for your scenario? Another approach that could be used if you are controlling the supply of parse trees, and which doesn't require modifying the parser node hierarchy, would be to write a very simple parse tree visitor that copies down optimizedNode methodNode methodClass into optimized nodes when visiting them. I would expect it to run pretty quickly.

daniels220 commented 3 years ago

Honestly, for my purposes, I'm probably just going to manually copy the methodClass in the pattern wrapper block that I'm using anyway, e.g. ##(`@exp) `{:opt | opt methodClass: opt methodNode methodClass. opt value isBehavior}. (If I remember correctly, the optimized node already handles always using the instanceClass as the actual evaluation context, so it's arguably more correct—and certainly shorter—to just copy what the method node has.) As a broader solution, I'd probably go with the intention-revealing selector, but, I'm having a hard time coming up with what that selector should be.