dart-lang / language

Design of the Dart language

Spurious reference to function literals in 'constant context' #3619

Open eernstg opened 6 months ago

eernstg commented 6 months ago

Thanks to @sgrekhov for noticing this!

The definition of what it means to be in a constant context includes a case involving a function literal, here.

However, the situation which is described there can never occur: It is always a compile-time error for a function literal to occur in a constant context, because a function literal is never a constant expression.
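A minimal sketch of the situation, assuming a hypothetical top-level function `twice` that is not part of the issue. The commented-out declaration is the kind of code expected to be rejected, while a tear-off of a top-level function is accepted as a constant.

```dart
// Hypothetical top-level function, used only for this illustration.
int twice(int n) => n * 2;

// A top-level function tear-off is a constant expression, so it may appear
// in a constant context.
const List<int Function(int)> allowed = [twice];

// A function literal is never a constant expression, so uncommenting the
// following declaration is expected to yield a compile-time error.
// const List<int Function(int)> rejected = [(int n) => n * 2];

void main() {
  print(allowed.first(21));
}
```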

This part of the rule is only relevant if we introduce a notion of constant function literals (see https://github.com/dart-lang/language/issues/1048 for a proposal ... from 2012 ... with about 250 supporting emojis).

We may wish to eliminate this part of the 'constant context' rule (or we may wish to add constant function literals). In any case, the current situation is somewhat confusing.

lrhn commented 6 months ago

I think it's not wrong, and it's there deliberately, to ensure that a constant context does not cross into a nested function. The fact that a nested function is a compile-time error for other reasons doesn't change that.

The declaration const x = [(x) => [x.toString()]]; will be a compile-time error because (x) => [x.toString()] is not itself a constant expression, but not also because [x.toString()] occurs in a constant context and is not constant: with this clause, the body of the function literal is not a constant context. Without the clause, it would be.

I'm not sure being minimal here is desirable.

One could argue that we should add something like "immediate subexpressions of an expression in a constant context which is itself a constant expression", but that gets us into cyclical issues, because for const [[[]]], the nested [[]] is definitely in a constant context, but it's only a constant expression if the inner [] is, which it only is if it's in a constant context. (In other words, this is hard because our definition of constant expression is loose, so the definition of constant context must be equally strict.)
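The nesting can be sketched as follows. The explicit type arguments and the names `outer` and `middle` are assumptions added here so that the literals being compared have the same static type; they are not part of the comment above.

```dart
// The initializer of a constant variable is a constant context, so the
// nested list literals are implicitly constant as well.
const outer = <List<List<int>>>[[[]]];

// The middle literal, written as the initializer of its own constant.
const middle = <List<int>>[[]];

void main() {
  // Constant lists with the same type arguments and elements are
  // canonicalized, so both comparisons are expected to hold.
  print(identical(outer.first, middle));
  print(identical(outer.first.first, const <int>[]));
}
```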

eernstg commented 6 months ago

> I'm not sure being minimal here is desirable.

Makes sense. However, we don't usually specify that a source code construct with an error has any particular properties. We'd usually say or imply that it isn't relevant whether or not /*1*/ is in a constant context in const [() { /*1*/ }], because const [() {...}] is a compile-time error already, no matter what that ... stands for.

I'd recommend that we change the normative text 'where e0 is not a function literal (17.11)' to a TODO comment. This ensures that we will have a reminder in case we do add support for constant function literals (we already have another one about throw at the same location), and it allows us to avoid the confusing reference to the internals of an expression which is already known to be a compile-time error.

lrhn commented 6 months ago

I'd like to introduce the notion of "constant evaluation", as a variant of evaluation that happens in a constant context. The main reason is to be specific about where it happens, and to explicitly make throwing a compile-time error, rather than having the "if evaluation of this expression would throw, it's a compile-time error", which is workable, but a little too indirect. (How do we know if something would throw without evaluating it? If we do evaluate it, why do we say "would throw" instead of "throws"? When are constants evaluated at all, and what does that even mean?)
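A small sketch of the contrast between "throws" and "would throw", assuming hypothetical helpers `divide` and `describe`. At run time the failure is an ordinary exception; in a constant context the same expression is expected to be rejected by the compiler.

```dart
// Hypothetical helper: integer division with operands known only at run time.
int divide(int a, int b) => a ~/ b;

// In a constant context the failure is reported by the compiler instead, so
// uncommenting the next line is expected to yield a compile-time error.
// const int broken = 1 ~/ 0;

// A constant whose evaluation completes normally.
const int fine = 6 ~/ 3;

// Hypothetical helper that reports whether the division completed or threw.
String describe(int a, int b) {
  try {
    return 'value ${divide(a, b)}';
  } catch (e) {
    return 'threw ${e.runtimeType}';
  }
}

void main() {
  print(fine);
  print(describe(6, 3));
  print(describe(1, 0));
}
```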

If we do that, an expression in a constant context is evaluated using constant evaluation semantics, which is like normal semantics, except that it can end in either a value or a compile-time error. A lot of expression shapes directly evaluate to a compile-time error.

Actually, I think I'd want four modes:

In a required constant context (optional parameter default values, final field initializer of class with constant constructor), collection literals and object creation expressions are not automatically made constant, as they are in a constant context. In a potentially constant context, parameters and type parameters of a surrounding constructor can be accessed too.
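A sketch of the two contexts described here, assuming a hypothetical class `Range` and function `join`. It reflects current behavior as I understand it: a default value must be constant but does not make a nested literal constant implicitly.

```dart
class Range {
  final int low;
  final int high;

  // Potentially constant: the initializer refers to the constructor
  // parameters, which have no value until the constructor is invoked.
  const Range(this.low, int width) : high = low + width;
}

// A default value must be a constant expression, but it is not a constant
// context: the `const` on the list literal is required, and omitting it is
// expected to yield a compile-time error.
String join({List<String> parts = const ['a', 'b']}) => parts.join('-');

void main() {
  const r = Range(1, 4);
  print(r.high);
  print(join());
}
```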

Then it's not so much that an expression is in a "constant context", but that the expression is evaluated using constant evaluation, which automatically evaluates subexpressions using constant evaluation as well, if the subexpression can be a constant expression. That evaluation will fail when reaching a function literal, and therefore it will never try to evaluate the body in any context. We lose the separation between "how must this be evaluated" and "how is this evaluated".

In a normal evaluation context, obvious constants should still be canonicalized (simple literals, type literals with constant type arguments, static/top-level tear-offs, constant instantiations of those). Which means we still do recognize some inherent constants, as part of normal evaluation, like an extra bit on the static type of an expression: "is constant value", and normal evaluation of e1 + e2 where e1 and e2 evaluate to the constant values v1 and v2 of type int, will evaluate to the constant value v1+v2. And then we can say that constant String values are canonicalized.

That would mean that an expression like "foo" + "bar" in a completely normal context could still be canonicalized, because it's adding compile-time-known String values. So both a downwards constant requirement in evaluation and an upwards constant inference/propagation.
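A sketch of the distinction, assuming hypothetical names `twice`, `joined` and `literal`. The last comparison is deliberately left without an expected result, since whether a string computed in a normal context is canonicalized is exactly the open question.

```dart
// Hypothetical top-level function, used only for this illustration.
int twice(int n) => n * 2;

const String joined = 'foo' + 'bar';
const String literal = 'foobar';

void main() {
  // Both operands are constants, so this is expected to be true.
  print(identical(joined, literal));

  // Top-level tear-offs are canonicalized in any context.
  print(identical(twice, twice));

  // Not in a constant context: whether the concatenation is canonicalized
  // is left to the implementation, so no particular result is assumed.
  print(identical('foo' + 'bar', literal));
}
```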

(I think it's only really strings where we might not canonicalize all possible constant strings today, which is acceptable because the spec still doesn't explicitly require constant strings to be canonicalized. It just requires constant identical on them to be true, but that only applies to string values that can occur in a constant identical check.)

eernstg commented 6 months ago

@lrhn, I really appreciate the urge to dig deeper and find some fundamental structures, that's great!

However, I'm a little worried about a future where we have 4 (or, over time, perhaps even more) modes of evaluation. The language specification tries very hard to reduce two modes of operation to one: It is a special property of constant expression evaluation that the results are globally canonicalized (that seems to imply that we have two modes). However, constant expression evaluation is explicitly specified to be no different from non-constant evaluation, it's just at the very end that the resulting object is checked against prior results of constant evaluation for an existing result which is "the same", in which case the older one is used ("sameness" is a bit tricky to define, but this is working today).

The canonicalization step is specified separately for specific expressions, including collection literals and constant object expressions (const MyClass(some, arguments)). Notably, there is no such step for string literals. Finally, records support canonicalization via a different mechanism known as 'structural equivalence'.
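The forms mentioned here can be sketched as follows, assuming a hypothetical class `Point` and a Dart 3 toolchain for the record expression.

```dart
// Hypothetical class with a constant constructor.
class Point {
  final int x;
  final int y;
  const Point(this.x, this.y);
}

void main() {
  // Constant collection literals and constant object expressions are
  // canonicalized, so equal constants are the same object.
  print(identical(const [1, 2], const [1, 2]));
  print(identical(const Point(1, 2), const Point(1, 2)));

  // Without `const`, each evaluation creates a fresh object.
  print(identical([1, 2], [1, 2]));
  print(identical(Point(1, 2), Point(1, 2)));

  // Records compare structurally: equal shape and equal fields.
  print((1, 2) == (1, 2));
}
```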

The fact that constant expressions can be evaluated at compile time, and run-time evaluation can just be "return this pre-computed result" is described as an option, not a requirement.

So we probably have to have two modes of evaluation, but they only need to differ as follows: (1) Normal expression evaluation is given (this is most of the language specification). Constant evaluation differs from normal evaluation by canonicalizing each expression evaluation step with certain syntactic forms (collection literals, constant object expressions, etc.). And (2) exceptions are handled differently, as described below.

This is not quite "just one mode", but it is as close as we can get. The point is that canonicalization is described as a different overall semantics, and everything else (of which we currently have just one thing, exception handling) is treated as "the normal semantics" respectively "the constant evaluation semantics" in the specification of each construct.

Hence, the dynamic semantics of throw expressions would need to be specified for normal evaluation and for constant evaluation separately. Then we won't need to say "if it would have thrown", anywhere, we just need a new primitive behavior which is only used during constant expression evaluation at compile time: "Take note that the following throw expression was evaluated: ...., and then terminate the current constant expression evaluation, reporting that error". With maybe some weasel words to say that there can be more than one error.

In addition to that, we probably need to mention that the runtime which is evaluating the constant expression at compile time will have the same behavior in case of runtime errors that aren't initiated by a throw expression evaluation (say, "divide by zero", "failed type cast", ..., and in the future maybe "out of memory" etc., depending on whether/how the constant expression sublanguage is generalized).
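A sketch of one such error that does not come from a throw expression, a failed type cast, assuming hypothetical helpers `asInt` and `describe`. The commented-out constant is the form expected to be reported at compile time.

```dart
// Hypothetical helper: a cast whose operand is known only at run time.
int asInt(Object value) => value as int;

const Object text = 'x';

// The same cast in a constant context is expected to be reported by the
// compiler rather than at run time.
// const int broken = text as int;

// Hypothetical helper that reports whether the cast completed or failed.
String describe(Object value) {
  try {
    return 'int ${asInt(value)}';
  } on TypeError {
    return 'failed type cast';
  }
}

void main() {
  print(describe(3));
  print(describe(text));
}
```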

We probably have to specify that constant expression evaluation must be used for specific constant expressions (e.g., with e in const T x = e;, with the initializing expression of a final instance variable in a class with a constant constructor, with const [] no matter where it occurs, etc.), but it is optional for all other constant expressions (e.g., 2 + 2 in the context print(2 + 2)). This would include every expression which occurs in a constant context as well as every expression which does not occur in a constant context, but which is required to be a constant expression, as well as every expression whose first token is const. It would actually be nice to have that list, and make sure that it is complete.
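A sketch of the cases in this paragraph, assuming a hypothetical class `Config`. It shows the places where a constant is required next to a constant expression that nothing forces the implementation to evaluate at compile time.

```dart
// Hypothetical class with a constant constructor.
class Config {
  // The class has a constant constructor, so this initializer must be a
  // constant expression.
  final List<String> tags = const ['default'];
  final int retries;
  const Config(this.retries);
}

// `e` in `const T x = e;` must be evaluated as a constant.
const Config shared = Config(2 + 2);

void main() {
  // An expression whose first token is `const` is constant wherever it occurs.
  final a = const <int>[];
  final b = const <int>[];
  print(identical(a, b));

  // `2 + 2` is a constant expression here too, but nothing requires it to
  // be evaluated at compile time in this position.
  print(2 + 2);

  // Equal constant objects are expected to be the same object.
  print(identical(shared, const Config(4)));
}
```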

This is still only two modes rather than 4, and they differ in only two very specific ways. I'd prefer if we can keep it minimal like that.

> In a normal evaluation context, obvious constants should still be canonicalized

I guess this would apply when an expression isn't in a constant context, and otherwise isn't required to be constant, but the expression is a constant expression (e.g., "abc${"def".length / 2}" + "gh", occurring in a context which can't be a constant expression).
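The example expression can be sketched in both positions, assuming hypothetical names `folded` and `build`. Equality holds either way; identity of the run-time result is left open, since that is the behavior under discussion.

```dart
// The expression as the initializer of a constant variable, where constant
// evaluation is required.
const String folded = "abc${"def".length / 2}" + "gh";

// The same expression in a position that does not require a constant.
String build() => "abc${"def".length / 2}" + "gh";

void main() {
  print(folded);
  print(build() == folded);

  // Whether the result of build() is canonicalized is up to the
  // implementation, so no particular result is assumed here.
  print(identical(build(), folded));
}
```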

We could require such expressions to be subject to constant evaluation (hence canonicalization), but I'd prefer to hesitate a bit and make it optional. How much work would it take to be able to promise that we find all of those "fuzzy constants", and how helpful is it (for performance, for program correctness) to require that they are subject to constant evaluation? Is it a breaking change?

I proposed in https://github.com/dart-lang/language/issues/985 that we should clarify the rules about canonicalization of strings. The fact that an int expression could be e.length where e is a constant expression of type String illustrates that the decision would also include expressions of type int, and possibly several other types. So we basically need to push on the topic of that issue, with this generalization beyond strings.