dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.22k stars 1.57k forks source link

[sdk] class called "Function" does not conform to the spec. #52594

Closed modulovalue closed 6 months ago

modulovalue commented 1 year ago

The Dart SDK introduces new syntax in its core library by shipping a class called "Function". "Function" is not a valid type identifier and so, according to the specification, a class can't be named "Function". The conformance section in the spec explicitly disallows new syntax.

Furthermore, the analyzer parses classes with a "Function" name without syntax errors, while the ANTLR-based spec, as expected, emits an error for the following program:

class Function {}
 === Analyzer ===
Scan errors: 0
Parse errors: 0
<CompilationUnitImpl> [0-17]
┗━ <ClassDeclarationImpl> [0-17]
  ┣━ 'class' [0-5]
  ┣━ 'Function' [6-14]
  ┣━ '{' [15-16]
  ┗━ '}' [16-17]
--------------------------------------------------------------------------------
 === ANTLR ===
Errors of type 1: [[@1,6:13='Function',<94>,1:6], [@1,6:13='Function',<94>,1:6], [@2,15:15='{',<139>,1:15], [@2,15:15='{',<139>,1:15]]
Errors of type 2: []
<startSymbol>
┗━ <libraryDefinition>
  ┣━ <metadata>
  ┣━ <topLevelDefinition>
  ┃  ┗━ <classDeclaration>
  ┣━ <metadata>
  ┣━ <topLevelDefinition>
  ┃  ┗━ <classDeclaration>
  ┃    ┗━ 'class'
  ┣━ <metadata>
  ┣━ <topLevelDefinition>
  ┣━ <metadata>
  ┣━ <topLevelDefinition>
  ┃  ┣━ 'Function'
  ┃  ┣━ '{'
  ┃  ┗━ '}'
  ┗━ '<EOF>'

It seems to me that this "hack" exists for UX-related reasons (to e.g. allow the "Go to implementation" feature in IDEs to resolve to an actual "thing" instead of making "Function" act like a keyword-like construct).

void main() {
//   ,--- "go to implementation" redirects to a real class with useful comments.
//   v
  Function fn;
}

I think that this is helpful, but I also think that conforming to the spec without such "hacks" would also be good.

(This "hack" seems intentional and there's probably nothing surprising here. I just couldn't find an issue that captures this exact observation.)

Cf. https://github.com/dart-lang/sdk/issues/45703 which seems to have dealt with disallowing users from defining custom classes called "Function" (but the SDK still ships new syntax that doesn't conform to the spec).

eernstg commented 1 year ago

I'd say that this is all specified and working as intended.

Checking the programs in https://github.com/dart-lang/sdk/issues/45703, every attempt made in those programs to declare a type named Function is detected and rejected as a compile-time error, by the analyzer as well as the CFE. This is consistent with the fact that Function is specified to be a built-in identifier.

It is not a problem that there is a declaration in a platform library that introduces a type named Function. Platform libraries can break arbitrary rules, in the sense that (1) they are not the specification, they just need to implement it, and (2) tools can give any part of the code in such libraries a special treatment. For instance, they can choose to accept one specific class declaration named Function, because that's needed in order to be able to compile code like Function f = print; correctly in user-written code.

Platform libraries break rules in many other cases. For instance, the number classes num/int/double may have a variety of subtypes, depending on the platform, even though that is specifically an error according to the specification; int has an external const factory int.fromEnvironment constructor (external const factory is not a thing, otherwise); FutureOr has very, very special subtyping properties that you can't specify for any class written in Dart as specified; and so on and so on.

Furthermore, the analyzer parses classes with a "Function" name without syntax errors

This is also not a problem: Any tool can re-categorize a syntax error to be a compile-time error that isn't a syntax error, which might be done in practice because that turns out to be a useful implementation. This typically occurs because it is easier to get good error recovery from a parser which is broader than the specified grammar indicates (so we're reporting fewer things as syntax errors, thus accepting more programs), as long as the texts (that is, wannabe-programs) that are syntax errors according to the grammar will then be rejected with some other compile-time error.

So it's OK for the analyzer parser to parse class Function {} and report a (non-syntax) compile-time error stating that the name of the class cannot be Function. That's also what I see in dart-pad as of today:

The built-in identifier 'Function' can't be used as a type name.
modulovalue commented 1 year ago

Note: The issue description contains an incorrect link to the Function interface. The link refers to a Function "patch" the actual interface that users are exposed to is here function.dart.


For instance, the number classes num/int/double may have a variety of subtypes, depending on the platform, even though that is specifically an error according to the specification;

The SDK appears to try very hard to hide this fact from users via its "patch" mechanism. The Function declaration is not hidden from users in any way. If they "go to implementation" they are exposed to a declaration that is not valid Dart syntax and according to the spec, extending syntax is not allowed.

Do you consider the declarations (including any interfaces) in dart:core (before any patches have been applied) to be platform libraries and so they can deviate from the spec?


int has an external const factory int.fromEnvironment constructor (external const factory is not a thing

The grammar snippets in the spec contain the following productions:

⟨declaration⟩ ::= external ⟨factoryConstructorSignature⟩ ⟨factoryConstructorSignature⟩ ::= const? factory ⟨constructorName⟩ ⟨formalParameterList⟩

and they can be used to derive "external const factory ..."

Furthermore, the spec says:

An external function may be a top-level function (19), a method (10.2, 10.8), a getter (10.3), a setter (10.4), or a non-redirecting constructor (10.7.1, 10.7.2).

and external const factory int.fromEnvironment(String name, {int defaultValue = 0}); seems to be allowed because it is a non-redirecting constructor (10.7.2 Factories).

In what way is external const factory not a thing?


FutureOr has very, very special subtyping properties that you can't specify for any class written in Dart as specified;

FutureOr is mentioned explicitly by the deprecated subtyping specification (and the spec). It doesn't look like the specification says anything about FutureOr that the implementation disagrees with.


I've also noticed that Section 3, Normative References refers to the following link: https://api.dartlang.org/ which contains https://api.dart.dev/stable/3.0.3/dart-core/Function-class.html. So the spec depends on an API reference that contains a class that introduces new syntax due to implementation specific behavior? That doesn't seem to intentional.


This is also not a problem: Any tool can re-categorize a syntax error to be a compile-time error,

Thank you, I agree. It would be nice if the analyzer could report issues like that as syntax errors, but of course, it doesn't have to.

eernstg commented 1 year ago

Do you consider the declarations (including any interfaces) in dart:core (before any patches have been applied) to be platform libraries and so they can deviate from the spec?

Certainly, yes. That's also consistent with the use of the term 'system library' that the language specification uses (we have used 'platform library' in many discussions, but the correct term according to the spec is 'system library', so I should have said 'system library' whenever I said 'platform library'). In any case, a system library does not exist in order to demonstrate spec compliant code, it exists in order to support the implementation of a set of Dart processing tools.

Dart the language has a number of atomic elements (entities that cannot be expressed in Dart and hence are pre-defined), and they need to be provided by an implementation of Dart. If that's convenient to do by having a declaration like class Function {...} in some system library that would have been rejected as a compile-time error, except that the analyzer/compiler/other_tools "know" this declaration and give it exceptional treatment, then so be it.

It is simply not useful to insist that system libraries cannot contain code which is given exceptional treatment by the tools (in particular, the code can be a compile-time error according to the specification, or the code could behave in a completely different way according to the specification). The reason why such code exists is exactly that some elements of Dart cannot be written in Dart (as is true for every language except the untyped lambda calculus).

[we can] derive "external const factory ..."

That is true, but you can also derive covariant int i as a <normalFormalParameter> of a top-level function, and that is still an error. This means that it doesn't say much that the syntax can be derived.

The external const factory is actually an interesting example, because the mechanism whereby an external declaration is bound to an actual foreign function is implementation specific. (Haha, caught myself on the wrong foot there! ;-)

However, the support for the fromEnvironment and hasEnvironment constructors exists on all configurations (as far as I know), which makes them similar to a mechanism which is part of the language itself.

That said, I doubt that any tool will support external const factory for anything other than those very special cases, because there's a semantic clash between binding a constructor to an unknown piece of foreign code, and at the same time promising that the returned result will be constant.

It doesn't look like the specification says anything about FutureOr that the implementation disagrees with.

The syntax that introduces the identifier FutureOr into the scope of 'dart:async' is a class declaration. The specification specifies all subtyping relationships of every possible class declaration in Dart that isn't an error. But the subtyping properties of FutureOr<T> for any given T are completely different than the subtyping properties of a FutureOr class that you create by copying the source code from 'dart:async', character for character.

This is so because the analyzer/compiler/... are treating the FutureOr declaration in 'dart:async' in a way which does not conform to the specification of that piece of syntax.

So we're violating the rules of Dart when compiling/analyzing 'dart:async' just because that declaration of FutureOr as a class, plus any number of special-casing parts of the tools, are a useful way to ensure that FutureOr is in scope in a Dart library that imports 'dart:async', and that it behaves as specified. In particular, anyone who uses FutureOr in their code does not need to know that this feature relies on having a class declaration which is "compiled and analyzed incorrectly".

eernstg commented 1 year ago

So the spec depends on an API reference that contains a class that introduces new syntax due to implementation specific behavior?

The spec refers to the documentation of the 'built-in class Function'. An implementation is free to ignore the source code of 'dart:core' and implement all of Dart manually (more work, but possible). So in that sense there is no necessary connection between (1) the spec and this reference to the documentation of Function and (2) the actual source code that breaks the rules and declares a class named Function.

In short, you can read the actual source code in system libraries, but it doesn't tell you anything because any tool could presumably give any part of it an exceptional treatment, as long as they implement Dart in a conforming way.

modulovalue commented 1 year ago

Thank you for the clarification and taking the time to provide a detailed explanation.

I agree with your reasoning here and this seems to be working as intended, so I'm going to close this issue.

modulovalue commented 7 months ago

Another symptom of this hack can be found in the following program:

@Function.id()library;

Which is parsed by the analyzer without any errors, but rejected by DSP as a syntax error.

 === pkg:analyzer (https://pub.dev/packages/analyzer) ===
Parsing succeeded with no errors.
Scan errors: 0
Parse errors: 0
<CompilationUnitImpl> [0-22]
┗━ <LibraryDirectiveImpl> [0-22]
  ┣━ <AnnotationImpl> [0-14]
  ┃  ┣━ '@' [0-1]
  ┃  ┣━ <PrefixedIdentifierImpl> [1-12]
  ┃  ┃  ┣━ <SimpleIdentifierImpl> [1-9]
  ┃  ┃  ┃  ┗━ 'Function' [1-9]
  ┃  ┃  ┣━ '.' [9-10]
  ┃  ┃  ┗━ <SimpleIdentifierImpl> [10-12]
  ┃  ┃    ┗━ 'id' [10-12]
  ┃  ┗━ <ArgumentListImpl> [12-14]
  ┃    ┣━ '(' [12-13]
  ┃    ┗━ ')' [13-14]
  ┣━ 'library' [14-21]
  ┗━ ';' [21-22]
--------------------------------------------------------------------------------
 === DSP (https://github.com/dart-lang/sdk/blob/master/tools/spec_parser/dart_spec_parser/Dart.g4) dspVersion v0.41 ===
Parsing failed with some errors:
Errors of type 1: [[@2,9:9='.',<11>,1:9] Bad state: ]
Errors of type 2: []
<startSymbol>
┣━ '@'
┣━ 'Function'
┣━ '.'
┣━ 'id'
┣━ '('
┣━ ')'
┣━ 'library'
┗━ ';'

@Function.id()library; is a syntax error in the DSP, but not a syntax error in the analyzer as it rejects programs like that in a later phase.

I think it would be nice to try to keep the analyzer and DSP in-sync with respect to syntax errors to make it easier to maintain consistency across different artifacts.

eernstg commented 7 months ago

PrefixedIdentifier and its ...Impl do not correspond to a non-terminal in the grammar of any Dart language specification document (or any non-terminal in Dart.g for that matter), it is a special casing construct that the analyzer has used for many years. So we'd need to consider the specified grammar, find a suitable AST for a given term, and then we can understand how PrefixedIdentifier represents a part of the specified syntactic structure.

In general, it's not a problem that a tool uses an approach to parsing which is somewhat different from the specified grammar, as long as it accepts the same programs. For example, it may be useful to parse some constructs differently because it allows for better recovery or more informative error messages. This means, for example, that the analyzer and other tools can parse a syntactically bigger language and then report the failure as a non-syntax error in a later phase, even though it is specified to be a syntax error.

A term like @Function.id() derived from <metadata> should have the structure '@' <constructorDesignation> <arguments> (no other alternative can succeed), and <constructorDesignation> must derive <qualifiedName> which must be <typeIdentifier> '.' <identifierOrNew>.

However, Function is not a <typeIdentifier>, so we can't derive the given term from <metadata>.

The analyzer makes the choice to parse a language which is a bit bigger than the specified one, and report a compile-time error later on for @Function.id(), and that's fine.

It's a matter of handling opposing forces: The language specification specifies a grammar whose rules are intended to be concise and comprehensible, and the tools implement a parser plus some non-syntax errors to accept the specified language precisely, while allowing for good feedback to developers. In this balancing act it is not considered a problem that some errors are syntax errors according to the specification, but they are reported as non-syntax errors by a tool.

Consistency is good, but I'm not convinced that it would be a better trade-off to have a much more verbose specification (because it specifies a bunch of "syntax" errors in natural language along with the regular grammar rules), nor that it would be a better trade-off to insist that the tools cannot use this technique of "using a more permissive grammar, then filtering out 'syntax' errors using non-syntax error messages".

modulovalue commented 7 months ago

@eernstg First of all, my goal is to prevent us from having to observe accidents like the following:

I would like to see the language team be the ultimate authority on what is valid Dart syntax because they should know best how to prevent Dart from having undesirable properties.


I'm currently observing the following:

Libraries that ship with the SDK are expected to adhere to system-dart (which has no specification). User code is expected to adhere to user-dart (which is specified by the language team).

Erik, I want to be clear that I do not think that the spec should start dealing with error recovery. Consider the following example which is a program where no error recovery is needed, e.g.:

void main() {
  Function.apply(() => 0, []);
}

Any implementation (or IDE, or static analyzer, or any other tool that attempts to resolve Dart programs) needs to be able to parse system-dart and user-dart because they need the system libraries to know that Function has a static function called apply.

In practice, any desirable language properties that the language team is able to guarantee for user-dart are thrown out of the window (from the perspective of a tool author) if system-dart is able to do whatever it wants to.

So I guess what I'd like to see is have system-dart be == user-dart.

Are there perhaps any issues that I'm not anticipating with having system libraries be required not to extend the syntax specified by the spec?

eernstg commented 7 months ago

my goal is to prevent [that Dart syntax is ill defined, e.g., that parsing is undecidable]

I'd very much like to avoid that, too.

However, we have a history, and the desire to obtain a syntactic universe that is smoothly expressed as the language derived by a context free grammar (or any other simple and beautiful mathematical object) must be balanced against the need to include earlier versions of the same language (that is, the need to avoid breaking changes). Another force that matters is the desire to have specific forms for specific constructs, e.g., using < and > as a kind of parentheses, in spite of the fact that they are used as binary operators as well, and that's a crazy combination. ;-)

So we're not going to have the luxury of defining anything (including the grammar) from scratch, and we'll have to live with a certain amount of exceptions to the beautiful and simple rules that we would otherwise be able to have.

That's basically the reason why Dart as a language has a number of rules that seem to be inconsistent or less mathematically beautiful than we'd prefer.

Next, I don't think it's useful to think about Dart as a multi-version language (system-dart plus user-dart, or anything like that).

In particular, for a declaration like abstract final class Function {...} in function.dart, it is not reasonable to say that it is written in a different language than Dart (e.g., system-dart).

This declaration has very special properties (e.g., every function type is a subtype of Function), and there is absolutely no reason to insist that the special powers of this declaration are provided by a different language called system-dart. Instead, Dart language processing tools are written to give this particular declaration a very special treatment. This treatment doesn't generalize to a language feature, it's just a special treatment of this particular declaration. For instance, we don't want any other declaration to have the same special property about subtyping as Function. There is no general language mechanism here, just a special exception for one particular declaration.

Another special exception for the declaration of Function is that it is a class declaration whose processing has the exceptional permission to have a name which is a built-in identifier. Again, this is a special exception just for that declaration, and all the tools "know" about it such that they can accept the declaration. Again, we don't want this to be a general language mechanism, we just want individual exceptions for specific declarations in the SDK.

Rather than saying that the system libraries are written in a different language, we should say that the system libraries are implemented in some way that may or may not involve a certain amount of Dart source code, but that Dart source code, if any, may be given a non-standard semantics anywhere and in any way that suits the implementors because the tools and the core libraries are delivered as a whole (so that the compiler etc. can "know" about the special properties of Function etc).

(Alternatively, everything which isn't expressible in Dart source code could be written in some other language, e.g., assembly.)

This means that you can almost read the Dart source code in the SDK as Dart libraries, but the tools are free to make exceptions for anything anywhere, as long as this provides a system that makes user libraries written in Dart behave as specified.

I think this situation is unavoidable for a practical language: They all need primitive features (things that we can't write in the language itself, but which are provided by "system libraries" of some sort). and they probably all want to use the target language to express the standard libraries because almost everything has the standard semantics. It's simply convenient to write the Dart SDK in Dart (with exceptions).

I'd recommend that you just stop worrying about exceptions in the system libraries. They are there, they are needed, but they don't generalize meaningfully to a separate language.

modulovalue commented 7 months ago

I may have been a little ambiguous about the perspective that I am looking at this from. @eernstg You have rightfully assumed a broader view than I intended. My statements were only meant to comment on the fact that system libraries are free to make syntactic exceptions. I don't worry about any differences or exceptions related to, e.g., resolution, type inference or type checking.

It seems to me that what I'm proposing, a unified grammar for user libraries and system libraries, is not something that is incompatible with the constraints that you've provided, or any other constraint that I can think of.

I can't quite pinpoint the exact cause of our disagreement here, so how about a more concrete proposal for the changes that I would like to see:

  1. To allow system libraries to ship classes named Function:
classDeclaration
    :    (classModifiers | mixinClassModifiers)
//                                   new:
//                                   vvvvvvvv
         CLASS (typeWithParameters | FUNCTION) superclass? interfaces?
         LBRACE (metadata classMemberDeclaration)* RBRACE
    |    classModifiers CLASS mixinApplicationClass
    ;
  1. To prevent user libraries from declaring classes named Function.
It is a compile-time error for non system libraries to declare a classDeclaration named 'Function'.

Commentary:
The support for classes named 'Function' is there to allow implementations to deliver a class named 'Function' that can be used to, for example, support a go-to-declaration feature on Function types.
  1. To ensure that no library is able to add custom syntax.
Dart libraries may not support syntax that is not specified by the specification.

Commentary:
This restriction is meant to prevent changes to Dart's syntax that could accidentally change the property-profile of Dart's syntactical language in detrimental ways.

Should the need for custom syntax in system libraries come up, then that syntax would have to become part of the spec, and be reviewed by the language team.

What do you think?

lrhn commented 7 months ago

While we try to make platform libraries be written in Dart, nothing requires them to be.

It's just convenient, because we have way to execute Dart code already, and a way to gather documentation from Dart code, and most of the time the source code is a good representation of what the library feature actually does.

There are exceptions. Not many, but not zero.

Those are the major places where source declaration does not (necessarily) match what is actually being exported. The dynamic and Never types are so far from being class types that we never tried to have a placeholder declaration, but we should find a way to export documentation for them anyway. They are names exported by dart:core. FutureOr probably shouldn't be a class either.

lrhn commented 7 months ago

About:

Dart libraries may not support syntax that is not specified by the specification.

Saying that is useless.

If a file contains something that is not allowed by the Dart specification, it is by definition not a Dart file. So it's not a Dart library, so it's not bound by that restriction. Catch 22!

A language specification defines a set of valid Dart programs. Any input which causes a parsing error or any static error specified by the language specification is not a valid Dart program. (Or if it fails in an undescribed way, like a file just not existing, which I don't think we actually call out, because we don't say how to go from URI to source code), Then the language specification defines a runtime semantics for valid Dart programs.

Anything outside of that is just not Dart, and therefore not something the Dart specification has any mandate over.

eernstg commented 7 months ago

@lrhn, I think the topic here is slightly different: The question is whether or not it's OK to have a declaration like

class Function ... {...}

in the SDK, given that Function is a built-in identifier and hence it cannot be the name of a type declaration (such as a class declaration). In any user-written library this declaration would simply be a compile-time error, so why is it OK to have it in a system library? Similarly, how does the class Object manage to avoid having a superclass?

My response is that there is no obligation for a system library to satisfy all the rules of Dart. In fact, every tool in the tool chain associated with this SDK can give an exceptional (that is, rule breaking) treatment for any single declaration in the SDK, if needed. In fact, nobody promises us that there even is any Dart source code that is used to provide the services expected from the system by any Dart library. For example, nobody promises us that Function or Object even have a declaration, they might as well have been established outside the Dart language (just like the runtime is written in C++, but Dart programs interact with the runtime all the time).

The implication is that no tools can immediately be expected to work on Dart system libraries, just because they are working on all non-system Dart libraries: they may need to give a bunch of system library declarations special treatment. That special treatment isn't specified in any language specification documents, because they are concerned with the meaning of non-system libraries. However, the language specification documents do specify how entities like the classes Object and Function behave semantically, and how they can be used from non-system libraries, they just don't specify how the particular source code text that declares Object and Function manage to have that meaning.

@modulovalue, if you think I've misrepresented the discussion then please add in your perspective as needed!

eernstg commented 6 months ago

I'll close this issue: We do not plan to require every library in the SDK to conform to the rules of the Dart language. Those libraries contain specific declarations whose semantics cannot be written in Dart, and hence they must at times violate the rules that every user-written library must satisfy.

The fact that the class Function is a supertype of all function types is an example, there is no syntax which will make this happen. So the analyzer and compiler just have to know that the declaration of the class Function is special, and give it special treatment.

Similarly, the declaration of the class Function can have the name Function even though that is a built-in identifier. The tools will simply give that declaration a special treatment, and that is not a problem.

This does not mean that the rules of the language are fuzzy in any user-written libraries, they must still follow all the rules.