dart-lang / language

Design of the Dart language

Simple parts with imports #4082

Open eernstg opened 1 week ago

eernstg commented 1 week ago

I'm worried about the complexity of the scoping structure with the current proposal for enhanced parts (aka parts with imports).

Currently specified scoping structure

For example, assume the following Dart files (based on an example from @lrhn):

// --- Library 'main.dart'.
import 'l1.dart' show v1, v2, v3;
import 'q1.dart' as q show q1;
import 'p1a.dart' as p show p1, p2;

part 'part.dart';

void foo() {}

// --- Part file 'part.dart'.
part of 'main.dart';

import 'l2.dart' show v1;
import 'p1b.dart' as p show p1, p3;
import 'p1c.dart' as p show p4;

void bar() {}

const v3 = "a";

The imported libraries are not shown, but every name in their exported namespaces is listed in a show clause, so this is enough to know which names are imported where.

graph BT;
  MainLibrary["'main.dart' top-level scope<br>foo, bar, v3"]
  MainPrefix["'main.dart' prefix scope<br>p, q"]
  MainImport["'main.dart' import scope<br>v1, v2, v3"]
  MainLibrary --> MainPrefix
  MainPrefix --> MainImport
  PartLibrary["'part.dart' top-level scope<br>foo, bar, v3"]
  PartPrefix["'part.dart' prefix scope<br>p"]
  PartImport["'part.dart' import scope<br>v1"]
  PartLibrary --> PartPrefix
  PartPrefix --> PartImport
  PartImport --> MainPrefix
  MainPrefixQ["'main.dart' prefix scope 'q'<br>q1"]
  MainPrefix -->|q| MainPrefixQ
  MainPrefixP["'main.dart' prefix scope 'p'<br>p1, p2"]
  MainPrefix -->|p| MainPrefixP
  PartPrefixP["'part.dart' prefix scope 'p'<br>p1, p3, p4"]
  PartPrefix -->|p| PartPrefixP
  PartPrefixP --> MainPrefixP

  style MainLibrary fill:lightgreen;
  style PartLibrary fill:lightgreen;
  style MainPrefixP fill:yellow;
  style MainPrefixQ fill:yellow;
  style PartPrefixP fill:yellow;
  style MainImport fill:lightblue;
  style PartImport fill:lightblue;

First, each file (library or part) gets all declarations in any of the files in the part tree that constitutes the library (here: just main and part). So the two files have the following declarations from the library itself: foo, bar, v3.

Moreover, main has access to p (going to the prefix scope) and it provides the names p.p1 and p.p2 in the "prefix scope p". Similarly for q.q1. Finally, v1, v2, and v3 are available because they are imported into main.

For part, we have a prefix scope p as well, but in this case it contains p.p1 (imported from p1b), p.p2 ("inherited" from the prefix scope p of main), p.p3 (p1b), and p.p4 (p1c).
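The way the prefix scope `p` of the part is composed can be sketched as a small Dart model. This is only an illustration of the paragraph above, not the specified algorithm: the `PrefixScope` typedef, the `inheritPrefixScope` helper, and the use of library URIs as stand-ins for the imported declarations are assumptions made for the example.

```dart
/// Maps a name in a prefix scope to the library that provides it.
/// (Library URIs stand in for the actual declarations.)
typedef PrefixScope = Map<String, String>;

/// Hypothetical helper: bindings imported by the part itself shadow the
/// bindings that the parent file has under the same prefix.
PrefixScope inheritPrefixScope(PrefixScope parent, PrefixScope own) =>
    {...parent, ...own};

void main() {
  // 'main.dart': import 'p1a.dart' as p show p1, p2;
  final mainP = {'p1': 'p1a.dart', 'p2': 'p1a.dart'};

  // 'part.dart': import 'p1b.dart' as p show p1, p3;
  //              import 'p1c.dart' as p show p4;
  final partOwnP = {'p1': 'p1b.dart', 'p3': 'p1b.dart', 'p4': 'p1c.dart'};

  final partP = inheritPrefixScope(mainP, partOwnP);
  print(partP);
  // {p1: p1b.dart, p2: p1a.dart, p3: p1b.dart, p4: p1c.dart}
}
```

Note that `p.p1` means something different in the two files, while `p.p2` is only reachable from the part through the parent.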

A simpler version

I'd much prefer to use a simpler structure. I do not think it's reasonable to assume that developers will always and immediately have a full understanding of the graph structure of a given set of files that constitute a library, and even less the scoping structure which is considerably more complex than the file structure.

Perhaps we should aim to ensure that we follow the approach used with the top-level scope more consistently? The point is that "the same name has the same meaning everywhere in the library".

Then we do not introduce the notion of a prefix scope; we maintain that prefixes are simply entries in the library scope. This means that it is an error to have a prefix whose name is the same as a top-level declaration like a function, but that is already the rule today, so why wouldn't it be OK with parts-with-imports as well?

This means that a library prefix will be in scope from all files of the library, no matter which imports in which files are using that name as its prefix. Also, the library prefixes will be populated identically everywhere.

Each prefix namespace would be populated by collecting the imported bindings from each of the import directives with that prefix that exist in the complete file tree. Any name clash gives rise to an ERROR entry in the namespace (that is, there is no shadowing; we use the same approach to name clashes as we have always done in the pre-feature language).
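A minimal sketch of that merge, applied to the three imports with prefix `p` from the example. The `mergePrefixNamespace` function and the `conflict` marker are hypothetical names. The sketch also assumes that two different libraries never export the same declaration, so the same name arriving from two libraries always counts as a clash; in real Dart, importing the same declaration twice is not a conflict.

```dart
/// Marker stored for a name that is provided by more than one import.
const conflict = '<ERROR>';

/// Hypothetical model: merges the shown names of every import that uses a
/// given prefix, anywhere in the library, into one namespace.
/// Keys of [importsByLibrary] are library URIs, values are the shown names.
Map<String, String> mergePrefixNamespace(
    Map<String, List<String>> importsByLibrary) {
  final namespace = <String, String>{};
  importsByLibrary.forEach((library, shownNames) {
    for (final name in shownNames) {
      final existing = namespace[name];
      if (existing == null) {
        namespace[name] = library;
      } else if (existing != library) {
        // No shadowing: a clash is recorded as an error entry.
        namespace[name] = conflict;
      }
    }
  });
  return namespace;
}

void main() {
  final p = mergePrefixNamespace({
    'p1a.dart': ['p1', 'p2'], // from 'main.dart'
    'p1b.dart': ['p1', 'p3'], // from 'part.dart'
    'p1c.dart': ['p4'], // from 'part.dart'
  });
  print(p);
  // {p1: <ERROR>, p2: p1a.dart, p3: p1b.dart, p4: p1c.dart}
}
```

Under the stated assumption, `p.p1` becomes an error entry, and the error is only reported if some file actually uses `p.p1`.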

For any given prefix p, if at least one import with prefix p is deferred, and two or more imports with that prefix exist in the library (any file) then an error occurs. Otherwise the deferred import has the usual pre-feature semantics (e.g., with loadLibrary).
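The deferred-import rule can be written as a one-line check. This is a sketch only: `PrefixedImport` and `violatesDeferredRule` are hypothetical names, and the list passed in is assumed to contain every import in the library (all files) that uses the prefix in question.

```dart
/// Hypothetical description of one import directive that uses a prefix.
class PrefixedImport {
  final String uri;
  final bool isDeferred;
  const PrefixedImport(this.uri, {this.isDeferred = false});
}

/// True if the imports sharing one prefix break the proposed rule: a
/// deferred import must be the only import with that prefix.
bool violatesDeferredRule(List<PrefixedImport> importsWithPrefix) =>
    importsWithPrefix.length >= 2 &&
    importsWithPrefix.any((directive) => directive.isDeferred);

void main() {
  const deferredOnly = [PrefixedImport('heavy.dart', isDeferred: true)];
  const mixed = [
    PrefixedImport('heavy.dart', isDeferred: true),
    PrefixedImport('other.dart'),
  ];
  print(violatesDeferredRule(deferredOnly)); // false
  print(violatesDeferredRule(mixed)); // true
}
```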

The resulting scoping structure would be as follows:

graph BT;
  MainLibrary["'main.dart' top-level scope<br>foo, bar, v3, p, q"]
  Import["import scope<br>v1, v2, v3"]
  MainLibrary --> Import
  PartLibrary["'part.dart' top-level scope<br>foo, bar, v3, p, q"]
  PartLibrary --> Import
  PrefixQ["prefix scope 'q'<br>q1"]
  MainLibrary -->|q| PrefixQ
  PrefixP["prefix scope 'p'<br>p1, p2, p3, p4"]
  MainLibrary -->|p| PrefixP
  PartLibrary -->|p| PrefixP
  PartLibrary -->|q| PrefixQ

  style MainLibrary fill:lightgreen;
  style PartLibrary fill:lightgreen;
  style PrefixP fill:yellow;
  style PrefixQ fill:yellow;
  style Import fill:lightblue;

We could emit a warning whenever an imported name is used in a Dart file, and no import exists in the parent chain that provides the binding for this name. This would be applicable to both plain and prefixed imported names.
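A possible shape for that warning check, as a sketch. `LibraryFile` and `shouldWarn` are hypothetical, the file and member names in `main` are made up, and "parent chain" is assumed to include the file itself.

```dart
/// Hypothetical model of one file in the part tree of a library.
class LibraryFile {
  final String name;
  final LibraryFile? parent;

  /// Names provided by the import directives written in this file.
  final Set<String> importedNames;

  const LibraryFile(this.name, this.importedNames, {this.parent});
}

/// True if [usedName] is used in [file] but no import in [file] or in any
/// of its ancestors provides it. The name still resolves, because the
/// imported namespace is shared, but the import is not "visible".
bool shouldWarn(LibraryFile file, String usedName) {
  for (LibraryFile? f = file; f != null; f = f.parent) {
    if (f.importedNames.contains(usedName)) return false;
  }
  return true;
}

void main() {
  final mainFile = LibraryFile('main.dart', {'v2'});
  final partA = LibraryFile('a.dart', {'helper'}, parent: mainFile);
  final partB = LibraryFile('b.dart', <String>{}, parent: mainFile);

  print(shouldWarn(partB, 'v2')); // false: imported by the parent
  print(shouldWarn(partA, 'helper')); // false: imported by the file itself
  print(shouldWarn(partB, 'helper')); // true: only a sibling imports it
}
```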

In other words, we would technically have a flat namespace of imported names for the entire library, prefixed or not, but we would use the warnings to ensure that the imports that we actually rely on are "visible".

If a part file, say, a macro generated part, should be protected against imports in user-written code then it should generate a part that imports every imported name which is used in the generated code. Each import would have a prefix which is a name that is fresh for the library as a whole, and each reference to an imported name would be prefixed with that fresh name. Should be possible. ;-)
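Choosing such a prefix could look roughly like this. It is a sketch under assumptions: `freshPrefix` is a hypothetical helper, the `gen` base name is arbitrary, and the generator is assumed to know every top-level name and prefix already used in the library.

```dart
/// Hypothetical helper: returns a prefix name that does not occur in
/// [namesInUse], which is assumed to hold every top-level declaration name
/// and every import prefix of the library.
String freshPrefix(Set<String> namesInUse, {String base = 'gen'}) {
  var counter = 0;
  while (namesInUse.contains('$base$counter')) {
    counter++;
  }
  return '$base$counter';
}

void main() {
  // Names from the running example, plus one that is already taken.
  final inUse = {'foo', 'bar', 'v3', 'p', 'q', 'gen0'};
  final prefix = freshPrefix(inUse);
  print(prefix); // gen1

  // The generated part would then import and reference names like this:
  print("import 'dart:convert' as $prefix show jsonEncode;");
  print('String encode(Object? o) => $prefix.jsonEncode(o);');
}
```

This does not cover coordination between several generators that pick names at the same time; that would need support from the code generation framework.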

I know this is very different from the approaches that we've considered for quite a while, but I do think there is a need to simplify the specified structures as of today.

@dart-lang/language-team, WDYT?

lrhn commented 1 week ago

It is already specified that a prefix cannot have the same name as a library member declaration. That's mainly because that name is inaccessible because of shadowing.

Having all prefixes in scope in all files of the library is a possible choice. I don't actually think it's simpler to explain than the rule "your imports, prefixed or not, take precedence over those inherited from the parent file".

The idea of having imports in part files at all is to allow having imports that only matter to that part file (sub-tree). Having those import prefix names be global does not bring any benefit, just more risk of annoying conflicts.

natebosch commented 1 week ago

> Each import would have a prefix which is a name that is fresh for the library as a whole

What do you mean by this? Is this automatic, or would we be requiring codegen authors to satisfy this guarantee on their own? I don't think that would be a reasonable requirement if so.

lrhn commented 1 week ago

I'm pretty sure we're going to help macro code-generators produce fresh names, if nothing else then for private library or class member declarations, so they won't conflict with members introduced by other macros.

It'd still be annoying to have to do that for import prefixes too, especially to avoid conflicts with ones that are not even in scope.

In general, I disagree with the characterization of the import rules as complex and needing to be simplified.

The user story for them is actually exceedingly simple.

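Resolution of a name in the top-level scope is as simple as:

  • Declared in this library? If so, that's what it means.
  • Imported or declared as a prefix in this file? If so, that's what it means.
  • Otherwise, whatever it would mean in the parent file.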

If a file has no imports, it has access to precisely the same imported names and import prefixes as its parent file. Doing nothing changes nothing!

The part file still has access to any name in the parent file's imports that it hasn't shadowed with its own imports or prefix names. Any name that it hasn't asked to change the meaning of. The only changes to the parent file's import scope are the changes defined in this file. Very localized!

An author doesn't need to understand the entire path up to the library file, they just need to understand what's available in the parent file. That makes sense, the reason to create a part file is most likely that the parent file was growing too big, or it needed too many imports, so a part of it is put into a part file and maybe given some imports that only it needs. The understanding of that part file is relative to the understanding of the parent file.

> I do not think it's reasonable to assume that developers will always and immediately have a full understanding of the graph structure of a given set of files that constitute a library, and even less the scoping structure which is considerably more complex than the file structure.

They don't need full understanding of the graph structure. They just need to understand the parent file. The scoping is not more complicated than the graph structure. It's incremental, built on top of the parent file, and in a way where every change is asked for, and every change can be made. No parent file import name can block a part file from importing or using the name as a prefix. That would require you to know the full graph structure, because you can't choose to ignore it.

(The global library member declaration scope is really the thing that stands out as an exception. It's also a thing that would be breaking to change. If we dared, I'd say that a library member declaration is only available in the library declaration scope of a file that is above or below a declaration of the member. Can't see what siblings are doing. But will still get error if we both try to export the same name. Probably too hard to change.)

eernstg commented 1 week ago

@natebosch wrote:

> I don't think that would be a reasonable requirement if so.

One more comment on this: A similar requirement already exists (in any scoping model under consideration for this feature) for every non-augmenting top-level declaration which is provided by code generation (macro or not): The name of that top-level declaration must be fresh in exactly the same sense as what I'm suggesting for the prefix. It is probably not harder to create a fresh name for a prefix than for a top-level declaration.

@lrhn wrote:

> In general, I disagree with the characterization of the import rules as complex and needing to be simplified.
>
> The user story for them is actually exceedingly simple.
>
> Resolution of a name in the top-level scope is as simple as:
>
>   • Declared in this library? If so, that's what it means.
>   • Imported or declared as a prefix in this file? If so, that's what it means.
>   • Otherwise, whatever it would mean in the parent file.

That sounds simple, but we need to add a bit of detail:

First, 'imported or declared as a prefix in this file' skips over the fact that a prefix will shadow an imported declaration with the same name. So it's better to insist that there are two nested scopes here.

Next, we don't start with the solution to the name binding task, we start with a name and the question "What does this name mean?". If you have an IDE and you can Control-click or something like that in order to go to the declaration, fine. However, in some situations there is no IDE, and even when there is an IDE that could do this, we need to read a lot of identifiers and we can't perform a lookup on every single one of them all the time. In other words, we do need to have an intuition about how to find the "meaning" of an identifier in terms of "which declaration is it referring to".

Assume that we have a name which is not resolved in a nested scope in a file. We then know that it must be resolved in the library scope, or one of the enclosing ones, and then we search all files that constitute this library. If we found a declaration of that name then that's the one.

Otherwise we search the prefix scope of the current file (note: 'all files' vs. 'the current file'). If we find it there then that's it.

Otherwise we search the import scope of the current file (not all files). If we find it there then that's it.

I think it is true that we can then repeat the same steps for the parent file. This means that we will spuriously search the library scope one more time, but we are guaranteed that we won't find anything; note again that this spurious search targets all files, not just the parent. When we search the prefix scope in the parent we may find a binding, but we are again guaranteed that there is no such binding in the prefix scope that we searched previously, so we don't need to worry about the fact that the same prefix can have different contents in the original file and in the parent.
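The search order described in the preceding paragraphs can be modelled in a few lines of Dart. This is a sketch of the currently specified lookup as described here, not an implementation of it: `FileScope` and `resolve` are hypothetical, library URIs stand in for declarations, and the contents of the prefix namespaces are left out. Because the library scope is global, it is searched once up front instead of once per file.

```dart
/// Hypothetical model of the scopes that belong to one file.
class FileScope {
  final String name;
  final FileScope? parent;

  /// Import prefixes declared in this file.
  final Set<String> prefixes;

  /// Unprefixed imported names, mapped to the library providing them.
  final Map<String, String> imports;

  const FileScope(this.name,
      {this.parent, this.prefixes = const {}, this.imports = const {}});
}

/// Describes what [id] means when used at top level in [file].
/// Returns null if the name is not found.
String? resolve(String id, FileScope file, Set<String> libraryDeclarations) {
  // Global step: declarations from all files of the library.
  if (libraryDeclarations.contains(id)) return 'declared in this library';
  // Local steps: prefix scope, then import scope, then the same in the parent.
  for (FileScope? f = file; f != null; f = f.parent) {
    if (f.prefixes.contains(id)) return 'prefix in ${f.name}';
    final source = f.imports[id];
    if (source != null) return 'imported from $source by ${f.name}';
  }
  return null;
}

void main() {
  const declarations = {'foo', 'bar', 'v3'};
  const mainFile = FileScope('main.dart',
      prefixes: {'p', 'q'},
      imports: {'v1': 'l1.dart', 'v2': 'l1.dart', 'v3': 'l1.dart'});
  const partFile = FileScope('part.dart',
      parent: mainFile, prefixes: {'p'}, imports: {'v1': 'l2.dart'});

  print(resolve('v3', partFile, declarations)); // declared in this library
  print(resolve('v1', partFile, declarations)); // imported from l2.dart by part.dart
  print(resolve('v2', partFile, declarations)); // imported from l1.dart by main.dart
  print(resolve('q', partFile, declarations)); // prefix in main.dart
}
```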

It is crucial that a developer who is trying to understand what a name means has the global nature of the first search in mind, and the local nature of the subsequent searches (if you insist on saying "repeat the same search in the parent" then it's just some of the subsequent searches that are local). For example, it certainly doesn't work to think that we must search the current file (top-level declarations plus import prefixes), then the imported namespace, and then we repeat the same steps in the parent.

Also, if we do remember that the search in every file starts with a global lookup for top-level declarations then it might well be a source of confusion that the explanation goes like "search top-level declarations globally; then search some other namespaces; then search top-level declarations globally one more time; then search some more of those other namespaces; etc.".

It might work, but it is not completely obvious to me why we'd do it (or explain it) like that.

> The only changes to the parent file's import scope are the changes defined in this file. Very localized!

Also inconsistent because the first search is global to the entire file tree that constitutes the library.

For instance, you can use an import to change the meaning of foo, but you can not add a top-level declaration in this file to change the meaning of foo, because that will now be the meaning of foo in all files of the library, which is probably a breaking change for all references to foo in other files of this library.

> They don't need full understanding of the graph structure. They just need to understand the parent file.

They do need to understand that if we import foo then it shadows any declaration of foo which is available to the parent because it's imported, but it's the other way around if it is available to the parent because it's declared in the parent (or in any other file of this library). So it's crucial to keep track of the distinction between imported names and names declared in some file of this library, because shadowing isn't just shadowing, it's secondary to the distinction between top-level declarations and imported declarations. It certainly doesn't suffice to notice that foo can be used in the parent file.

> No parent file import name can block a part file from importing or using the name as a prefix.

True. I'm just not convinced that this is a feature, it looks a lot like a bug.

> The global library member declaration scope is really the thing that stands out as an exception. It's also a thing that would be breaking to change.

That's exactly the reason why I'm looking for a model which is consistent with the global nature of the top-level scope, rather than trying to arrange everything else according to a "parent is an enclosing scope" kind of thinking.


My proposal uses the global approach for the top-level scope, and for import prefixes (and then there is no need to separate top-level declarations of import prefixes from other top-level declarations), and it uses a shared imported namespace for all the files of the library.

The main sources of simplicity are: (1) It's a very simple graph structure, just like a library with no parts, (2) each name which is in scope at the top level has the same meaning in every file of the library.

The main source of complexity is that all namespaces are global to the library, that is, a specific name can be used everywhere even though it could be declared in any of the files of the library. This may be considered to be "ugly", but I do think it is important that it is consistently true for all namespaces. We always have to think about the entire library when we want to understand the meaning of a name which is in scope at top level, and this is true for declarations in the library as well as imported declarations.

By the way, note that the semantics of export directives is also global.

I do understand that this whole line of thinking is at odds with the traditional emphasis on supporting a modular separation of the parts that make up a complex software entity. Encapsulation, modularity, information hiding, implementation independence, the works, those words are usually considered to be unquestionably positive.

However, in the particular case where we discuss augmentations and part files, I think the approach that actually works (given that we aren't going to use any other approach than the global one for the top-level declarations) is the global one. We are working on a specific library. It does consist of different files, but they are interacting in such global ways that we need to understand the library as a unit. We can't do anything in terms of developing a part if we ignore the other files in the library. So let's try to explore an approach where we recognize the tight coupling of the files of a library, and use a simple, global scoping structure.

lrhn commented 1 week ago

The current design has a goal of allowing part files to control the meaning of any name in its top-level scope that is not a declaration of the current library. And without needing to know which other names are imported elsewhere. If we drop that goal, we can do other things, but I actually think it's a useful property to have, especially for macro generated code. It can choose to make an import of a name, and then it can actually trust that that imported name is available.

If we make all imports global too, so that it makes no difference which file they are declared in (like exports), and we have just one top-level scope for all files, then macros can still achieve the same control; they just need to use prefixed imports with fresh prefix names. Definitely possible. Possibly easier to understand, but at the cost of extra verbosity. I'm just not sure that the current scoping is so hard to understand, and I think the extra flexibility is worth it.

eernstg commented 1 week ago

> The current design has a goal of allowing part files to control the meaning of any name in its top-level scope that is not a declaration of the current library.

Right, I'm just saying that this property isn't necessarily desirable, because of the glaring inconsistency in the treatment of the top-level scope of the library and everything else.

I'm not claiming that the consistently flat approach is beautiful, just that it is simpler and more consistent.