Questions and comments about the modules proposal

dart-lang / language

Design of the Dart language

Other

2.66k stars 204 forks source link

Questions and comments about the modules proposal #1749

Open bwilkerson opened 3 years ago

bwilkerson commented 3 years ago

For the most part this looks good. If you'd prefer I can break this up into multiple issues.

Modules ... The first two rules mean that there is a strict nesting: packages contain modules which contain libraries (which may contain parts).

It sounds, from the rest of the doc, like modules will largely (hopefully completely) negate the need for part files. As a result, I wonder whether we might consider removing part files in some future version of Dart, and if so whether we might want to discourage their use now by doing something like making it invalid for a module to contain more than one library if any of those libraries have parts. That's probably too strong a restriction, but you get the idea.

Also, what will it mean if a user opts a given library out of module support? Would that library still be in an implicitly named module? Would other libraries be able to be in the same module or be friends of the module?

Module names ... Is this the right rule? What about dots in file names?

Dots in file and directory names can result in unintentionally having two libraries in the same module. Consider /lib/a.b/c.dart and /lib/a/b.c.dart. It probably isn't likely to happen, and I can't think of any solution that can't be subverted, but we might be able to reduce the probability by replacing the dots in the names with another character or by removing them. It isn't clear that it's worth the effort.

Module boundaries

It isn't explicit, but I assume that it's possible to explicitly use an implicit module name. That is, define a library A but don't specify a module name, then define a library B and explicitly put it in the module implicitly defined for A. That seems error prone and might be something we want to discourage if not disallow outright.

One way to disallow it is for libraries that don't explicitly specify a module name to be given a unique but opaque module name (that isn't necessarily expressible as a dotted name). No other libraries could be added to the module, but the names would never conflict.

Friend modules

I suspect that I'm not fully understanding this part of the proposal.

This appears to assume that libraries outside lib can't be part of the same module as those inside lib. While that makes sense, we might want to make that restriction explicit. For example, what about libraries defined in bin?

Private identifiers in friend modules are visible to all libraries in both modules. Friendship is transitive. If module A is a friend of B which is a friend of C, then A, B, and C, all have access to each other's private identifiers.

I'm guessing that an identifier from a friend module is only in scope if the library that declares the identifier is imported. If that's not the case (and maybe even if it is), consider making the requirement more explicit.

But if that's true, then it raises a question. If a library La in A imports a library Lb in B, then Lb can't import La because that would violate the rule that modules can't have cycles. So what do you mean by "have access to each other's private identifiers"?

It is a compile-time error if multiple libraries in the same module have friend clauses.

That has some implications for compilers. They're basically required to figure out the module structure before they can finish generating diagnostics for any given library, even if that library is the only library in a dependency cycle.

@scheglov In case you didn't see this or the next item.

A module can only friend one other module. (But a module can be a friend of multiple other modules. Friendship forms a tree where all modules in the tree share the same private names.)

Aside from the compiler implications of needing to identify the friendship tree before resolving any of the code in any of the libraries in the tree, I'm a bit concerned about the usability implications here (assuming I understand this). If module B is a friend of A, and module C is a friend of A, then it would be possible for code in B to access private names in C, but I think that could be very confusing to users when there's no explicit reference from B to C in either direction.

scheglov commented 3 years ago

I don't buy the argument that modules simplify dependencies management for compilers, at least for the analyzer. We already build the graph of libraries, and already link element models of library cycles. And we already track which library cycles referenced by which, and discard element models for transitively affected library cycles (for Cider only currently). If things will go well, we might get symbol level dependencies. Module level is quite coarse-grained.

Why do we disallow multiple friends? One use case for multiple friends could be to have a single test library that is a friend with two implementation modules, and checks internals of both. Not strongly necessary - we could have two test libraries. Or a library that is a friend with two other modules, and move values from one private field to another. Also not quite necessary - we could have two separate modules, each is a friend of a single module and exposes the private field as public, but as I understand it, one of the reasons for modules is not allow easier access.

I'm not concerned that we need to build modules before generating diagnostics. We will need to do this much earlier anyway - before linking element models, because we can access private elements that are declared types of fields or methods, or are initializers of field using for inference.

Which reminds me, probably modules don't change anything here, but it is possible to declare a return type of a method to be a private class.

I'm also not concerned about determining friends - it is all based on unresolved AST, just like imports and exports, and is cheap to compute and cache.

I have a question about friends design. IIRC in C++ a class declares who is its friend, and so is allowed to access internals. So, it is not transitive. And with friendship in Dart restricted to packages, and not exposed in anyway semantically, it seems to be an internal implementation detail of each package. So, the question is why do we want to make it transitive? Especially, as @bwilkerson noticed, earlier version of the proposal requires modules to form DAG.

Levi-Lesches commented 3 years ago

I'm guessing that an identifier from a friend module is only in scope if the library that declares the identifier is imported.

If module B is a friend of A, and module C is a friend of A, then it would be possible for code in B to access private names in C, but I think that could be very confusing to users when there's no explicit reference from B to C in either direction.

I think that there should 100% be an "explicit reference" -- an import statement. Putting the ideas of packages, modules, and libraries aside for a second, it would be very confusing if you could create a new file that magically has access to a bunch of files, but not others, and you'd have to look at some other file to see what the restrictions even are. import, and even part, make these relationships explicit and clear.

munificent commented 3 years ago

It sounds, from the rest of the doc, like modules will largely (hopefully completely) negate the need for part files. As a result, I wonder whether we might consider removing part files in some future version of Dart, and if so whether we might want to discourage their use now by doing something like making it invalid for a module to contain more than one library if any of those libraries have parts. That's probably too strong a restriction, but you get the idea.

Yes, if possible, I would love to remove support for part files. I worry a lot about the cognitive overhead of modules and subtracting some complexity from the language would be a good way to pay for that. I like the idea of saying that libraries in modules can't use part files. I have to think more about how that would interact with migration.

Also, what will it mean if a user opts a given library out of module support? Would that library still be in an implicitly named module? Would other libraries be able to be in the same module or be friends of the module?

Good question. That's all a big TODO right now. :)

Dots in file and directory names can result in unintentionally having two libraries in the same module. Consider /lib/a.b/c.dart and /lib/a/b.c.dart. It probably isn't likely to happen, and I can't think of any solution that can't be subverted, but we might be able to reduce the probability by replacing the dots in the names with another character or by removing them. It isn't clear that it's worth the effort.

Yeah, I'm worried about that too. One option might be to just make module names be actual string literals instead and then there is no translation of path separators to dots.

One way to disallow it is for libraries that don't explicitly specify a module name to be given a unique but opaque module name (that isn't necessarily expressible as a dotted name). No other libraries could be added to the module, but the names would never conflict.

I thought about that. That would mean that you are forced to write a single-library module's name explicitly if you want another module to friend it. I think that might be annoying if you just want to be able to write a white box test for a single library.

This appears to assume that libraries outside lib can't be part of the same module as those inside lib. While that makes sense, we might want to make that restriction explicit. For example, what about libraries defined in bin?

It isn't explicitly forbidden in the proposal to have a module that spans root directories. But my hunch is that users probably don't want them too. You likely don't want to have to compile your library and its tests as a single unit.

I'm guessing that an identifier from a friend module is only in scope if the library that declares the identifier is imported. If that's not the case (and maybe even if it is), consider making the requirement more explicit.

Yes. Added a paragraph clarifying.

But if that's true, then it raises a question. If a library La in A imports a library Lb in B, then Lb can't import La because that would violate the rule that modules can't have cycles. So what do you mean by "have access to each other's private identifiers"?

There are still cases without a cyclic import where one could access a private name the other. For example La could pass an instance of a subclass of a class defined in Lb that overrides a private method. But the real answer here is that I didn't put a lot of thought into that sentence and it's probably not as precise as it should be. :)

That has some implications for compilers. They're basically required to figure out the module structure before they can finish generating diagnostics for any given library, even if that library is the only library in a dependency cycle.

Ah, yes. I'm glad you brought this up. I am somewhat worried that the syntax I propose here is too lightweight and requires compilers to look at all of the files in a package before it can figure out what modules exist. None of this is carved in stone, so if this proves to be a problem, we can definitely iterate.

If module B is a friend of A, and module C is a friend of A, then it would be possible for code in B to access private names in C, but I think that could be very confusing to users when there's no explicit reference from B to C in either direction.

That's true, but I think it might be somewhat inescapable. I don't know if there's a fairly simple semantics for sharing privacy that doesn't have that property. My pitch is basically that privacy can be modeled as name mangling and that friend modules have the same mangling token. I think if that doesn't work transitively, then you get to the harder problem having to look at individual private identifiers and trying to figure out which module they belong to. That seems harder/messier to me, but it may be that I just haven't thought it through enough yet.

munificent commented 3 years ago

I don't buy the argument that modules simplify dependencies management for compilers, at least for the analyzer. We already build the graph of libraries, and already link element models of library cycles. And we already track which library cycles referenced by which, and discard element models for transitively affected library cycles (for Cider only currently). If things will go well, we might get symbol level dependencies. Module level is quite coarse-grained.

It is pretty coarse grained. We don't have to go in this direction. We could not do modules—we aren't committed to this approach yet. We still need a well-defined ordering from macros so that we can ensure that a macro has been compiled before it is applied. Maybe it would be sufficient to say that:

You can't apply a macro in the library where it is defined.
There can't be any import cycles between the library where the macro is defined and where it is applied.

But I worry that 2 would be really confusing, subtle, and/or brittle for users.

Why do we disallow multiple friends? One use case for multiple friends could be to have a single test library that is a friend with two implementation modules, and checks internals of both.

I'm not opposed to disallowing multiple friends, I'm just not sure what the semantics would be. Consider:

// a.dart
callAMethod(A a) {
  a._method();
}

interface class A {
  void _method() {
    print("A._method()");
  }
}

// b.dart
import 'a.dart';

callBMethod(B b) {
  b._method();
}

interface class B extend A {
  void _method() {
    print("B._method()");
  }
}

Here, A and B are in different modules, so their two _method identifiers are distinct private names. The _method() in B does not override or shadow the inherited _method() in A. Now consider:

// test.dart
import 'a.dart';
import 'b.dart';

library friend a, b;

class Test implements A, B {
  _method() { print("Test._method"); }
}

main() {
  callBMethod(Test()); // What does this print?
}

Since there are two private _method identifiers from two distinct modules, how do we know which one the _method in the test module is intended to refer to? This is probably tractable somehow, but it isn't immediately apparent to me how. Requiring friends to be a single-rooted tree I believe addresses this.

So, the question is why do we want to make it transitive? Especially, as @bwilkerson noticed, earlier version of the proposal requires modules to form DAG.

Transitivity falls out of the above semantics. It's not an explicit goal, but I think it leads to a simpler semantic model. I'm not super attached to this, but it was the first, simplest thing I could imagine working, so I wanted to at least write it down to start the discussion. :)

bwilkerson commented 3 years ago

Also, what will it mean if a user opts a given library out of module support?

Good question.

I suppose one answer is to say that it isn't in any module, so no other libraries can be in the same module and no modules can friend it. Imports and exports would have the same semantics as before. Modules are purely a compile-time construct, so there's no messy run-time semantics to worry about. Assuming that modules are released before or at the same time as macros, then we could define that macros can only be run in libraries that are part of a module without loosing any of the necessary semantics.

That would mean that you are forced to write a single-library module's name explicitly if you want another module to friend it.

I wonder how often that comes up in practice?

My pitch is basically that privacy can be modeled as name mangling ...

That's true, it can be, but it comes at the cost of requiring visibility to be bidirectional.

I'm aware that the other platforms implement library privacy that way today, but for what it's worth, the analyzer doesn't model it that way. For every reference to a private identifier it asks whether the identifier is visible in the current library.

Maybe it would be sufficient to say that:

You can't apply a macro in the library where it is defined.

There can't be any import cycles between the library where the macro is defined and where it is applied.

But I worry that 2 would be really confusing, subtle, and/or brittle for users.

That's fair. I guess I would expect a message similar to

The macro 'foo' can't be used here because the library 'bar.dart', in which 'foo' is defined, depends on this library.

'bar.dart' imports 'baz.dart' (bar.dart:12)
'baz.dart' imports 'foo.dart' (baz.dart:8)

Try breaking the import cycle, or removing the macro.

I'm probably not a good judge because I know too much about the proposed semantics, but it doesn't sound too confusing to me, especially if we document the reason for the restriction.

scheglov commented 3 years ago

In the example with two modules and _method the method in Test overrides both A._method and B._method just like it happens for public methods.

But the question about names is an interesting one. Currently in the analyzer we have Name that is basically a wrapper around a raw name. If the raw name is public, then Name is equivalent to this raw name. If the name if private, then hashCode and operator== also use the URI of the enclosing library. So, private methods in the same library can override each other, but in different libraries - not. But if you declare class A { void _method() {} } in a.dart, extend it in B in b.dart, and then again extend in class C extends B {} in a.dart, you still can access A._method in C - they are in the same library, even if through a different library. But if you ask Name('b.dart', '_method') from A - you don't get it, because it is Name('a.dart', '_method').

So, now instead of the library URI as a prefix in Name we will use the module name. All libraries of the module share the same module name, so see each other private names.

Not sure how to adapt this to friends. It seems that from visibility point of view, friends (if we consider it transitive) form a single module (meta-module?). So, maybe in Name we should use the name of this artificial friends-module, maybe a combination of names of comprising modules.

Levi-Lesches commented 3 years ago

Try breaking the import cycle, or removing the macro.

FWIW, I found it more shocking that you can use cyclic imports in Dart, so I don't think breaking that in some cases will be too impactful.

bwilkerson commented 3 years ago

It seems that from visibility point of view, friends (if we consider it transitive) form a single module ...

I hadn't thought of it that way, but you're right.

If friends are transitive then we have two ways to add a library to a module. Are there any differences between the two approaches?

So, now instead of the library URI as a prefix in Name we will use the module name.

I stand corrected; it looks like analyzer does mangle the name now.

munificent commented 3 years ago

In the example with two modules and _method the method in Test overrides both A._method and B._method just like it happens for public methods.

Maybe this is just my own intuition around the semantics, but I'd find that confusing. In B itself, there are two separate methods, at least according to current Dart semantics. So that would give you:

main() {
  callAMethod(B()); // "A._method"
  callBMethod(B()); // "B._method"
  callAMethod(Test()); // "Test._method"?
  callBMethod(Test()); // "Test._method"?
}

It would be surprising to me if a single method declaration could override two separate methods.

Not sure how to adapt this to friends. It seems that from visibility point of view, friends (if we consider it transitive) form a single module (meta-module?).

Yeah, "meta-module" is a way to think of it. My mental model is sort of that each module defines a unique token for name mangling. When one module declares itself a friend of another, it says "Use that module's name token for my own." (And, since that module may have also declared itself a friend of some third module, the token may get passed along. Thus you get a whole tree of modules all sharing the same name token as the root.)

I don't really like restricting modules to only one friend and having this kind of weird tree meta-module thing. I'll see if I can come up with something better. @leafpetersen's suggestion was to simply allow libraries to choose to include private names from imported libraries. Then you simply try to resolve the private name and if it resolves to an imported one, you use it. If you get a collision because you imported the same private name from two libraries, it's an error.

I'm not sure if that holds together with instance methods, though, where "resolve" isn't super well-defined. For example:

// a.dart
class A {
  _method() {}
}

callAMethod(A a) => a._method();

// b.dart
import 'a.dart' including private stuff;

class B {
  _method() => print("B._method()"); // (1)
}

class C extends B implements A {}

main() {
  callAMethod(C()); // ???
  callBMethod(C()); // ???
}

Here, on (1), we declare a _method in b.dart that is unrelated to the imported _method from a.dart. Presumably we wouldn't resolve it to be the same as a's because there is no reason to. But then, after the fact, class C inherits that method and wants to use it to implement A._method(). Does that work? Does this get weird?

munificent commented 3 years ago

FWIW, I found it more shocking that you can use cyclic imports in Dart, so I don't think breaking that in some cases will be too impactful.

From my analysis, about ~3% of libraries in the 2,000 most recent packages in pub are part of library cycles. That isn't a large fraction, but that's because things like tests (which are almost never acyclic) push the total down. About 1/3 of packages contain at least one cyclic import.

munificent commented 3 years ago

If friends are transitive then we have two ways to add a library to a module. Are there any differences between the two approaches?

The current proposal is that friends give you access to private names in other modules, but don't loosen the class capabilities. So if you want to extend a class in another library (without marking the class open) you still have to actually be in the same module and not just a friend.

scheglov commented 3 years ago

In the example with two modules and _method the method in Test overrides both A._method and B._method just like it happens for public methods.

Maybe this is just my own intuition around the semantics, but I'd find that confusing. In B itself, there are two separate methods, at least according to current Dart semantics. So that would give you:
main() {
  callAMethod(B()); // "A._method"
  callBMethod(B()); // "B._method"
  callAMethod(Test()); // "Test._method"?
  callBMethod(Test()); // "Test._method"?
}
It would be surprising to me if a single method declaration could override two separate methods.

This is what happens for public methods, and what happens for classes with private methods in the same library. For me it is seems logical that for friends all visibility related features work as they were the same module - not just access, but also overrides. In particular because access and overrides are the same feature - building the interface of a class.

Levi-Lesches commented 3 years ago

For me it is seems logical that for friends all visibility related features work as they were the same module - not just access, but also overrides. In particular because access and overrides are the same feature - building the interface of a class.

Would this discussion benefit "partial classes" (#678)? Previously, a few questions came up in discussion and in my head:

How would the compiler know when to stop looking for more partial declarations?
Where should the "canonical" declaration go? In other words, what do you import?
Which libraries should get to reference private members? What if each partial declaration has its own private members?

Modules would answer all of those questions: the partial class may be split between different libraries, but the class as a whole would belong to the module. Not to mention, both partial classes and modules are a huge step towards metaprogramming. Thoughts?

munificent commented 3 years ago

Would this discussion benefit "partial classes" (#678)? Previously, a few questions came up in discussion and in my head:

Probably worth spawning a separate issue to discuss this. :)

munificent commented 3 years ago

This is what happens for public methods, and what happens for classes with private methods in the same library. For me it is seems logical that for friends all visibility related features work as they were the same module - not just access, but also overrides.

But in the above example, A and B are not in the same module. If you were to show the mangled names, it's sort of like:

// a.dart
callAMethod(A a) {
  a._module_a_method();
}

interface class A {
  void _module_a_method() {
    print("A._method()");
  }
}

// b.dart
import 'a.dart';

callBMethod(B b) {
  b._module_b_method();
}

interface class B extend A {
  void _module_b_method() {
    print("B._method()");
  }
}

So the v-table for B looks like:

B {
  _module_a_method() => inherited from A
  _module_b_method() { print("B._method()"); }
}

Then you define:

// test.dart
import 'a.dart';
import 'b.dart';

library friend a, b;

class Test implements A, B {
  _method() { print("Test._method"); }
 // ^^ Is this _module_a_method(), _module_b_method(), or both?
}

It seems strange to me that a single declaration would override two essentially different method names.

Levi-Lesches commented 3 years ago

@munificent, Thanks, I opened #1753

insinfo commented 3 years ago

If I understand this discussion, wouldn't it be better to use "namespaces" and partial and sealed classes instead of modules?

munificent commented 3 years ago

Those words all mean sort of different things in other languages. "Namespaces" is another term we could consider instead of "modules" but the former I think usually implies there are no real meaningful semantics. With modules, if we use them for things like separate compilation, then I think "namespace" would be a confusing term.

Partial classes are a specific feature: they let you break the definition of a single class into multiple separate textual descriptions across multiple files. That's an interesting feature, but not what's proposed here. Sealed classes are part of what we're discussing.

cedvdb commented 1 year ago

Modules is about encapsulation, name space strikes me as, well, a namespace.

// a.dart
callAMethod(A a) {
  a._method();
}

interface class A {
  void _method() {
    print("A._method()");
  }
}

// b.dart
import 'a.dart';

callBMethod(B b) {
  b._method();
}

interface class B extend A {
  void _method() {
    print("B._method()");
  }
}

Here, A and B are in different modules, so their two _method identifiers are distinct private names. The _method() in B does not override or shadow the inherited _method() in A. Now consider:

// test.dart
import 'a.dart';
import 'b.dart';

library friend a, b;

class Test implements A, B {
  _method() { print("Test._method"); }
}

main() {
  callBMethod(Test()); // What does this print?
}

this should print "Test._method", callBMethod(B b) receive a Test which has overriden the method. What's the issue here, friend should have no impact ?