chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Should we be able to extend a module from outside the module? #10796

Open lydia-duncan opened 6 years ago

lydia-duncan commented 6 years ago

A quick search wasn't showing me an issue for this, though I know it's been mentioned in passing before by at least @buddha314.

Today, a module can only contain symbols defined within its braces. Other languages provide the ability to extend namespaces, but Chapel hasn't typically done so. Is this something we should provide?

Potential syntax: Similar to how secondary methods can be defined using the type name as a prefix, we could use the module name as a prefix when defining a function. For instance:

module M { ... }

proc M.additionalFunction() { ... }

module M.M2 { ... } // submodule

We would want to carefully consider the impact of this on private symbols - my thinking is that symbols defined outside the original scope of the module cannot access symbols that are private to the module, unless such private symbols are defined at a similar scope, e.g.

private proc M.func1() { ... }
proc M.func2() { ... } // can call func1()

However, that may remove some of the desire for this feature.

bradcray commented 6 years ago

At the risk of revealing myself to be the caveman that I am: What's a motivating use case for this feature?

lydia-duncan commented 6 years ago

> What's a motivating use case for this feature?

Very large programs, if I understand it correctly. Basically, a program has become large enough that you want to spread it across multiple files, but you still want all of its contents to get included via a single use.

lydia-duncan commented 6 years ago

Note that you can get that behavior somewhat today due to our transitive use statements. However, this is understandably a little gross to some people.
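As a rough sketch of that workaround, assuming a recent Chapel release in which `use` is private by default, so re-exporting requires `public use` (at the time of this thread, a plain `use` was transitive). The module names `Stats`, `Linalg`, and `SciLib` are hypothetical; in a real project each would presumably live in its own file:

```chapel
// Hypothetical library split into two implementation modules.
module Stats {
  proc mean(a: real, b: real): real {
    return (a + b) / 2.0;
  }
}

module Linalg {
  proc dot(x: [] real, y: [] real): real {
    return + reduce (x * y);
  }
}

// Facade module: re-exports both pieces so clients need only one `use`.
module SciLib {
  public use Stats;
  public use Linalg;
}

module Main {
  use SciLib;   // brings mean() and dot() into scope via the re-exports

  proc main() {
    writeln(mean(2.0, 4.0));
    writeln(dot([1.0, 2.0], [3.0, 4.0]));
  }
}
```

The cost of this pattern is the extra facade module, which has to be kept in sync by hand with the modules it re-exports.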

buddha314 commented 6 years ago

If you take a look at a package like SciPy, you see that it's spread out over a lot of different sub-packages. In Python, you can import scipy or from scipy import stats.distributions, so you have the option of which level to import. I interpret this ticket to center on that kind of use.

lydia-duncan commented 6 years ago

(In case it's not clear by me putting this in the Icebox, I don't consider this an urgent issue by any means. I was mostly logging it because I remembered it during a conversation)

bradcray commented 6 years ago

> Basically, a program has become large enough that you want to spread it across multiple files, but you still want all of its contents to get included via a single use.

If the only goal is to break a module across multiple files, I'd prefer to do so via an equivalent to C's #include / LaTeX's input such that a module could be broken up across multiple files via:

module M {
  proc foo() { ... }

  include "bar.chpl";  // defines a procedure bar()
  include "baz.chpl";  // defines a procedure baz()
}

bradcray commented 6 years ago

> If you take a look at a package like SciPy, you see that it's spread out over a lot of different sub-packages. In Python, you can import scipy or from scipy import stats.distributions, so you have the option of which level to import. I interpret this ticket to center on that kind of use.

Once you say "subpackages" that implies to me something like hierarchical structure (say, modules defined within other modules), but I don't read this issue as relating to that question at all. It seems more related to defining something outside of a module's scope than it does to how multiple modules relate to one another (?).

bradcray commented 6 years ago

(Since @BryantLam and @LouisJenkinsCS gave this a thumbs-up, I'm curious about their answers to "What's a motivating use case for this feature?" too...)

lydia-duncan commented 6 years ago

> I'd prefer to do so via an equivalent to C's #include / LaTeX's input

I'd worry a little that we would have too many strategies with that plus use, plus require, but I don't have any other objections to it.

bradcray commented 6 years ago

> I'd worry a little that we would have too many strategies with that plus use, plus require but don't have any other objections to it.

These all seem to have very different roles to me:

Specifically, I don't think any of these can be used to do what the others are intended for, so don't view them as distinct strategies for achieving a given thing.

bradcray commented 6 years ago

I filed issue #10909 to capture the desire for an include statement (it predated the issue tracker and I don't believe ever got created as an issue).

BryantLam commented 6 years ago

I'm okay with either this or include statements (#10909). I just want some way to express a large amount of code without constraining it to a single file. Some of this code will also be implementation-private (#10800) to the package.

When given a choice, I'm in favor of whatever is more explicit, provided it doesn't lead to excessive boilerplate. #10909 seems okay to me but I'm already accustomed to writing #include statements in C/C++.

BryantLam commented 6 years ago

One advantage to extending the module is that it makes the module hierarchy in each file apparent and clear. If I have a package:

$  cat A.chpl
module A {
  include "B1.chpl";
}

$  cat B1.chpl
proc getTime() { ... }

What gets included? How does the file-level module affect the include statement? (Does it?) Am I allowed to use getTime() from elsewhere in the program from the global namespace? Would the compiler need a way to say, "Hey, this file was included somewhere. Don't add it to the global namespace," and could it reliably do that if B1.chpl was parsed before A.chpl?

Whereas the alternative:

$  cat B2.chpl
module A {
  proc getTime() { ... }
}

has no ambiguity.

... Though this does muck up the idea of file-level modules and why implicit behavior can constrain language design (or put more cognitive burden on a user).

bradcray commented 6 years ago

> but I'm already accustomed to writing #include statements in C/C++.

Me too, so I may share any biases here... I realize that your questions may be partially rhetorical, but to try and answer them:

> What gets included?

The idea in issue #10909 would be that the include would be replaced by the literal text of B1.chpl, as with the C pre-processor's #include directive. Thus, the two alternatives you wrote would behave identically.

In answering the other questions, I'll be addressing a few potential misunderstandings that I've been mildly worried about (from side conversations) lately:

> How does the file-level module affect the include statement?

Don't think of Chapel as "introducing an automatic module at every file scope." Rather, think of it as "If a file that's been supplied to the compiler contains top-level code other than comments and module declarations, then an implicit file-scope module declaration is introduced to contain that code."

w.r.t. the interaction with the include statement, I'd say that the compiler would never insert an implicit file-scope module for an included file. Thus, if A.chpl had instead said:

module A {
}
include "B1.chpl";

you'd get a file-scope module named A containing a procedure called getTime() as well as an empty sub-module. I.e., it would be equivalent to:

module A {
}
proc getTime() { ... }

> from the global namespace

You've used this phrase a few times recently which concerns me that there may be a misunderstanding. I believe that the only global namespace Chapel has is the one that defines the names of all the top-level modules, and for this reason, I tend not to use the phrase "global namespace" w.r.t. Chapel programs (in fact, I try to avoid using "global" at all, though not always successfully...). What do you mean when you use the phrase? Does it show up in documentation somewhere?

To attempt to answer the question, I'm assuming that if B1.chpl was named on the compiler command-line, it would introduce a module named B1 which contained getTime() which would be available to any module that used B1. But if the file was only included, then the routine would only be available via the modules in which it was defined.

bradcray commented 6 years ago

To state my reservations for the feature request proposed in this issue: Since all Chapel code is defined within the context of some module, code like the following makes me nervous:

module Outer {
  module Inner { ... }

  proc Inner.additionalFunction() { ... }
}

Specifically, my intuition upon reaching the proc declaration is that it's defined within module Outer since that's the scope in which it appears. Yet the Inner modifier presumably means that it isn't actually defined within Outer, it's defined within Inner. This seems confusing to me.

It also seems like it introduces a bit of an ambiguity to the reader since seeing a declaration in isolation:

proc Foo.bar() { ... }

It's hard to tell whether this is adding a secondary method to a class/record Foo, or a standalone procedure to a module named Foo.

lydia-duncan commented 6 years ago

> It's hard to tell whether this is adding a secondary method to a class/record Foo, or a standalone procedure to a module named Foo.

But the same can be said about making a call to bar in a complex program (where the definition of Foo is far enough away that the answer to the question "what is Foo?" is not obvious). We already use the same syntax to access "a function defined in another module" and "a method on an instance". It seems like allowing their definitions to follow the same pattern as well is removing a special case.

bradcray commented 6 years ago

> It seems like allowing their definitions to follow the same pattern as well is removing a special case.

I have the opposite reaction—that it would be adding a special case.

While it's not always obvious what specific routine a call myObject.baz(myArgs) is invoking without good tools or detective work, that seems to be part and parcel of OOP as far as I can tell, and a frequently cited downside of it. It feels like something to try and avoid creating more instances of, rather than replicating it in additional contexts.

However, it also seems different to say "I understand that myObject.baz(myArgs) is a method call, but I don't know which specific method definition is being called" than to look at a declaration and not even be able to determine whether it's defining a method or a standalone routine. For instance, given:

proc Foo.bar() { ... }

if Foo is a module I'd call it as Foo.bar() or:

use Foo;
bar();

Whereas if Foo is a type, I'd call it via:

var myFoo = new Foo();
myFoo.bar();

This seems like it's adding a new flavor of syntactic ambiguity (and, in my mind, unnecessarily, due to the lack of motivating use cases and its other drawbacks like my mental churn around "I'm defining this procedure in module Bar but it's not actually a part of module Bar").

Maybe put a different way: Currently we don't have a way of "injecting" new code into a module from outside of that module and I'm not convinced that we should add such a capability because I think it adds complexity / mental churn, both to the interpretation of Chapel programs, and to its implementation (e.g., I'm anticipating what it will do to the resolution rules if I use module Foo but don't use the module that defines the Foo.bar() procedure, particularly in the presence of overloading).

Taken to the extreme, would we want to support things like:

module M { 
}

module M2 {
  config const M.verbose = true;       // add a config const, not to this module, but to M
  class M.C { ... }                    // add a class definition not to this module, but to M
  enum M.color { red, green, blue };   // add an enum not to this module, but to M
}

My reaction is "ugh, no way" because it feels like this is going the opposite direction of well-structured programming (granted, the include statement proposal is not particularly structured either, but there's a strong precedent for it, at least within languages that I've used the most. And I think it's more of a meta-programming feature by nature / definition).

lydia-duncan commented 6 years ago

> Currently we don't have a way of "injecting" new code into a module from outside of that module and I'm not convinced that we should add such a capability because I think it adds complexity / mental churn, both to the interpretation of Chapel programs, and to its implementation (e.g., I'm anticipating what it will do to the resolution rules if I use module Foo but don't use the module that defines the Foo.bar() procedure, particularly in the presence of overloading).

But we get that behavior with secondary methods defined in modules outside of where the type is originally defined, so the machinery is likely already there. But maybe that argues that we made a mistake in allowing secondary methods to be defined in that way?

I'm mostly just exploring the thinking, I don't know if I actually think we should do this.

bradcray commented 6 years ago

> But we get that behavior with secondary methods defined in modules outside of where the type is originally defined, so the machinery is likely already there.

I'd argue that such cases inject the secondary method into the type itself (as potentially governed by the module in which the secondary method is defined), not into the module defining the type. Looking at a concrete example:

module M1 {
  var g1: int;
  class C { ... }
}

module M2 {
  var g2: int;
  proc C.foo() {
    ...g1...  // illegal without a `use` of `M1` or `M1` prefix because C.foo is defined in module M2
    ...g2...  // legal, through normal lexical scoping
  }
}

I'd consider foo() to be defined in terms of the scopes of C and M2 (leading to questions like the ones I think you've posed recently about whether I have to use M2 in order to call myC.foo()). I wouldn't consider foo() to be defined within the scope of M1. Practically speaking, I don't think it should be legal for C.foo() to directly refer to g1 from M1 without a use of M1 or via M1.g1. But I do think it should be able to refer to g2 since it's defined within M2.

Conversely, if we permit code to be injected into another module, my assumption is that accesses to globals would behave as follows:

module M1 {
  var g1: int;
}
module M2 {
  var g2: int;
  proc M1.foo() {
    ...g1...  // legal because foo() is actually being defined in M1
    ...g2...  // illegal because foo() is actually being defined in M1 so isn't within the lexical scope of M2
  }
}

This just seems weird to me, not to mention challenging to implement (we have enough scope resolution problems as it is).
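For comparison, the secondary-method half of this contrast can be written as a complete program. This is only a sketch, assuming a recent Chapel release; the `use M1` inside `M2`, the field `x`, the initial values, and the `Main` module are additions for illustration and are not part of the example above:

```chapel
module M1 {
  var g1 = 10;
  class C {
    var x: int;
  }
}

module M2 {
  use M1;            // needed so that C (and g1) can be named here
  var g2 = 20;

  // Secondary method: added to type C, but lexically scoped within M2.
  proc C.foo(): int {
    return x         // field of C, visible because foo() is a method on C
         + g2        // visible through M2's lexical scope
         + M1.g1;    // visible only because of the `use M1` above
  }
}

module Main {
  use M1, M2;        // use M2 so the secondary method is visible at the call site

  proc main() {
    var c = new C(1);
    writeln(c.foo());
  }
}
```

Nothing here places `foo()` inside `M1`: removing the `use M1` from `M2` would be expected to make both `C` and `g1` unresolvable there, which is the distinction being drawn from the hypothetical `proc M1.foo()` form.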

BryantLam commented 6 years ago

Your syntax choice in the recent examples is a little weird to me. I would expect something more like:

module M {
  var g1: int;
}

module M {
  var g2: int;
  proc foo() {
    ...g1... // legal
    ...g2... // legal
  }
}

and would disallow extensions via definition-through-module-prefix. I don't consider modules/namespaces to be similar at all to types, so the syntax choices between both don't have to be similar either.

Edit: I just realized the original post is what you were arguing against. I think I agree; it's a little extreme whereas this post is more of an easy lift (without consideration for file-resolution concerns).

bradcray commented 6 years ago

> I just realized the original post is what you were arguing against. I think I agree

I'm counting the change of your thumbs-up emoji into a confused emoji as my victory for the day. :)

If I'm understanding your example correctly:

module M {
  var g1: int;
}

module M {
  var g2: int;
  proc foo() {
    ...g1... // legal
    ...g2... // legal
  }
}

I think you're saying that a Chapel program should be able to define multiple top-level modules with the same name and that their contents should all be unioned into a single module with that name? This seems pretty weird to me, though it does address my two main concerns about this issue's proposal. What are the motivations for it? Is there precedent for it in other languages?

Trying to explain why I think it's weird: Let's say you and I are developing two modules independently and they just happen to have the same name. It seems odd that they would get merged into one thing since they're logically independent pieces of code. And it seems potentially dangerous in that they may interfere with one another once merged (What if they both define a top-level config or type with the same name? What if they each provide overloads that are problematic in the presence of the others'?). Instead, I'd expect this case to generate a "Hey, you have two top-level modules of the same name -- do something about that!" type of error, similar to what should happen when declaring two variables or two classes of the same name at the same scope.

One other more minor technical concern is what the behavior of the following should be:

module M {
  writeln("In first module M!");

  proc deinit() {
    writeln("Tearing down first module M!");
  }
}

module M {
  writeln("In second module M!");

  proc deinit() {
    writeln("Tearing down second module M!");
  }
}

where I suppose one answer is "both modules may not define top-level executable code / functions with the same name, including deinit()" (so this would be an error not because there are two module M's but because both try to define the same function(s)).

lydia-duncan commented 6 years ago

I don't think this code is an accurate comparison:

module M1 {
  var g1: int;
  class C { ... }
}

module M2 {
  var g2: int;
  proc C.foo() {
    ...g1...  // illegal without a `use` of `M1` or `M1` prefix because C.foo is defined in module M2
    ...g2...  // legal, through normal lexical scoping
  }
}

In your module example, you're accessing symbols at the scope that is being extended. In this example, you're trying to access symbols at the outer scope of the scope that is being extended. I don't think you'd argue that C.foo shouldn't be allowed to see C's fields, or methods defined on C in the original location. Similarly, extensions to the module should be allowed to see its globals and other functions. In this argument, I'm treating the module like a singleton class (fields easily correspond to globals at the module scope, and module level functions to methods, especially when referred to from outside the module scope).

lydia-duncan commented 6 years ago

In proposing the strategy I did at the beginning of this thread, I was not intending to eliminate strategies like C++'s namespaces. The overall strategy is definitely separable from whether we should allow any way of extending a module that has already been defined (perhaps it should have been split off into two separate threads, though I suspect the discussion would have taken place on one anyways). The similarities between modules and singleton classes made me wonder if strategies that were already in place should be extended.

bradcray commented 6 years ago

> I don't think this code is an accurate comparison:

Oh, I think I misunderstood your previous analogy to secondary methods then. I think I understand it better now that you've made the connection to singleton classes.

Given that perspective, how about this class-based analogy in which we try to define a secondary method on one class from within a distinct class:

class C {
  var x: int;
  proc foo() { ... }
}

class D {
  var y: int;
  proc C.bar() {  // inject a method `bar()` into C
    ...
  }
}

This is something we don't currently support, and I think this was a good choice (e.g., it leads to questions about whether C.bar() can refer to y since it's defined within D and seems to be able to access it lexically). The module case seems similarly weird to me: One module is trying to define a procedure not within itself, but within a completely different module.

[addendum: And while it's possible to write code outside of any classes to add a secondary method, it's not possible to declare Chapel code that exists outside of any modules (currently at least... and I'd be reluctant to change that)].

bradcray commented 6 years ago

I also want to get back to motivating examples, though. Without them, I feel like we've spent a lot of time on a reasonably esoteric topic without a concrete "I'd like to use it for this" use case in hand to motivate the discussion. If the only motivation is to be able to break a module into multiple files, then I think the include-based approach has the advantages of being more powerful, general, clear, and precedented.

lydia-duncan commented 6 years ago

I think that's reasonable :) I don't have a more specific example myself, so I think we can fall back on Bryant's earlier comment that either strategy would work for him (so we can go with the include approach and maybe close this issue), unless @LouisJenkinsCS or @buddha314 has an additional use case to add?

ben-albrecht commented 6 years ago

> If you take a look at a package like SciPy, you see that it's spread out over a lot of different sub-packages. In Python, you can import scipy or from scipy import stats.distributions, so you have the option of which level to import. I interpret this ticket to center on that kind of use.

> Once you say "subpackages" that implies to me something like hierarchical structure (say, modules defined within other modules), but I don't read this issue as relating to that question at all. It seems more related to defining something outside of a module's scope than it does to how multiple modules relate to one another (?).

I think the original motivating use-case does want something hierarchical like submodules. I believe submodules would work for @buddha314's example of SciPy, with the exception of submodules being constrained to a single file (SciPy is ~270k lines of Python code, for reference).

From what I can tell, an include statement would be general enough to remove this constraint, e.g.

// Top.chpl
module Top {
  include "A.chpl"
  include "B.chpl"
}
// A.chpl
module A {
  var x = 1;
}
// B.chpl
module B {
  var x = 2;
}
// User code
use Top;

writeln(Top.A.x); // 1
writeln(Top.B.x); // 2

A mechanism developed specifically for defining submodules in separate files might look cleaner in the end (using module names like A instead of file names like "A.chpl", for instance), but that comes at the cost of even more keyword/syntax pollution, so it's probably not worth exploring.

mppf commented 5 years ago

I'm not up to date on every detail in this issue, but I think that #13524 combined with #13979 addresses the need adequately. The difference between that combination and what this issue originally proposes is that the submodule/re-export approach assumes that the contents of a namespace (i.e. module) are determined by the author(s) of that module, while in this issue those contents could be extended arbitrarily. However, both would allow splitting up large projects into many files.