mutually recursive modules and resolution order

chapel-lang / chapel

a Productive Parallel Programming Language

https://chapel-lang.org

Other

1.77k stars 415 forks source link

mutually recursive modules and resolution order #19770

Open mppf opened 2 years ago

mppf commented 2 years ago

This issue asks language design questions about mutually recursive modules and the order of resolution of modules.

First, we have this example that demonstrates that the order of use statements does matter:

// Example A
module Library {
  module SubModule {
    var str = "str from submodule";
  }
}

module Program {
  // this does not currently compile, but swap these two statements and it does
  use SubModule;
  use Library;

  proc main() {
    writeln("str=", str);
  }
}

(This example comes from https://github.com/chapel-lang/chapel/issues/13041#issuecomment-873393725 ).

Next, let's consider an example that compiles and runs today of mutually recursive modules:

// Example B
module M {
  use N;
  var x = y;

  proc main() { }
}

module N {
  use M;
  var y: int;
}

The language specification doesn't explicitly say that this is or is not allowed. However it is currently supported.

Now what are the things that need to happen in order to resolve a module?

need to know what 'use' / 'import' statements refer to
need to resolve module-level code (including types and function calls)

Now, at the moment, I am expecting the compiler will do step (1) in an in-order manner for a given module; and then do (2) in an in-order manner for a given module. This strategy would continue to allow mutually recursive modules.

For step 2, is it OK to require that it is possible to create an order for resolving the modules? The code in Example B meets this requirement because the compiler can completely resolve module N and then from there it can completely resolve module M.

Here is an example where that is not true:

// Example C
module M {
  use N;
  var z: int;
  var x = y;

  proc main() { }
}

module N {
  use M;
  var y = z;
}

The production compiler currently produces an error for this:

exc.chpl:11: error: use of 'z' before encountering its definition, type unknown

However, at the moment, the dyno compiler can resolve it. Should such patterns be allowed by the resolver?

bradcray commented 2 years ago

The language specification doesn't explicitly say that this is or is not allowed

I'll have to look, but I could have sworn that something did talk about this being allowed and defining the order. Specifically, I remember writing some text at some point about using a graph DFS/BFS algorithm to establish module initialization order (which should define what happens in the event of cycles).

However, at the moment, the dyno compiler can resolve it. Should such patterns be allowed by the resolver?

I think we've always hoped that the compiler would do a better job of resolving these than it currently does (where we imagined that, when it got stuck, it would put that down and work on other things until it figured out; or just actively go resolve the thing that it can't resolve before continuing), so this sounds like good news to me. Though of course there will always be cases that can't really be resolved, like:

module M {
  use N;
  var x = y;
}

module N {
  use M;
  var y = x;
}

(and you could imagine more complicated / realistic examples in which the initializations were using functions that referred to the variables in cyclically-dependent ways). But I think that's OK because such programs arguably aren't really sensical/well-formed. So it seems like we'd still need some way of saying where the line between what can or can't be resolved is.

mppf commented 2 years ago

The language specification doesn't explicitly say that this is or is not allowed

I'll have to look, but I could have sworn that something did talk about this being allowed and defining the order. Specifically, I remember writing some text at some point about using a graph DFS/BFS algorithm to establish module initialization order (which should define what happens in the event of cycles).

Yes, it describes module initialization order, but not module resolution order. And AFAIK the section defining module initialization order doesn't explicitly say anything about cycles.

However, at the moment, the dyno compiler can resolve it. Should such patterns be allowed by the resolver?

I think we've always hoped that the compiler would do a better job of resolving these than it currently does (where we imagined that, when it got stuck, it would put that down and work on other things until it figured out; or just actively go resolve the thing that it can't resolve before continuing), so this sounds like good news to me.

One challenge we are facing with the dyno query framework is that a query calling the same query recursively (most likely through other functions/queries) is a fatal error. Basically there isn't a reasonable way to handle that in the query framework other than giving up with an error. Or at least that is what we are led to believe by the Rust compiler.

So, at the moment such a (nonsense) program kills the dyno compiler with an error "Error: recursion encountered in query resolveModuleStmt". (It would halt, even if used in a library, because by then the program database is corrupted and we don't have a way to revert it). One of the things we have been wrestling with is how hard to work to improve this situation, where some ideas include:

adjust the language to make the error cases easier to detect (which is what this issue is asking about)
simply improve the error message (but still halt)
or try to do some analysis ahead-of-time to emit an error before getting to the fatal query recursion error.

Perhaps, the fatal error is good enough for now.

bradcray commented 2 years ago

Yes, it describes module initialization order, but not module resolution order.

Oh, good point, I was blurring those together in my mind. I'm guessing that the two are probably the same in the current compiler, though (?).

so this sounds like good news to me.

A challenge that's occurring to me belatedly now that you've clarified the above. Even if dyno can resolve your example 3, the modules can't be completely initialized in any specific order, right? I.e., if I do M first, it relies on N.y, which hasn't been set up yet, so that's an error. And if I do N first, it relies on M.x, which hasn't been set up yet. So if the implication is that the only way to initialize this program is to have fine-grain module initializers that are run in some interleaved manner... maybe that's more complicated than we want to worry about and an error really is the correct behavior for these initializations depending on mutual recursion. (note that I think mutual module uses should definitely still be supported when the initialization order doesn't result in these sorts of cycles, though).

the section defining module initialization order doesn't explicitly say anything about cycles.

It looks like it does to me. It talks about using a depth-first traversal, and not visiting modules that have already been visited. So in your example C, I think it would start with M as the main module, go to N via the use N, and then avoid going back to M due to the use M (since M was already visited) breaking the cycle (and I think this is how DFS is defined on graphs with cycles in general, so am not trying to imply we're being particularly clever or creative here). And then the post-order implies that N would be initialized first, then M. But then we run into the resolution / mutually recursive initializers issue from my previous paragraph.

dlongnecke-cray commented 2 years ago

That is so fascinating that in dyno we can resolve something that's not actually runnable, because the module initializer body must be run all at once. So there's no point in being able to resolve "Example C" even if we can do so.

Following that train of thought, it seems like the issue here is not the ability to resolve the module all at once, but to run it all at once (where the former might follow trivially if the latter is true - such that it helps to simplify our implementation)?

Can we just add a line to the spec saying something like "the statements in a module must be written such that all statements in a module initializer can be executed together in order"?

mppf commented 2 years ago

A challenge that's occurring to me belatedly now that you've clarified the above. Even if dyno can resolve your example 3, the modules can't be completely initialized in any specific order, right?

I believe so, yes.

maybe that's more complicated than we want to worry about and an error really is the correct behavior for these initializations depending on mutual recursion.

I agree with this.

Would it be possible to unify the initialization and resolution orders? So that the language specification can describe both orders at once?

What we have now is this from https://chapel-lang.org/docs/language/spec/modules.html#program-execution :

Starting from the module that defines the main procedure, the modules named in its use and import statements are visited depth-first and initialized in post-order.

Some wrinkles with that:

While it might be possible to imagine the initialization order working out with separate compilation (where each module tries to initialize its dependencies, and skips them if they are already initialized, and proceeds from the main module), can that same approach work when resolving. Is there even a main module when compiling a library? I think that apart from the starting point, this ordering description could describe the resolution process even for libraries; each module is skipped if already resolved and otherwise resolves its dependencies.
the module initialization order section doesn't consider modules inited due to being named on the command line.
it seems (from some sample programs) that the "the modules named in its use and import statements" includes module use/imports that are inside of functions (even if those functions are not ever resolved). But I don't think that interpretation would be obvious to a reader.

the section defining module initialization order doesn't explicitly say anything about cycles.

It looks like it does to me. It talks about using a depth-first traversal, and not visiting modules that have already been visited.

Right, it describes an algorithm that can work with cycles, but doesn't say cycles are or are not allowed. I think it should say that they are allowed, somewhere, if that is our intention.

bradcray commented 2 years ago

the module initialization order section doesn't consider modules inited due to being named on the command line.

I was wondering about this when the Arkouda team was asking questions about this a few weeks back. Specifically, I found myself hoping that any "named on command line" modules would be init'd after the others and in the command-line order, but wasn't sure whether that was the case.

includes module use/imports that are inside of functions (even if those functions are not ever resolved). But I don't think that interpretation would be obvious to a reader.

That's correct due to the fact that, in today's compiler, we resolve use/import early (long before resolution) and don't do enough bookkeeping to back things out if something doesn't end up being used. I think it would be a definite improvement to the language and user experience if only those that were in "live" code resulted in initialization if it fell out due to the dyno resolution approach. If it didn't, I think I'd just stick with the current approach (where I always imagine we have some text somewhere that talks about "all uses/imports in the module's source in lexical order" or something like that, but would not be surprised if we didn't say anything at all).

I think it should say that they are allowed, somewhere, if that is our intention.

That's fine with me. I also think it's definitely our intention.

mppf commented 2 years ago

I think it would be a definite improvement to the language and user experience if only those that were in "live" code resulted in initialization if it fell out due to the dyno resolution approach.

Can you say more about what you mean by "live" code, here? (In particular, what if the use/import is within a concrete function? What if it is within a generic function?)

bradcray commented 2 years ago

By "live" I meant code that was reachable through the resolution process, however that evolves. E.g., reachable through main or export routines in today's compiler; those things or concrete (or generic with a finite number of options?) in a more aggressive resolution scheme.

mppf commented 2 years ago

What about the order of resolving use/import statements themselves, in the face of recursive modules? As we know from Example A, the order matters.

Here is an example that compiles and runs today (with the production compiler):

module A {
  public import C as CC;
  public import D as DD;
  use DD; // DD resolves through the above import
  use CC; // CC resolves through the above import

  var a1 = cvar;
  var a2 = dvar;

  proc testA() {
    writeln("testA ", a1, a2);
  }

  proc main() {
    import A;
    import B;
    A.testA();
    B.testB();
  }
}

module B {
  public import D as DD;
  public import C as CC;
  use CC; // CC resolves through the above import
  use DD; // DD resolves through the above import

  var b1 = cvar;
  var b2 = dvar;

  proc testB() {
    writeln("testB ", b1, b2);
  }
}

module C {
  var cvar: int = 1;
}
module D {
  var dvar: int = 2;
}

Here is a minor adjustment to it that does not compile with the production compiler (but that is due to issue #19799). Here I am trying to ask if it should compile:

module A {
  use B;
  public import D as DD;
  use DD; // this comes from the above import
  use CC; // this comes from B's import through the 'use B'

  var a1 = cvar;
  var a2 = dvar;

  proc testA() {
    writeln("testA ", a1, a2);
  }

  proc main() {
    import A;
    import B;
    A.testA();
    B.testB();
  }
}

module B {
  use A;
  public import C as CC;
  use CC; // this comes from the above import
  use DD; // this comes from A's import through the 'use A'

  var b1 = cvar;
  var b2 = dvar;

  proc testB() {
    writeln("testB ", b1, b2);
  }
}

module C {
  var cvar: int = 1;
}
module D {
  var dvar: int = 2;
}

Let's say it starts out by resolving the use/imports in A (since it is the main module). The compiler:

resolves 'use B' to the toplevel module 'B'
to resolve 'public import D as DD', it has to ask, is there a module D made available by 'use B'? Or should it find a toplevel module, if not? (see also #19782).
to find out if 'use B' brings in a 'D', it goes to resolve use/import statements in 'B'. But the first thing in 'B' is a 'use A'. What do do?

We could suppose that it should work with the current knowledge in 'A' (the partial resolution result). That is what I have tried to implement in dyno, but it comes at some cost (it is technically O(n*n) in the number of use/import statements). Should we instead require that there exists some order where we can visit each module in that order and correctly resolve the use/imports without stopping to work on a different module? (I am leaning towards doing so).

mppf commented 2 years ago

Here is another (similar) example that doesn't compile today but that we could arguably make compile with sufficient effort:

module M { 
  use N;
  use NN;
  use OO;
  x;
  z;
} 

module N { 
  module NN { 
    var x: int;
  } 
  public use O;
} 

module O { 
  use M;
  module OO { 
    var z: int;
  } 
}

mppf commented 2 years ago

What I have cooking for now is to, during process of resolving module-level use/import statements, simply ignore the use/import statements within any module that we are already in the process of resolving (i.e. the recursive use). I suspect that is good enough to support mutually recursive modules for now. I think the result is better than the production compiler, at least (because the production compiler has the problem in issue #19799).