Improve spec description of what being in scope means

lydia-duncan commented 6 years ago

I was having difficulty finding a section to describe our current scoping behavior (beyond the use statement description, which doesn't cover what Bryant was encountering in #9985). What is there is kind of scattered and doesn't really go into the implications of those rules. It would be nice to have either a spec chapter or a technote describing our rules on scoping in a single place, as well as the implications for symbols in the same scope. This would include clarification of things like the behavior of:

proc main() {
  var a = 12;
  {
    var b = a;
  }
  {
    var a = 1;
    var b = a;
  }
  {
    var b = a; // intended to fail
    var a = 1;
  }
  {
    var b = a;
    {
      var a = 1;
    }
  }
}

as well as the interaction with functions, module symbols defined at the same scope, etc.

lydia-duncan commented 6 years ago

@bradcray, @BryantLam, does this cover it accurately? Feel free to add more or suggest changes

bradcray commented 6 years ago

To fix the specific problem of local variables that spawned this issue, the following text in Variables.tex seems like a good place to mention that a local variable shadows outer variables of the same name throughout the local block, not just after its declaration point:

Local variables are declared within block statements.  They can only
be accessed within the scope of that block statement (including all
inner nested block statements and functions).

If this were followed by an example showing the case of an inner a shadowing an outer a, that would be pretty clear.
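
A minimal sketch of what such an example might look like, assuming the behavior described in this issue: the inner a shadows the outer a throughout its block, and a reference before the inner declaration is an error rather than a reference to the outer a. The variable names c and d are made up for illustration, and the commented-out line is the one expected to fail to compile:

```chapel
proc main() {
  var a = 12;
  {
    // var b = a;   // would be an error: this is the inner 'a', used before defined
    var a = 1;      // shadows the outer 'a' for the whole block
    var c = a;      // refers to the inner 'a'
    writeln(c);
  }
  {
    var d = a;      // no 'a' is declared directly in this block, so this is the outer 'a'
    {
      var a = 2;    // only shadows within this nested block
    }
    writeln(d);
  }
}
```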

I'd have to look harder to see what we might need to do to define scoping better for other things (modules, functions, etc.). I think it'd probably make more sense to define these on a per-symbol-type basis (modules vs. functions vs. types vs. variables) rather than unify it all into a single section or chapter, since different things have different behaviors; but I could be wrong about that.

bradcray commented 6 years ago

Belatedly: The "Lexical Structure" section would be an obvious place to put a brief note about Chapel trying to take an "order doesn't matter" approach to many symbols such that a symbol is available from anywhere within a scope regardless of order of declaration. From there, it could point to specific sections in other chapters that go into more detail for specific symbol types.

BryantLam commented 5 years ago

Between this issue and related issues #12744 and #9985, I wasn't sure where to put this question.

Why does fn3 behave differently? Aren't all three declarations considered in-scope for their respective functions? (I'm not advocating that the behavior of fn3 should be changed to be legal.)

module A {
  var x = 42;
}

// #12744
proc fn1() {
  writeln(x);
  use A;
}

proc fn2() {
  writeln(y());
  proc y() { return 42; }
}

// #9985
proc fn3() {
  writeln(z); // error: 'z' used before defined
  var z = 42;
}

proc main() {
  fn1(); fn2(); fn3();
}

bradcray commented 5 years ago

Hi Bryant —

This is a good question and seems like a fine place to ask it. The consistency between fn3() and fn1() and fn2() is that its reference to z refers to the local z even though its declaration hasn't yet been encountered. The arguable inconsistency (assuming I'm interpreting your question correctly) is that use statements affect their whole scope before execution has reached that line and functions can be called before their declaration is reached, but variables can't be used until their declaration has been reached and they've received an initial value. I'd say that the difference between these cases is that the first two only have declaration semantics (use says what symbols are available within a scope; proc defines a function but neither has any execution-time semantics related to the declaration itself) whereas variable declarations do have execution-time semantics (I need to run the initializer expression before the variable makes any sense).

When we were defining this, other options we considered were (a) to default initialize the variable at the beginning of the scope and then re-assign it when its declaration was reached; or (b) to have references to the variable prior to its declaration refer to an uninitialized variable. But those both seemed pretty clearly broken / suboptimal. Of course another option would be to take more of the C-style approach where z refers to an outer z until the local declaration is reached, but we didn't like that this was inconsistent with how scope resolution was done for other symbols and felt that the errors that would be generated in the variable case would prevent users who were assuming the C model from shooting themselves in the foot and give us a hook to teach them how Chapel was different in this respect.

I hope this explanation is helpful / seems reasonable and consistent.

BryantLam commented 5 years ago

Why do fn4 and fn5 behave differently than a proc?

// Chapel 1.19.0

proc fn4() {
  writeln(a); // error: 'a' used before defined
  param a = 42;
}

proc fn5() {
  writeln(T :string); // error: 'T' used before defined
  type T = int;
}

lydia-duncan commented 5 years ago

I think the reason is more the spirit of what Brad was writing, rather than the letter of it. consts also do not have their value change throughout the lifetime of a scope, but we still expect them to be defined before they are referenced.

There are a number of reasons why it is good to have use statements and proc definitions impact anywhere in the scope. For procs, at least, it allows the proc to reference itself within its body, without forcing the user to write a prototype for the function that serves no other purpose. It wouldn't make sense to do that for variables, constants, types, or params.
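
As a rough sketch of the proc case: because functions are visible throughout their scope, two functions can call each other without any forward declaration. The names isEven and isOdd are hypothetical, chosen only for illustration.

```chapel
// isEven calls isOdd before isOdd's declaration appears in the file.
// Recursive functions need explicit return types so they can be resolved.
proc isEven(n: int): bool {
  if n == 0 then return true;
  return isOdd(n - 1);
}

proc isOdd(n: int): bool {
  if n == 0 then return false;
  return isEven(n - 1);
}

writeln(isEven(10));
```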

lydia-duncan commented 5 years ago

If I remember correctly, part of the benefit of having use statements impact the entirety of the scope is that it doesn't force separation between modules that are only made available via the use of another module. For instance, in the following module definition:

module A {
  module B { ... }
}

It wouldn't be possible to write use A, B; if the use of each module didn't occur at roughly the same time. Otherwise, you would be forced to write:

use A;
use B;

bradcray commented 5 years ago

If I'm understanding correctly, I think the point of Bryant's last comment is that types and params are resolved at compile-time (before any of the program has executed), so it could be argued that execution-time code should be able to access the symbols before they are defined in program order.

I think the reason these are treated similarly to variables today is more due to the syntactic similarities than a deep requirement to treat them this way. That said, I think that consistency is also attractive; otherwise, one could write spaghetti code like:

{
  type t2 = rank*t;
  param n = 10;
  type t4 = list(t2);
  type t = int;
  param rank = 3;
  type t3 = [1..n] t2;
}

making resolution less of a linear processing of the code and more of a topological sort. Could we allow it? I guess so. Would it result in better code? It seems unlikely.
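
For comparison, here is a sketch of the same declarations in dependency order, which is what the current rules require. The list-based t4 is left out to keep the example free of extra modules, and the variable x is an addition so that there is something to print.

```chapel
{
  type t = int;
  param rank = 3;
  type t2 = rank*t;      // homogeneous tuple type: 3*int
  param n = 10;
  type t3 = [1..n] t2;   // array of n such tuples
  var x: t2;             // default-initialized tuple
  writeln(x);
}
```

Each declaration here only refers to symbols that appear above it, so the block can be read top to bottom.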

I suppose we could require correct ordering between all the compile-time-evaluated statements but permit them to be interleaved arbitrarily with execution-time-evaluated statements, but I'm not sure that's a big win. It also causes problems for types that have an execution-time component to them, like:

const n = infile.readInt();
type vec = [1..n] real;

where vec would not be fully defined until execution time.
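
A small sketch of that last point, assuming a config const as a stand-in for the value read from infile (the file-reading call is replaced only to keep the example self-contained):

```chapel
config const n = 4;        // stand-in for a value only known at execution time
type vec = [1..n] real;    // the array type's domain depends on n
var v: vec;
writeln(v.size);           // number of elements is determined when the program runs
```

Running the compiled program with a different value, for example with --n=8 on its command line, changes the size of vec without recompiling, which is why a reference to vec ahead of n could not be made sense of at compile time alone.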

lydia-duncan commented 5 years ago

> I suppose we could require correct ordering between all the compile-time-evaluated statements but permit them to be interleaved arbitrarily with execution-time-evaluated statements, but I'm not sure that's a big win.

I think that would run into similar or worse confusion than what Bryant is currently encountering, yeah.

BryantLam commented 5 years ago

> Could we allow it? I guess so. Would it result in better code? It seems unlikely.

That's the crux of the issue I have with #12744 and #13041. As a defensive programmer, having use statements behave like proc rather than like param/type† in not requiring lexical precedence is going to cause headaches. ... And even if you do lexically precede the reference...

> I suppose we could require correct ordering between all the compile-time-evaluated statements but permit them to be interleaved arbitrarily with execution-time-evaluated statements, but I'm not sure that's a big win

I had this idea too as a separate compiler warning, but I didn't propose it in #13041 because it seemed hard to implement and the reward was dubious because the cases I actually care about are when modules define variables that will be used/shadowed into the active scope with or without the user's knowledge. Admittedly, import statements will improve things significantly because users won't be as tempted to pollute their active scope with symbols they may or may not use, but it doesn't help in all cases which is what concerns me as another Rule to learn that admittedly isn't being held consistently even in this issue here.†

†I understand why use statements could behave like proc, but I don't agree with it.

I would like the behavior of use-in-scope to change to be less weird. But if not, my concession in #13041 is to create a warning to force the user to learn about this weird behavior. Hopefully, you can see why I perceive the behaviors around type/param to be hypocritical even though I do think that that behavior is the correct one that will ultimately result in better engineered software.

chapel-lang / chapel

Improve spec description of what being in scope means #10021