dmacqueen / NewFrontEnd


variables-etc.txt bug report #1

Open YawarRaza7349 opened 1 day ago

YawarRaza7349 commented 1 day ago

variables bound in the local part of a local-in-end decl have the body decl as their scope

Variables bound in the local section are also available in the subsequent lines in the local section:

local
  val x = "strung"
  val y = x
in
  val () = print y
end

if a value variable is bound at top level in a structure, but not exported, it has Open scope (not the remaining sequence of decls in the structure)

It sounds like you want applied occurrences through a module identifier to refer exactly to the binding occurrence in a struct definition. I agree that it's desirable for scoping to not depend on module-level static semantics, but what about the following:

structure S = struct
  val f = fn x => x
  val g = f true
end :> sig
  val f : int -> int
  val g : bool
end

Do we want one variable, f, to have two different types, 'a. 'a -> 'a inside the struct and int -> int outside the struct?

Perhaps (and maybe there are problems with this approach too) variables should only refer to identifiers that don't come after a period, and identifiers that do follow a period should instead be treated more like record labels, without the whole ceremony involving environments. After all, structure members don't require any sort of shadowing, do they? Otherwise, if you still wanted them to be variables, you would need to handle the signature ascription example above in some way.


Value variables and module variables will (probably) have an associated unique _dynamic_ variable used to refer to the runtime value to which it is bound during execution.

Syntactic variables don't refer to a unique dynamic variable:

val bn = fn v => fn () => v
val th1 = bn 1
val th2 = bn 2

There's one syntactic variable, v, that refers to two dynamic variables, one assigned to 1 and one assigned to 2.


Note that if type variables have explicit binding points, no lexical distinction (like a leading apostrophe) is needed for type variables. We could have a convention that they be capitalized alphanumeric (and a corresponding convention that tycons be alphanumeric with an initial lower case letter).

If there is no lexical distinction between tycons and tyvars, then they should be in the same namespace as well. Consider the following hypothetical MsML (I'm using F#-esque angle brackets to avoid confusion in this specific example, not to suggest them for MsML):

datatype a = A
datatype b = B
datatype c<a> = CA of a | CB of b

You expect the a in CA to refer to the type variable, shadowing the datatype, while the b in CB instead refers to the datatype. It makes most sense to think of b and both a's as all being in the same namespace, since you can't tell whether a variable in a type refers to a tycon or a tyvar just by syntactic categories alone, without looking at what's in scope.

dmacqueen commented 23 hours ago
  1. This is correct. One has to be a bit more precise about the scope of variables declared in the "local" part of a local-in-end declaration. One could say that such a variable has the usual scope relative to all the local declarations plus the "body" of the local-in-end declaration. The "usual scope" has to be defined carefully too, because the variable may be bound in a family of mutually recursive functions. And of course there could be shadowing; for instance, where a variable named "x" is defined twice in the local declaration.
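
The cases mentioned in this point can be sketched in Standard ML as follows. This is only an illustrative example: the names y, z, even, and odd are made up here and do not come from variables-etc.txt.

```sml
local
  val x = 1              (* first binding of x *)
  val y = x + 1          (* refers to the first x, so y = 2 *)
  val x = "strung"       (* shadows the first x from here on *)
  (* even and odd are bound together, so each is in scope in the other's body *)
  fun even 0 = true
    | even n = odd (n - 1)
  and odd 0 = false
    | odd n = even (n - 1)
in
  (* the body sees the second x, plus y, even, and odd *)
  val z = (x, y, even 4)
end
```

Only z is visible after the end; x, y, even, and odd are not in scope outside the declaration.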

2.1. With regard to a variable declared in a structure having OPEN scope, this is just an initial suggestion and has to be thought through. Declarations making up the body of a structure should behave (with respect to scoping and ultimate "visibility") just like declarations anywhere. They "denote" an environment, and that environment is packaged up (and potentially given a name) by the fact that the declaration is the body of a structure expression (i.e., occurs in struct <decl> end). It is quite possible for declarations in a structure body to be shadowed, as in

structure S =
struct
  val x = 3
  val x = true
end

which results in a structure S that has one component, S.x, of type bool.

2.2. Signature matching can coerce the type of a polymorphic variable declared in the body of a matched structure to be an instance of its polymorphic type in the body. No anomaly here.
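
As a rough illustration of this point, the structure from the report can be extended with one use outside the body. The binding n and the commented-out b are additions for illustration only.

```sml
structure S = struct
  val f = fn x => x      (* polymorphic inside the body: 'a -> 'a *)
  val g = f true         (* so f can be applied to a bool here *)
end :> sig
  val f : int -> int     (* the signature exports f at an instance of its type *)
  val g : bool
end

val n = S.f 3            (* accepted: S.f has type int -> int *)
(* val b = S.f true         rejected by the type checker outside S *)
```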

2.3. I don't understand your 3rd suggestion about "variables should only refer to identifiers that don't come after a period". An expression of the form S.x is not a variable occurrence, it is a selection from a structure, which is a different form of expression. The fact that S.x has been referred to as a "qualified identifier" has perhaps led to some confusion.


  3. A function parameter variable like v in your example will have a single dynamic counterpart. This dynamic counterpart of v will get bound to different dynamic argument values on different calls (as usual). In your example, the dynamic counterpart of v will be bound to the value 1 in the first call and to the value 2 in the second call.
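
The observable behaviour being described can be checked by forcing the two thunks from the report. The binding results is added here purely for illustration.

```sml
val bn = fn v => fn () => v
val th1 = bn 1           (* first call: v is bound to 1 *)
val th2 = bn 2           (* second call: v is bound to 2 *)

(* each closure keeps the value v had in its own call *)
val results = (th1 (), th2 ())
```

Here results evaluates to (1, 2), whichever way one chooses to describe the relationship between the syntactic v and its dynamic counterpart.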

  4. Lexical distinction between classes of variables (e.g. tycons and tyvars) is kind of orthogonal to the question of whether they should be in different namespaces. One might argue that if two classes of variables (or "identifiers") are lexically distinguished (say by capitalization) then they ought to belong to different namespaces. I am open to considering this argument. In the other direction, it seems possible that one might be able to get away with having only one namespace.

To be pedantic, it is best to be careful to distinguish the idea of an "identifier" from that of a "variable". A record field label is an example of an identifier that is not a variable. Other identifiers, such as keywords, should be processed and eliminated during parsing and therefore are not a concern when dealing with the abstract, internal representation of programs (or perhaps keywords should not be considered to be identifiers!). There is also the notion of a symbol. For me, a symbol is an internal concrete representation of certain classes of identifiers, such as value variables, data constructors, tycons, type variables, etc.