mppf closed this issue 3 months ago.
TODO: see what test failures look like with Approach 1.
We discussed this briefly offline today and I expressed my 2c reaction to this, which is that I don't find the current state to be too confusing. In the steady state, you'd expect the author to know that `GR` is generic because they had to concretize it in their init. For new programmers or new programs, I could see this potentially being a head-scratcher (especially generic-over-management types, since that is fairly Chapel-specific), but the cost/benefit here doesn't sway me away from how things are.
You have to get rid of this. If you want a polymorphic record you can do it like this:
```
record GR [t] { var field: t; }

record A {
  var field: GR[int];
}

record B[t] {
  var field: GR[t];
}
```
The `[t]` is a universal quantifier. It must be instantiated on all uses. It is an error not to do so, in all cases. End of confusion. You cannot put `type t;` inside the record; that's nonsense. That would actually be the syntax for an existential type (and in fact is in OCaml). You can change the exact syntax, but you cannot put the quantifier inside the thing which is quantified.
In some cases, the quantifiers can be deduced and the `[t]` omitted: this occurs with overload resolution. For example, for a function:

```
proc f[t] (x:t) ....

f(1) // t = int is deduced so the [int] is not required
```
The general algebraic principle here is that types MUST use the same form as logic. This is due to the Curry-Howard Isomorphism. In first-order type systems (corresponding to first-order logic), you have universal quantification and it MUST be at the head. This is called head-normal form. If you have nested quantifiers, that's second-order polymorphism and way beyond the scope of the current Chapel design. Existentials are also possible but tricky: in constructive logic, you cannot just have an existential quantifier; you have to actually produce a witness.
At least in this sense the C++ syntax is correct: `template<typename T> ....` correctly announces the universally quantified variable at the head.
@skaller I'm pretty confident we're not going to do that and we happen to like the language we've created. Telling us that it's nonsense/rubbish/crap or that some other way is the only way is not going to make friends or influence people.
But responding more directly to "Why not use something like C++ syntax?" - what we have today in terms of something being generic based on the inside of the declaration is intuitive to many, especially to programmers coming from Python. And, it significantly lowers the syntactical & conceptual overhead of writing generic code. (Perhaps too much, but that is the subject of this issue).
I understand and am aware that it is not the syntax that PL theory would suggest.
I don't buy the easier-to-learn argument at all. I have 30 years of experience designing programming languages. It took me a week to read the Chapel documentation. By contrast, I learned go-lang from a single, rather long, tutorial video.
Furthermore, not even devotees of the language understand it, including Brad. Of course, C++ is even more complex, and I doubt anyone actually understands all of it.
Syntax is not arbitrary. There are rules. It has to reflect the underlying algebra or it will be hard to understand what your code does. The algebra has to be coherent (without special cases or conflicts) or it will be impossible to compose components. And the type system must allow you to type check every definition and every use thereof. Which you cannot do at the moment.
So you may like what you have, but then why are these issues arising? It doesn't matter what you like. What matters is that you have a coherent type system, which includes polymorphism, and syntax that reflects the structure of the type system; otherwise users will not understand the typing rules, and developers will not be able to generalise.
You can argue that some of the syntax is OK, and some could use some work, and similarly many other issues. It's always a challenge to find a relatively painless way to migrate. After a decade on the C++ committee, I have learned a bit about compatibility with older code (in fact in C++, with C). But I also learned that sometimes you have to break things, or you cannot make progress. And C++ is a great example of a complete failure. Compile times are horrible, errors incomprehensible, and all because templates, the mainstay of modern C++ library development, cannot be type checked.
It took me a while to learn OCaml. ML is quite different to C++. The functional programming bit was no problem, but the function call syntax really threw me: `sin x`, with no () around the x. So I surely do understand about the familiarity thing, but in the end I got used to the OCaml syntax, and now the lack of unnecessary parens is a real blessing. And you can write `sin(x)` if you want anyhow; it's still correct.
There's a difference between C++ and Chapel. C++ users are average programmers or unfortunate students. Chapel targets expert audiences. They may not be expert in PLs, but they all have solid mathematical backgrounds or they would not be interested in high performance distributed concurrent processing. Which means, your audience will accept breakages if it leads to better performance, and not just run time performance, but a more expressive language with better support for reusable components.
IMHO of course.
A note for anyone following this issue - I've edited the OP to add two more possible approaches to solve this problem.
Aside: Your second detailed example really bothers me. I don't understand why `R` is not considered partially generic when evaluating `computeTypeZ` (oh, and `computeTypeY` too, huh?). But maybe the point of that example is that it probably should be, and it might be a bug?
I appreciate the problem of reading `record R { var x: D; }` and not knowing whether I should write `var r: R;` or `var r: R(D(int));`.
I would like to generalize this problem, however. Consider:
```
record R {
  ... hundreds of lines of code ...
  type xyz;
  ... more code ...
  type pqr;
  ... more code ...
}
```
Again, I will not know whether my concrete type should be `R`, `R(int)`, or `R(int,real)` -- just because I am not seeing those type fields in the code. The answer is either (a) go read the chpldoc or (b) react to compilation errors when I do not get it right the first time, with the compiler pointing me to `type xyz` and `type pqr` in the code. These two answers apply equally to the unknown genericity of a record's field.
Regarding complications arising with recursive types: one option is what we do for the return type of a recursive function. Namely, we require that it be specified.
Applying that here, have the compiler require the genericity of certain fields to be indicated (using (?) or some other syntax) in the cases where it cannot figure it out on its own.
This addresses my major concern with Approach 1 and similar approaches, namely that they present a maintenance burden. If the requirement applies only in certain corner cases, the maintenance burden is minor.
> Applying that here, have the compiler require the genericity of certain fields to be indicated (using (?) or some other syntax) in the cases where it cannot figure it out on its own.
AFAIK the compiler can always figure it out. The question is just—if the compiler has to work hard to do so—will a user also have to work hard to do so?
Anyway I am currently expecting that we won't change the language design here but perhaps this issue will motivate us to be more careful about distinguishing generic vs concrete in error messages.
> one option is what we do for the return type of a recursive function. Namely, we require that it be specified.
A bit off-topic, but the current behavior for recursive functions isn't very popular for users (nor for me, when I'm writing code from a user perspective). Specifically, there are plenty of recursive functions (maybe all?) where the compiler should be able to figure the return type out. As a simple example:
```
proc fact(n: int) {
  if n == 0 then
    return 1;
  else
    return n * fact(n-1);
}
```
I imagine that the compiler would start by looking at all the non-recursive return expressions and say "this seems to return `int`", as with a non-recursive case. Then, it can plug that guess into the recursive cases to see whether it unifies properly (e.g., `n * int` is `int`, so the recursive branch also returns `int` - great!). If not, it issues an error.
I've added an Approach 5 to the top post of this issue. My current preference is to use Approach 5 if we are willing to make any change in this area at all.
One drawback to Approach 5 is that it makes certain patterns less easy to express. That's obvious in a way, because you have to write fields like `type eltType; var element: eltType;` instead of just `var element;`. But it also impacts the compiler-generated initializer:

- `record R { var element; }` -> the compiler-generated initializer is `proc R.init(element)` (which accepts a value)
- `record R { type elementType; var element: elementType; }` -> the compiler-generated initializer is `proc R.init(type elementType, element: elementType)`; so if you want `proc R.init(element)`, you have to write it yourself.

However, I don't find this drawback to be particularly concerning.
Approach 5 seems like it's addressing a different problem than the other solutions, did you mean to put it on a different issue?
> Approach 5 seems like it's addressing a different problem than the other solutions, did you mean to put it on a different issue?
I meant for it to be on this issue. In fact it is like Approach 2, but it also deprecates generic fields like `var x;`.

It solves the problem of "It's too hard for the compiler/programmer to see if a field is generic" because you can simply look at the fields to see if any are declared with `type` or `param`. No longer could you make a generic record using only `var` or `const` fields. If there are any type or param fields, it is generic. (If we went with the proposal in #21075, it would be "if there are any `type` or `param` fields that do not have initialization expressions, then the class/record is generic".)
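To illustrate the rule with a sketch (the record names here are my own, not from the proposal): under Approach 5, genericity would be visible directly from the field declarations.

```
// Generic under Approach 5: it declares a 'type' field.
record Stack {
  type eltType;
  var top: eltType;
}

// Concrete under Approach 5: only 'var' fields with concrete declared types.
record Point {
  var x, y: real;
}
```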
So if the compiler detects that the type you are using for a field is generic, will it force you to make a type field for it? E.g.

```
record Foo {
  var x: MyGenericType;
}
```

will result in an error, and the correct way to write it is:

```
record Foo {
  type t = MyGenericType;
  var x: t;
}
```

?
> So if the compiler detects that the type you are using for a field is generic, will it force you to make a type field for it? E.g. `record Foo { var x: MyGenericType; }` will result in an error
Yes, that is right, so far
> and the correct way to write it is: `record Foo { type t = MyGenericType; var x: t; }`
That is not the correct way to write it. Today, the field `type t = MyGenericType` makes `t` be generic with a default of `MyGenericType`, rather than a constraint on `t` (see also #21075, which discusses changing that). To make a constraint on `t` today, you would write initializers that check that `t` is a subtype of `MyGenericType`. (I think it would be interesting to consider other ways of doing that in the context of custom type constructors, but I don't think we need that to make the change here.) For example:
```
record Foo {
  type t;
  var x: t;
  proc init(type t) {
    if !isSubtype(t, MyGenericType) then
      compilerError("Foo must be instantiated with a subtype of MyGenericType");
    this.t = t; // initialize the type field from the formal
  }
}
```
So for clients of Foo to know that `x` should be an instantiation of `MyGenericType`, would we be relying more strongly on the documentation for `x`? That seems like placing an additional burden on record authors in multiple ways:
I don't view it as an additional burden, because a number of things don't really work intuitively if you don't use type/param fields and instead rely on things like `var x;` -- as a result, I recommend the explicit type/param fields style for anyone who listens.
One specific situation: for fields of class type, we have observed several cases where a user was creating a generic record/class when they did not intend to, and then they were confused by the resulting error messages (which generally don't point to the problematic field). By requiring that to be slightly more explicit, I think we are making it less confusing for people who run into that issue.
Regarding the "Generic field with a constraint" pattern, I suspect that is not used very much in practice, outside of our own tests of it, but that is something we can gather more information on.
> they have to add additional documentation to the original field, when before it would have been generated for them
Just about the documentation: having the `type` / `param` field rule would allow the documentation to automatically and clearly indicate when a record/class is generic. In the current situation, the user is responsible for doing that. I do not think that simply showing `var field: SomeType` will convey to a documentation reader that `SomeType` is or is not generic. So, it is my view that the `type` / `param` rule would improve documentation, because it will be clearer when records/classes are generic.
Edit: With an improved warning for Approach 2, I'm seeing 823 failures out of 13465 (where any time the warning is emitted leads to a test failure, since I haven't updated any tests yet).
Branch is https://github.com/mppf/chapel/tree/warn-field-with-generic-declared-type
A few members of the team recently expressed a preference/interest in improving compiler diagnostics over changing the syntax. Here is my rationale for this direction.
`range` is generic:

```
......
var r1: range;
var r2: range = 1..n by 2;
......
```
If my program produces a genericity-related error, the compiler will be able to point me to the generic type and explain why it is generic. This is the boldest claim of them all and could be contested.
Records/classes are analogous to functions in this regard. For example, if I call a function and it compiles, I do not need to think about the genericity of its formals.
Can my program compile and still have a bug in this case? Sure! For example, I declare a formal of the type 'MyClass', expecting the formal to get a copy of the class pointer at the time of the call. Then I invoked it on a variable with an 'owned MyClass'. My formal now tracks concurrent changes to that variable when I assume it wouldn't. How much does this possibility affect our strategy for genericity-related annotations on functions?
I may want to know proactively whether a certain type is generic. Here, again, compiler support works without additional genericity annotations. I can query isGeneric(my type), or we can add a genericity counterpart to the compiler option `--explain-call`, ex. `--explain-genericity`, to explain what made my type generic.
The case of a recursive type, i.e., figuring out whether it is generic, is surely tricky and deserves further discussion.
Chapel has historically supported both rapid prototyping and production modes. For example, it provides the ability to omit type annotations. OTOH requiring explicit annotations on nilable types is something that favors the production mode. This allows us to choose to cater to both modes by making genericity annotations optional, if we decide to introduce new ones, or to favor the production mode by making them required.
Perhaps the ultimate solution is a good blend of all the ingredients: annotations, compiler support, optionality?
Here is how I think about the approaches in the OP.
idea | a# | advantages | range/domain | w/ defaults |
---|---|---|---|---|
mark upon use | 1, 3 | directly where it matters | helps some | helps? |
mark the def | 4 | more concise | no help | no help |
disallow [...] fields | 2, 5 | addresses #16508 | no help | helps? |
In more detail, the rows are:

- Annotate the type (if it is generic) where it is used (1, 3)
  - `(?)`, `:?t`, a keyword ex. `generic`
  - the `(?)` annotation interferes with the nilability mark `?`; how does it work for type expressions? to indicate genericity over mem. management?
- Annotate the entity (if it is generic) where it is declared (4)
  - `(?)`, `?`, a keyword ex. `generic`
- Disallow untyped fields and fields with generic declared types (2, 5)
We may choose to make the chosen solution optional for prototype modules. This reduces the burden of the added verbosity.
How does each of these approaches help or complicate program evolution, esp. when a record/class/proc changes from concrete to generic or vice versa?
Continuing https://github.com/chapel-lang/chapel/issues/19120#issuecomment-1379485723 --
> Edit: With an improved warning for Approach 2, I'm seeing 823 failures out of 13465 (where any time the warning is emitted leads to a test failure, since I haven't updated any tests yet).
> Branch is https://github.com/mppf/chapel/tree/warn-field-with-generic-declared-type
I've updated the branch to do two things:

- don't warn when the generic type is written with a `?` argument, like `list(?)`
- don't warn within internal/standard modules

With those, now I'm only seeing 31 failures in a testing run.
The second of these is a bit of a cheat, but warnings for internal/standard modules are repeated across many tests, so it's reasonable to hide them to understand the amount of code that would be affected by the warning. (Additionally, we can adjust the internal/standard module code to adhere to whatever style we think is clearest).
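For reference, here is a sketch (my own example, not code from the branch) of the already-marked style that the check is meant to accept without a warning:

```
use List;

record Wrapper {
  // the '?' argument marks this field as intentionally generic,
  // so the new warning would not fire here
  var items: list(?);
}
```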
(Note, this branch is effectively doing a combination of Approach 1 and Approach 2 from this issue, with a warning instead of an error).
I've created a draft PR with my warning so that we can look at the codes that changed and where the warning occurred to discuss further. The PR is #21404.
Here is a cute and/or terrifying example where whether or not a record is generic arguably depends on the point-of-instantiation.
```
module Library {
  // is Wrapper generic or concrete?
  record Wrapper {
    var x: fromPoi();
  }
}

module Application {
  use Library;

  proc fromPoi() type {
    return int; // ... but what if it returned integral ?
  }

  proc main() {
    var x = new Wrapper();
  }
}
```
See also the related issue #21426.
Edit -- in my opinion, it would be really bad if whether a record is generic were different in different parts of the code, depending on point-of-instantiation details.
Here is a summary of a group meeting on 1/24.
Legend: (+) an item that is resolved, in my eyes (?) a question to be resolved
We did not discuss Approaches 1-5, aka A1-A5, directly. We discussed the following proposals:
(+) Approach B = A0 above, i.e., "do nothing", plus compiler producing better error messages with suggestions. This is a go, in addition to other actions below.
Approach A6: require `?` on partial instantiations
(+) yes, we want this: lots of votes in favor
(?) in all contexts: I assume yes
(?) warning or error: voting preference for making it an error
Approach A7: warn if no `?` on fields of declared generic types
(?) yes, we want this: voting preference in favor
(+) warning or error: voting slightly against error
(?) warn when the type is `integral`, `class`, `owned`, etc.
(?) warn in other contexts: variable declarations, formals, type aliases, etc.
(+) Approach C: the declaration `record R { type elementType = int; }` makes `R.elementType` an alias for `int`. This is a no-go.
Approach D: warn if no `()` on a generic type with all defaults
(?) do we want this? voting preference in favor
(?) what user problem(s) does this address? we would not want to have users write `range()` instead of `range`
When the generic type is due to mem-mgmt or is given by a function:
(+) A6 - applies the same way
(+) A7 - yes, issue the warning
(?) A7 - leaning against allowing (?) to suppress the warning
(+) D does not apply
(?) We need to consider impact on range+domain genericity
My post-meeting reflections:
Having a uniform design appeals to me, ex. "anything generic has `?`", even if the cost is verbosity, as in `integral(?)`, `class(?)`, etc. A uniform design may help with range+domain genericity.

If we require (?) when using a generic record/class, we should also require (?) when declaring it. Rationale: for easier reference. (?) at the declaration has negligible impact on verbosity if we already require (?) at the use.
I opened #21477 to bring all the above pieces together.
As an alternative to the syntax like `generic record R {...}`, we could require an explicit type constructor to be the first thing inside a generic record or class. This is one of the clearest ways to indicate genericity, together with informing about the parameters of genericity.
A couple of questions on this:

```
record R { param p; ........ }
```
Similarly to #18214, I think we're not going to do anything else with this and feel as though the language is in a better state than it was when the issue was opened. Is it okay to close this issue?
I do think we have some longer-term ideas for user-defined type constructors (#21992), and additionally some of the `(?)` decorations might become mandatory in some cases in the future (and AFAIK missing `(?)` generates unstable warnings today).
In any case, the immediate problem posed by this issue is now addressed through warnings, so I agree it can be closed, and I will close it now.
Related to #18665 and #16508.
Also related to a common error in declaring recursive class types (in details)
The question posed by this issue is - should it be clearer when a class/record is generic?
Today we have a problem that whether or not a class/record is generic depends upon how the field types are resolved. I don't think this makes sense.
In https://chapel-lang.org/docs/main/language/spec/generics.html#generic-classes-and-records , the spec says a generic field is:
Notably absent from this is "a var or const field that is declared with a generic type" even though it is how the compiler is behaving today. The fact that it behaves this way is due to my PR #9489 where I thought I was increasing generality of a feature in an obvious manner. However now I think that was not a good idea.
In particular, for the compiler today, the record `A` is generic in the below. However, one has to look at / study the definition of `GR` in order to know that `A` is generic. It can be worse than that: because we allow type functions, the type of `field` could be computed by a type function. So, in other words, it is not clear from the `record A` declaration itself whether the programmer intended to create a generic record.

I think it is nice that writing generic code in Chapel is easy and Python-like. However, for type declarations specifically, it makes a big difference to how the type is used whether or not it is generic. In particular, the generic fields form part of the API for the type, since they are used in type construction (e.g. `A(int)` is a type construction call). So, I think that:

Beyond the question about whether the current situation is good enough at communicating programmer intent, quite a bit of the compiler code behaves differently depending on whether a type is generic or not, and it would definitely make the compiler easier to implement if this property was obvious from the syntax.
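For concreteness, the `GR` / `A` example discussed above was presumably along these lines (a reconstruction; the original code block did not survive, so the exact field declarations are assumptions):

```
// GR is generic: the type of its field is established by its initializer
record GR {
  var field;
}

// A is also generic today, but nothing in its declaration says so
record A {
  var field: GR;
}
```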
Speaking of compiler bugs, here is a more difficult example. As long as the generic-ness of a type depends on the field types, it can only be computed during resolution, but that presents some challenge when the type is recursive. This example has surprising behavior today, in that the type seems to be both concrete (according to warnings) and generic (according to a later compilation error). Is the challenge in implementing the behavior correctly an indication that programs using this feature can be difficult to understand?
What could we do about this?
Approach 0
Don't change the compiler/language, and maybe ask people not to write code where this is confusing. Note that if we keep the current approach, we need to update the language specification in https://chapel-lang.org/docs/language/spec/generics.html#generic-classes-and-records , which does not consider a field like `var x: integral` to make a class/record generic.

Approach 1
Building on the proposal in #18665, a user wishing to make a field generic can do so with the `?` syntax. So in the original example, we could write:
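The code block here did not survive extraction; presumably it marked the field's generic type with `?`, along these lines:

```
record A {
  var field: GR(?);  // '?' explicitly marks this field as generic
}
```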
If we wanted to, we could go further and also require one to write `proc f(arg: GR(?))` instead of `proc f(arg: GR)`, which would help to make it clearer from function declarations themselves whether or not they are generic functions. This is discussed in issue #19121.

Note that since, for a concrete `class C`, just writing `C` is a generic type, we would have to decide what happens in a case like a record field whose declared type is plain `C` - which is declaring a record with generic management, today.
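Sketched (the record name `B` follows the surrounding discussion; the exact original code block was lost):

```
class C { var n: int; }

record B {
  var field: C;  // today: generic over C's memory management (owned/shared/etc.)
}
```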
We could require one to write:
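The stripped code block presumably required the `?` on the class type, something like:

```
record B {
  var field: C(?);  // explicit: generic over management (and any generic args of C)
}
```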
for a concrete class `C`. But, if `C` is generic-with-defaults, that would change the meaning of `record B`, because now it can accept non-default instantiations of `C` as well as any management. Perhaps `(C)(?)` could mean to keep the default instantiation of `C` but allow any management.

Approach 2
We could undo the change in PR #9489 and move instead to requiring no type specification (e.g. `var field;`) or separate `type` fields, which is the situation described in the spec and the situation before that PR.

Approach 3
We could require `?t` syntax to make it clearer in this case. To combine that with saying the type is a subtype, we would use a where clause.
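Using Chapel's existing query syntax for formals as a guide, a field under this approach might look like the following sketch (hypothetical syntax; the subtype constraint would then be expressed in a where clause, as noted above):

```
record R {
  var x: ?t;  // hypothetical: '?t' makes the genericity of this field explicit
}
```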
Approach 4

We could require some sort of decoration on all generic records, e.g. a `generic` keyword, or even switch to a C++ style of separately identifying the type/param variables.
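The two stripped code blocks here presumably looked something like these sketches (both hypothetical syntaxes; the `generic record R {...}` spelling is echoed elsewhere in this thread):

```
// decoration on the record itself:
generic record R {
  var x;
}

// C++-style: type/param variables identified separately, up front:
record R(type t) {
  var x: t;
}
```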
IMO this is too big of a change, but I am including it for completeness. In particular, I do not think Chapel users have trouble understanding that `record R { var x; }` or `record R { type t; }` or `record R { param p; }` are generic - but this proposal would change those. The 2nd form, which is more like C++, would also change how type construction interacts with initializers (we could no longer compute the type to be constructed in the `init` method).

Approach 5
We could require `type` or `param` variables in order for a record to be generic and deprecate fields like `var x;` or `var y: integral`. The advantage of this approach is that it makes generic handling more consistent and makes identifying a generic class/record easier. Additionally, this approach resolves an open design question about how to write a custom default-initializer for such types. For more information about this open design question, see #16508.