Closed DanilaFe closed 1 year ago
The subteam for this issue is @mppf, myself, @dlongnecke-cray , @benharsh , @bradcray and @e-kayrakli. @vasslitvinov will not be participating, but posted the following message with his opinion:
We have had several interesting proposals like https://github.com/chapel-lang/chapel/issues/21431 . My reaction to them is “do we really need all that complexity?” By contrast, using the chpl_ prefix, as in chpl_hash, while not fancy or breakthrough in language design, is easy to explain and use and gets the job done.
Here is a link to the PR for IO Serializers which describes the currently-stabilized interface: https://github.com/chapel-lang/chapel/pull/22437
The parts that users would implement on their types are:
// writing, as with writeThis
proc MyType.serialize(writer: fileWriter(?), ref serializer : writer.serializerType) throws
// For reading into existing values, as with readThis
proc MyType.deserialize(reader: fileReader(?), ref deserializer: reader.deserializerType) throws
// For reading in types
proc MyType.init(reader: fileReader(?), ref deserializer: reader.deserializerType) throws
We also compiler-generate these methods today whenever possible.
I think the most relevant methods are `serialize` and `deserialize`. These would both tend to focus on working with fields, so if we were to add private field support later, then the `Chapel.hash` and `operator hash` approaches would be more difficult to work with, and might result in users writing their own methods anyways.
On the namespacing issue, I think it's plausible that we would consider supporting multiple interfaces that both want methods named "serialize" and "deserialize". For example, `IOSerializable` and `CommSerializable`.
Lastly I'll add that these methods are meant to be invokable by implementors of Serializers and Deserializers.
@bradcray Because you wanted code comparing Python dundermethods and interfaces.
// Approach 1: Python style dunder-methods...
record rec {
var x: int;
}
// Define '__hash__' for 'rec'.
proc rec.__hash__(): uint(64) { return hash(x); }
// User defines their own hash method!
proc rec.hash() { return 8; }
proc main() {
var r = new rec();
var hx1 = r.__hash__(); // Call directly.
var hx2 = r.hash(); // This is the user 'hash' method, unrelated...
assert(hx1 != hx2);
}
// Approach 2a: Interfaces...
record rec {
var x: int;
}
// This interface lives in ChapelBase or an auto-module...
interface Hashable {
proc hash(): uint(64);
}
// This implements lives in ChapelBase too...
int(64) implements Hashable {
proc hash() { return hash_impl(self); }
}
// This implements for 'rec' lives in our source code.
rec implements Hashable {
// Calls Hashable<int(64)>.hash()...
proc hash() { return hash(x); }
}
// User defines their own hash method!
proc rec.hash() { return 8; }
proc main() {
var r = new rec();
// Compile-time reinterpret as hashable and call Hashable(rec).hash();
var hx1 = r.Hashable.hash(); // Syntax TBD, see #21343
// This calls the user's rec.hash() method.
var hx2 = r.hash();
assert(hx1 != hx2);
}
// Approach 2b: One option for interface "auto derivation".
// Consider the code in 2a, but adjust...
// This pragma indicates the compiler should automatically generate the
// Hashable interface. It will emit an error if it fails to do so, which
// would be if any field of 'rec' does not implement Hashable.
@autoderive("Hashable")
record rec {
var x: int; // The int types implement Hashable in ChapelBase.
var y: real; // Ditto...
}
// Default implementation of 'chpl_hashable' using generics + reflection.
proc chpl_hashable(const ref x) {
use Reflection;
var ret: uint(64) = 0;
for name in fields(x) do ret = chpl_hash_combine(ret, field(x, name));
return ret;
}
// COMPILER GENERATED!
rec implements Hashable {
proc hash(): uint(64) { return chpl_hashable(x); }
};
// Approach 2c: Another option for interface "auto derivation".
// This is just like '2b', but here 'rec' automatically derives the
// 'Hashable' interface. The user can turn that off by attaching...
// Do not auto-generate the Hashable interface...
@noautoderive("Hashable")
record rec {
var x: int;
}
// Or further, if the compiler detects a user-written...
rec implements Hashable { /** ... **/ }
// Then it will not attempt to auto-derive.
FWIW, I think Python dunder methods lend themselves very nicely towards transitioning to interfaces in the future. We could simply have the compiler do something like: "if an interface `Hashable` is explicitly found for type `T`, then `Hashable(T)` will be used. Otherwise, if a dunder method named `__hash__` with a valid signature is found for `T`, then that method will be used. Otherwise, the `Hashable` interface will be automatically derived for `T`, unless auto-derivation is explicitly turned off via use of `@noautoderive`."
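The lookup order described in that comment can be sketched in Python, used here only as an illustration language. This is a minimal model, not compiler code: `EXPLICIT_HASHABLE`, `NO_AUTODERIVE`, `derived_hash`, and `resolve_hash` are hypothetical names, and the field-combining formula is an arbitrary stand-in.

```python
from typing import Callable, Dict, Optional, Set

HashFn = Callable[[object], int]

# Hypothetical stand-ins for compiler state; none of these names come from Chapel.
EXPLICIT_HASHABLE: Dict[type, HashFn] = {}  # explicit "T implements Hashable"
NO_AUTODERIVE: Set[type] = set()            # types marked @noautoderive("Hashable")


def derived_hash(value: object) -> int:
    """Auto-derived hash: fold the hashes of all instance fields together."""
    result = 0
    for name in sorted(vars(value)):
        result = (result * 31 + hash(getattr(value, name))) % 2**64
    return result


def resolve_hash(t: type) -> Optional[HashFn]:
    """Pick the hash implementation for type t, in the order the comment gives."""
    # 1. An explicit interface implementation always wins.
    if t in EXPLICIT_HASHABLE:
        return EXPLICIT_HASHABLE[t]
    # 2. Otherwise, use a dunder method defined directly on the type.
    dunder = t.__dict__.get("__hash__")
    if callable(dunder):
        return dunder
    # 3. Otherwise auto-derive, unless the type opted out.
    if t in NO_AUTODERIVE:
        return None
    return derived_hash
```

A type that is registered explicitly and also defines `__hash__` resolves to the explicit entry, which is the property that would let dunder methods coexist with interfaces during a transition.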
@dlongnecke-cray - I don't think the `proc hash` in ChapelBase is necessary for choosing between `__hash__` and the interface idea; indeed either could work without it. I suspect some of us would prefer it not to exist (but I don't personally have a strong opinion on this point). Point is, I think it's something we can make an independent choice on, so it might be worth updating your comment to indicate it is optional in these proposals.
Here is a version of 1 and 2a with slightly different editorial choices, to be even more boiled down to the difference between the two:
// this is what the record / type author would write
record rec { }
// Define '__hash__' for 'rec'.
proc rec.__hash__(): uint(64) { ... }
// the standard library hashtable would call hash functions like this:
private proc hashSomeKey(x) {
return x.__hash__();
}
// this is what the record / type author would write
record rec { }
// indicating that 'rec' is Hashable and implementing the relevant 'hash':
rec implements Hashable {
proc hash() { ... }
}
// the standard library defines Hashable somewhere
interface Hashable {
proc hash(): uint(64);
}
// the standard library hashtable would call hash functions like this:
private proc hashSomeKey(x) {
return (x:Hashable).hash(); // exact syntax TBD; see issue #21343
}
I think it's both an advantage and a disadvantage of the Interfaces approach that implementing the method requires one to use two names: `Hashable` and `hash`:
- `Hashable` (which is why it helps with the special method issue but it's also nice because you can rename `Hashable` on a `use` if needed etc.) and it enables better checking (it's just as easy to implement two methods for an interface & the compiler knows to check for both).
- `Hashable` as well as the name `hash`.

I find the "`__hash__` would be shorthand for implementing an interface" idea intriguing. It led me to thinking of another thing.
Anyway, you know how we have `this.super.init()` or even `this.super.someMethod()`? Well, the `.super` is not really a field as much as it is a way to reinterpret a class. What if we had the same mechanism available to reinterpret something as an interface it implements? This has been proposed before on #21343.
There are two key points to this comment:
- `__hash__` could).

Following along with the boiled-down examples from my previous comment, here is what it would look like.
// this is what the record / type author would write
record rec { }
// indicating that 'rec' is Hashable and implementing the relevant 'hash':
proc rec.Hashable.hash() { ... }
// Note that the compiler could check that such a 'hash' function
// meets the required signature whether or not we do that with
// the prototype constrained generics logic.
// the standard library defines Hashable somewhere
// For this proposal, the main point here is that the standard library defines
// the name Hashable. It could be handled directly in the compiler at first.
interface Hashable {
// Open Question: do we need it to have 'proc hash' at all at first?
// proc hash(): uint(64);
}
// the standard library hashtable would call hash functions like this:
private proc hashSomeKey(x) {
return x.Hashable.hash();
}
In terms of implementation, the compiler can just think of `Hashable.hash` as a method name. It would translate it to something else, say, `Hashable_hash`, by the time we get to C/LLVM IR. Likewise the call `x.Hashable.hash()` would be translated (say, to `x.Hashable_hash()`).
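To make the lowering idea concrete, here is a small Python model of it. It is only a sketch under the assumption that an interface-qualified method becomes a single flat name; `mangle` and `call_interface_method` are hypothetical helpers, and `Hashable_hash` is the example spelling from the comment rather than a committed scheme.

```python
def mangle(interface: str, method: str) -> str:
    """Flatten an interface-qualified method name into one identifier."""
    return f"{interface}_{method}"


class Rec:
    def hash(self) -> int:
        # The user's own, unrelated 'hash' method.
        return 8

    def Hashable_hash(self) -> int:
        # What 'proc rec.Hashable.hash()' might lower to under this scheme.
        return 0


def call_interface_method(obj, interface: str, method: str, *args):
    """Model of 'x.Hashable.hash()': look up the mangled name and call it."""
    return getattr(obj, mangle(interface, method))(*args)
```

Because the two methods end up with different names, `Rec().hash()` and `call_interface_method(Rec(), "Hashable", "hash")` never collide, which is the namespacing property the proposal is after.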
If we allowed `rec.Hashable.hash`, that would be committing the ability to implement an interface for a type one method at a time, right?
Rather than say `interface Hashable` can be empty, I think it would just be better to have the interfaces be hidden in the compiler rather than written out in module code. I think we still have to have a notion of `proc hash(): uint(64)` stored somewhere so that the compiler can check against the signature of `ImplementingType.hash`. We already do something similar for special methods today.
> If we allowed `rec.Hashable.hash`, that would be committing the ability to implement an interface for a type one method at a time, right?

I would expect that if you try to implement any of those (`proc myType.Hashable.anything()`), and the interface has multiple method requirements, then the compiler would check that all of the required methods are implemented by your type.
The only things that this would require us to commit to are:
- `Hashable` (`hash`), `ContextManager` (`enter`, `exit`), `Serializable` ((de)serializers)...
- `proc Foo.Hashable.hash` (this is not to say that we could not add the block syntax later). For convenience we can restrict things so that all the interface methods have to be in the same scope.
- `x.Hashable.hash` (or some other syntax in #21343).

To implement this idea, we'd have to:
- `proc Foo.Hashable.hash`
- `proc Foo.Hashable.hash` as belonging to a "proto-interface" and engage the default method signature checking rules
- `x.Hashable.hash()` to `Foo.Hashable.hash`.

Is there anything I'm missing? Because implementation-wise, this seems like an actually achievable lift. We don't even have to commit syntax for interface declarations if we just wave our hands and have all these proto interfaces stored in the compiler for now.
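A rough Python model of the "proto-interface" check discussed here: the compiler keeps a built-in table of required methods per interface, and rejects a type that implements only some of them. The table contents follow the examples in the comment; `PROTO_INTERFACES` and `check_implementation` are hypothetical names, and signature checking is left out for brevity.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed built-in table of required method names per proto-interface.
PROTO_INTERFACES: Dict[str, Set[str]] = {
    "Hashable": {"hash"},
    "ContextManager": {"enter", "exit"},
}


def check_implementation(
    interface: str, provided: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unknown) method names for one type and one interface.

    'provided' lists the methods the type defines in the interface's
    namespace, e.g. ["hash"] for 'proc Foo.Hashable.hash'.
    """
    if interface not in PROTO_INTERFACES:
        raise KeyError(f"unknown interface: {interface}")
    required = PROTO_INTERFACES[interface]
    given = set(provided)
    missing = sorted(required - given)   # required but not implemented
    unknown = sorted(given - required)   # implemented but not part of the interface
    return missing, unknown
```

For instance, a type that defines only `enter` in the `ContextManager` namespace would be reported as missing `exit`, which matches the expectation that all required methods must be present together.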
Following up to https://github.com/chapel-lang/chapel/issues/22618#issuecomment-1610080643 and @bradcray's request for an example comparing dunder vs interface approaches for the I/O methods.
Here is a comparison.
// this is what the record / type author would write
record rec { ... }
proc rec.__serialize__(writer: fileWriter(?), ref serializer : writer.serializerType) throws { ... }
proc rec.__deserialize__(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }
// I'm not so sure about this one...
proc rec.__init__(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }
// this is in the standard library somewhere
...
someRecord.__serialize__(writer, serializer);
...
someRecord.__deserialize__(reader, deserializer);
...
// I'm not so sure about this one...
var x = new someRecordType.__init__(reader=reader, deserializer=deserializer)
}
// this is what the record / type author would write
record rec { ... }
rec implements Serializable {
proc rec.serialize(writer: fileWriter(?), ref serializer : writer.serializerType) throws { ... }
}
rec implements Deserializable {
proc rec.deserialize(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }
}
rec implements DeserializeInitializable {
proc rec.init(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }
}
Open questions:
- Is the `rec` in the above necessary? Perhaps we will want a shorter way to write this. Or use `Self` or something. Nonetheless I'm fairly confident we could stabilize the form above (or something like it), at least for these interfaces.

In the near term, the standard library would call these like this:
// this is in the standard library somewhere, in the near term
// (in the long term, these would be unnecessary, because they
// can be invoked from constrained generic functions in the natural way)
...
(someRecord:Serializable).serialize(writer, serializer); // see #21343 for options here
...
(someRecord:Deserializable).deserialize(reader, deserializer); // see #21343 for options here
...
// I'm not so sure about this one...
var x = new (someRecordType:DeserializeInitializable)(reader=reader, deserializer=deserializer)
In the long term, it would use constrained generics to do it, which would look like this:
proc doSerialize(arg: Serializable, writer: fileWriter(?), ref serializer : writer.serializerType) throws {
arg.serialize(writer, serializer);
}
// or with this alternative way of writing a constrained generic:
proc doSerialize(arg, writer: fileWriter(?), ref serializer : writer.serializerType) throws where arg implements Serializable {
arg.serialize(writer, serializer);
}
The others are similar with constrained generics:
proc doDeserialize(arg: Deserializable, reader: fileReader(?), ref deserializer: reader.deserializerType) throws {
arg.deserialize(reader, deserializer);
}
proc doDeserializeInitialize(type t: DeserializeInitializable, reader: fileReader(?), ref deserializer: reader.deserializerType) throws {
return new t(reader=reader, deserializer=deserializer);
}
It is interesting to note that invoking the special initializer isn't smooth sailing with either proposal. But, at least with the interfaces approach, the strategy of writing a constrained generic function to call that initializer will work smoothly once we are ready to lean on constrained generics.
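For readers more familiar with other languages, the constrained-generic style above is loosely similar to a function whose argument must satisfy a protocol. The Python sketch below is an analogy only: Python protocols are matched structurally, so they behave like the "auto-fulfilled" end of the design space rather than the explicit opt-in discussed in this thread, and `Rec` and `do_serialize` are hypothetical names.

```python
import io
from typing import Protocol, runtime_checkable


@runtime_checkable
class Serializable(Protocol):
    """Anything with a 'serialize(writer)' method satisfies this protocol."""

    def serialize(self, writer: io.StringIO) -> None: ...


class Rec:
    def __init__(self, x: int) -> None:
        self.x = x

    def serialize(self, writer: io.StringIO) -> None:
        writer.write(f"rec(x={self.x})")


def do_serialize(arg: Serializable, writer: io.StringIO) -> None:
    """Call 'serialize' only on arguments that satisfy the constraint."""
    if not isinstance(arg, Serializable):
        raise TypeError(f"{type(arg).__name__} does not implement Serializable")
    arg.serialize(writer)
```

The caller never needs special invocation syntax: inside `do_serialize`, the constraint already says which `serialize` is meant.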
Also, I think we might be able to say that invoking these special methods as completely unstable for now:
- `manage bla { }` block
- `hash` yet (since the main purpose of it is to support the builtin hashtables in Map and associative domains)

Nonetheless I think it's important to keep in mind how they might be invoked in terms of long-term design direction.
In terms of fundamental differences between the two (ignoring naming and syntactical choices that can vary within each proposal), I think there are two things:
- The interfaces approach uses two names (`Hashable` and `hash`) where the dunder approach uses one (`__hash__`)

In an off-issue discussion, we are thinking:
So that leads me towards thinking, can we arrive at the simplest / most likely to be satisfying in the long term way to write that a particular type implements an interface?
I can think of two candidates, both based upon https://github.com/chapel-lang/chapel/blob/main/doc/rst/developer/chips/2.rst#implements-statements:
A. In the near term, I think it would be acceptable to require the interface implemented be described at the record declaration:
record rec implements Hashable {
proc hash(): uint(64) { ... }
}
However, this form does not currently parse.
B. Use a separate implements statement:
record rec {
proc hash(): uint(64) { ... }
}
rec implements Hashable;
This has the advantage of being implemented today.
For the methods that are compiler-generated by default (`hash`, `serialize`, `deserialize`, and the deserialize initializer), we will need a way to opt out of generating these. For that I propose we have an empty interface `Unhashable`, e.g. `record rec implements Unhashable` means that the record should not get a compiler-generated `hash` function. We can also insist (for now) that if a `proc hash` is present, that `implements Hashable` is also present.
Note that the details of how to hash the type must be available looking only at the module defining the type. (I.e. we can't have a tertiary method `proc hash`, otherwise bad things can happen.) IMO requiring at the type declaration point makes sense in the near term.
I would propose that we use attributes to indicate that a type should not generate a built-in interface. I have been calling the attribute `@noautoderive`. I do not think this would be a big lift, as Ahmad has already done a ton of work for attributes. Also it avoids us having to commit to "negative interface" names. To not auto-generate `Hashable` you would just write `@noautoderive("Hashable")`.
[Edit: From reading the meeting minutes I can see that Michael has already emphasized everything I'm about to say as being important to the namespacing, but what's not clear to me is why we've decided to abandon that aspect of the proposal.]
I am worried that this approach does not solve conflicts in the event that a user wants to write both `Hashable.hash` and their own hash:
record rec implements Hashable {
proc hash(): uint(64);
proc hash(): uint(64); // I can't have my own hash I'm using elsewhere?!
}
It seems like the user's choice is to opt in and lose use of the name `hash` for other purposes, or opt out and not get `Hashable`.
Why not require:
record rec implements Hashable {
proc Hashable.hash(): uint(64) { return 0; }
proc hash(): uint(64) { return 8; }
}
Instead? This would avoid any name conflict. We don't even have to change anything in the parser to be able to write `Hashable.hash()` as a primary method.
As an argument that the semantics are roughly consistent with implements blocks, if we restrict things so that `proc Hashable.hash()` can only be defined in the record's primary scope, then how is the above any different than:
record rec implements Hashable { ... }
rec implements Hashable {
proc hash(): uint(64) { return 0; }
}
In terms of functionality?
record rec {}
rec implements Hashable;
I do not feel like this syntax is appropriate to use. I'm pretty sure it was added with the intention of auto-implementing interfaces by examining surrounding primary/secondary methods. While that might be a nice feature to explore in the future, I don't think it solves the namespacing issue.
> what's not clear to me is why we've decided to abandon that aspect of the proposal

We have not; please reach out to me off-issue.

> Also, I think we might be able to say that invoking these special methods as completely unstable for now:

I think if we go down this path it means that there won't be a stable way to implement Serializers and Deserializers. Just wanted to point that out in case it wasn't clear.
Users can still use them in a stable way, and implement the relevant serialize/deserialize methods, but adding new formats wouldn't be stable.

> I think if we go down this path it means that there won't be a stable way to implement Serializers and Deserializers. Just wanted to point that out in case it wasn't clear.

David pointed this out in the Slack thread, but I will type out the sentiment here: in the near term, the users of the Serializer / Deserializer type will not be affected if we don't have a specific way to call a method from an interface. The reason for this is that the special interface call syntax is only necessary if disambiguation is needed between `rec.writeThis` and `rec.Serializable.writeThis`; however, since `writeThis` has previously been considered a special method (and since we're not introducing a way to define a separate interface method of the same name as a regular method), users won't have code in which the ambiguity is possible. Therefore, they'd be able to invoke the Serializer/Deserializer methods on the type directly; the `implements Serializable` etc. would only serve to allow the standard library to treat the methods specially.
Here's @dlongnecke-cray's message verbatim, in case I paraphrased incorrectly.
I think the current proposal should handle that because you would just invoke the special methods as you would any other method.
There’s two poles: on the left you have the “interfaces are auto-fulfilled by looking at primary/secondary methods”, which is great for convenience but doesn’t give us the namespace shielding we need. On the right you have “interfaces are explicitly fulfilled within a namespace somewhere (e.g., a implements block or proc Hashable.hash(). We will need the latter to have namespace shielding. However we’re not ready to make the jump by RC-1.
So in this release we won’t have a way to explicitly invoke an interface method, but we also don’t need it until we add the ability to explicitly implement an interface. That’s because interfaces for now are just “auto-fulfilled/auto-implemented” by a user’s primary methods. We’re effectively grandfathering in the special methods, but only for a single release candidate. By RC-2 we’ll have had more than enough time to deliberate on what syntax/semantics we need to get us the namespace shielding we need, and when we add explicit fulfillment we’ll also add the explicit invocation at the same time.
We also avoid the possibility of having collisions by requiring that you must implement the interface if you have a matching method signature (just for RC-1).
On Thursday, we are going to continue our discussion of the best path forward with respect to special method naming -- including discussing the proposed interface-based approach. If we're all in agreement as to the approach, there are still a few open questions about how to proceed.
One such question concerns the names of the interfaces, as well as their methods. To help with such decisions, below are tables consisting of the names of interfaces similar to ours in other languages.
Notably, do we still want to keep `enterThis` and `exitThis` as including the word "this"? This may have been a strategy of dealing with special method naming, which would be made obsolete by any decision coming out of this subteam. Thus -- what do we think of `enter` and `exit`? Something else?
Proposed interface name: `Hashable`
Language | Hashing Method | Hashing Interface |
---|---|---|
Python | `__hash__` | `Hashable` |
Rust | `hash` | `Hash` |
Swift | `hash` | `Hashable` |
Java | N/A | N/A |
C# | N/A | N/A |
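As a concrete reference point for the Python row, the sketch below shows a type opting in via `__hash__` and another opting out by setting `__hash__ = None`; `collections.abc.Hashable` then reports the difference. `Point` and `NotHashable` are illustrative names only.

```python
from collections.abc import Hashable


class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Hash must agree with __eq__: equal points hash equally.
        return hash((self.x, self.y))


class NotHashable:
    # Explicit opt-out: instances cannot be used as dict keys or set members.
    __hash__ = None
```

Here `isinstance(Point(1, 2), Hashable)` is true while `isinstance(NotHashable(), Hashable)` is false, so Python ties interface membership to the presence of the dunder method rather than to an explicit declaration.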
Proposed interface name: ContextManager
Language | Enter Method | Exit Method | Context Manager Interface |
---|---|---|---|
Python | `__enter__` | `__exit__` | `ContextManager` |
Rust | N/A | N/A | N/A |
Swift | N/A | N/A | N/A |
Java | `close` | N/A | `AutoCloseable` |
C# | `Dispose` | N/A | `IDisposable` |
Note: the closest thing to context managers in C#/Java are "try-with-resources", which I document in this table.
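For comparison with the Python row, a minimal context manager looks like the following. `Tracker` is an illustrative name; `contextlib.AbstractContextManager` is the standard-library abstract base class that recognizes any type defining both dunder methods.

```python
from contextlib import AbstractContextManager


class Tracker:
    """Records when the managed block is entered and exited."""

    def __init__(self) -> None:
        self.events = []

    def __enter__(self) -> "Tracker":
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.events.append("exit")
        return False  # do not suppress exceptions raised in the block


def run() -> list:
    tracker = Tracker()
    with tracker as t:
        t.events.append("body")
    return tracker.events
```

Note that `Tracker` never names the interface, yet `isinstance(Tracker(), AbstractContextManager)` holds; this is the implicit style that the opt-in proposals in this thread deliberately move away from.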
Note that unlike the other languages used for reference, Chapel actually has need for two deserialization interfaces: one for deserializing into an existing object (e.g. created via default initialization), and one for deserializing into a new object (for cases where the object cannot be default-initialized, for example).
Proposed interface names: `Serializable`, `Deserializable`, `DeserializeInitializable`.
Language | Serialize Method | Deserialize Method | Serialization Interface | Deserialization Interface |
---|---|---|---|---|
Python | N/A | N/A | N/A | N/A |
Rust | `serialize` | `deserialize` | `Serialize` | `Deserialize` |
Swift | `encode` | `init(Decoder)` | `Encodable` 🐟 | `Decodable` 🐟 |
Java | `writeObject` | `readObject` | `Serializable` | `Serializable` (combined) |
C# | N/A | N/A | `ISerializable` | `ISerializable` (combined) |
Another question we might want to discuss within this group (even if we don't officially come to a decision on it on behalf of the Chapel team at large), is the syntax we'd want for marking that a type implements an interface. This aspect is crucial to the efficacy of the interface-based approach: we want users to explicitly opt-in to the methods' specialness, so that user methods that happen to be named after a special method don't end up being used by the language.
For the time being, I propose that we do not consider how one can define methods in an interface's namespace only (addressed, for instance, by Michael's suggestion here). Thus, let us only consider ways of marking a type as one that opts in to special methods.
There are three major candidates we have so far. These are the three:
record implements Interface
This one mirrors Java's approach of using the `implements` keyword when defining the type.
record rec implements Hashable {
proc hash() {
// ...
}
}
record implements Interface {
This one mirrors C# and Swift's approaches of using `:`, and also seems in the spirit of what we currently do with class inheritance. One downside might be that implementing an interface and extending a class are quite different, and we don't want to create confusion; we might also want to work out how this approach would work for a class that inherits from a parent and implements an interface.
record rec : Hashable {
proc hash() {
// ...
}
}
record implements Interface;
This approach uses the existing standalone `implements` statements that I believe are part of the interfaces design right now. No other language I've found has a similar feature; we would be doing something new.
record rec {
proc hash() {
// ...
}
}
rec implements Hashable;
While this is probably a bigger discussion than the ad hoc team set out to have, among these `record rec: Hashable` is the most appealing to me.
> One downside might be that implementing an interface and extending a class are quite different

I can probably be on the other side of this argument. I think they are similar enough that the language should look similar when extending/implementing. So, I see similarity as a plus here. An obvious note, but this is also symmetrical to `proc foo(x: Hashable)`.
If we want to avoid using the same syntax as extending, one alternative I can think of is preceding `class`/`record` with the interfaces it implements:
Hashable class MyClass: BaseClass {
}
reads nicely. It may be cluttered if there are too many interfaces that `MyClass` implements. But probably it can be stylistically addressed like:
Hashable, Serializable, Palatable, Personable
class MyClass: BaseClass {
}
I still find putting everything after `:` to be the better alternative, though.
> Proposed interface names: Serializable, Deserializable, DeserializeInitializable

Sheepish to ask, but: the implication here is that we want to support a type having `serialize` but no `deserialize`, is that right? My reflex is certainly based on "serialization" as a data movement concept, but I am a little afraid of seeing this triple way too often together for us to wish for a combined interface.
> Another question we might want to discuss within this group (even if we don't officially come to a decision on it on behalf of the Chapel team at large), is the syntax we'd want for marking that a type implements an interface.

Note that I've created issue #22652 specifically to focus on this question.

> we might also want to work out how this approach would work for a class that inherits from a parent and implements an interface.

That is a concern for both of the record-declaration forms and I showed an example and proposal for each in #22652.
> Proposed interface names: Serializable, Deserializable, DeserializeInitializable
>
> Sheepish to ask, but: the implication here is that we want to support a type having `serialize` but no `deserialize`, is that right? My reflex is certainly based on "serialization" as a data movement concept, but I am a little afraid of seeing this triple way too often together for us to wish for a combined interface.

Yes, but we can also add interfaces for the combination of these. For example, Swift has `Codable` (🐟 !) to mean the combination of Encodable and Decodable.
@benharsh and I did some brainstorming on interface names here and we liked:
- `proc init`, `InitDeserializable`
- `proc deserialize`, `UpdateDeserializable` / `RefDeserializable` / `MutableDeserializable` / `MutatingDeserializable`
- `Deserializable` for the combination of `InitDeserializable` and `UpdateDeserializable`

I think it's interesting that Java combines both reading and writing into Serializable. I think it's actually somewhat common for Chapel types to be Serializable but not Deserializable. Of course we could seek to convey such things in a different way (such as throwing an error) but I'd expect we will be better off if we can have the module code react to implementing the Serializable / Deserializable interface (or not).
I think it's super interesting that Rust interfaces don't seem to use the `Bla...able` style. It would make the names a bit less of a mouthful if we followed Rust in this regard. (Edit: apparently Rust followed Haskell in this regard).
Regardless of the route we take, we want to continue allowing standalone `rec implements Interface;` declarations.
For example, when using a library type that is not hashable as-is, I may have a need to hash it and a way to define the hash method for it. We want to allow this using tertiary methods and tertiary `implements` declarations.
The design subteam has reached consensus on large portions of this topic, though we explicitly leave some things for further discussion.
The main decision is that we will use interfaces to reserve "special" methods. Concretely:
- `hash` will become a part of a `Hashable` interface.
- `serialize`, `deserialize`, and init-deserialization will be distributed among four interfaces, whose names are not yet decided:
  - `WriteSerializable` interface for `serialize`
  - `ReadSerializable` interface for `deserialize`
  - `InitSerializable` interface for `init`
  - `Serializable` interface that subsumes the above three for convenience.
- `enter` and `leave` (roughly) will become part of a `ContextManager` interface.
  - `-This` should be removed from `enterThis` and `leaveThis` because that naming scheme was used to solve the special naming issue as well. However, there is an open question about whether it should be `leave` or `exit`.
- `-able` but are not required to.

Users will need to opt in to the specially-named methods being treated in a special way by having their type implement the respective interface (e.g. a record would need to explicitly implement `Hash` for that to be automatically used by the standard library).
We will allow the compiler to automatically generate implementations of certain methods and implement the related interfaces, so that things like `writeln`-by-default for user-defined types continue to work. (The compiler would continue to automatically generate implementations for `hash`, `serialize`, `deserialize`, the deprecated deserialize `init`, and also `readThis` and `writeThis`. Note that `readThis` and `writeThis` are expected to be deprecated but aren't yet).
Transitionally, in 1.31 (and perhaps a few releases after that), the compiler will emit a warning for code defining a specially-named method that doesn't implement the respective interface.
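The transitional warning could be modeled roughly as follows. This is a Python sketch of the rule only, not of the compiler: the mapping uses the tentative interface names from this summary (which the thread says are not final), and `SPECIAL_METHODS` and `transition_warnings` are hypothetical names.

```python
from typing import Dict, Iterable, List

# Assumed mapping from specially-named methods to interfaces; the summary
# notes that several of these interface names were still undecided.
SPECIAL_METHODS: Dict[str, str] = {
    "hash": "Hashable",
    "serialize": "WriteSerializable",
    "deserialize": "ReadSerializable",
    "enter": "ContextManager",
    "leave": "ContextManager",
}


def transition_warnings(
    type_name: str, methods: Iterable[str], interfaces: Iterable[str]
) -> List[str]:
    """Warn about specially-named methods whose interface is not implemented."""
    declared = set(interfaces)
    warnings = []
    for method in methods:
        interface = SPECIAL_METHODS.get(method)
        if interface is not None and interface not in declared:
            warnings.append(
                f"warning: '{type_name}.{method}' is a specially-named method, "
                f"but '{type_name}' does not implement '{interface}'"
            )
    return warnings
```

A record that defines `hash` without declaring `Hashable` gets one warning; adding the declaration, or renaming the method, silences it.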
We did not make all the necessary related decisions. Some of these are arguably out of scope here, and others we just ran out of time for. So, decisions still need to be made for:
- `enter` and `leave` would have inertia based on what we have right now).
- `proc hash`)
- `proc hash` and `implements Hashable` for a custom type when no competing `proc hash` is present.

Although we didn't decide, in our discussions, we are tending towards the following names / syntaxes:
- `record rec : Hashable { ... }`
- `ReadSerializable`, `WriteSerializable`, `InitSerializable`, and `Serializable` (votes from Daniel, David)
- `ReadSerializable`, `WriteSerializable`, `InitSerializable`, and `IoSerializable` (votes from Engin, Ben, but inconsistency concerns from Daniel)

Because interfaces are a major implementation effort and language feature, our approach has been to minimize the aspects of interfaces that we want to stabilize for 2.0. Therefore, we will not be stabilizing:
- `proc hash` as well as a `Hashable` `proc hash` on the same type).
- `proc foo(arg) where arg implements Fooable`)

Closing this because the discussion itself has been settled.
Certain methods in Chapel have special meaning. Broadly speaking, there are methods such as `init`, `this`, and `these`, which are language-ey and play a crucial role in Chapel. However, there are also other methods that aren't as fundamentally ingrained in Chapel, but we would like to have some special meaning: `hash`, `enterThis`, and `exitThis`.
The main issue is that we don't want to take method names like `hash` away from the user; furthermore, more generally, we don't want to make migration difficult for users if we are to introduce a new special method. Currently, if we just add `hash` or `enterThis`, users with methods with the same name would need to rename their methods and refactor their code.
The main issue that covers this subject is https://github.com/chapel-lang/chapel/issues/19038. It also lists an additional concern about special methods. Quoting:
This issue is all about finding ways to reserve special methods like `hash`, `enterThis`, and `exitThis`, without making programming in Chapel and migrating to newer versions harder for the user, even if new special methods are added in the future.
Approaches in Other Languages

- `__hash__`
- `__` for compiler-specific things
- […] `value_type`), adds operators (`operator bool`). Concepts
- […] `Hash` trait)
- […] `hash`). Interfaces for other things.
- […] `Hashable`)
- […] `Add`), uses interfaces (`ICollection`)
- `Symbol.something` property (hard to translate to Chapel)

The Contenders

Below is a brief list of all of the approaches suggested to address this issue.
Note that all of the contenders below solve the issue of preventing code from breaking when new special methods are introduced.

- `__hash__` convention. Currently, Chapel does not really reserve names starting and ending with double underscores. This approach requires no parser changes, but does take away names that users could've been using before.
- `chpl_hash`. This shouldn't break user code because `chpl_` is normally restricted for internal Chapel things. On the other hand:
  - `chpl_` is normally restricted for internal Chapel things. So, it's weird for users to have to define a "private-looking" thing on their types.
- `operator hash`. This approach would allow `proc hash` to co-exist with `operator hash`; the operator version would not be directly callable. Instead, a top-level, non-method Chapel procedure `proc hash(arg)` would be usable for invoking `operator hash` on any type that supports it.
- `interface Hashable` could be defined (e.g.), and a procedure `hash(x)` where `x` is Hashable. This would only work if `x` were hashable.
- […] `writeln` etc. If `writeThis` is implemented using an interface, we'll want there to be automatic instantiation of the interface.
- `record Chapel {}` or something, and provide type methods like `proc Chapel.hash(x: MyType) {}`.
- […] `Chapel.hash(x: MyClass)`.
- […] `Chapel.hash(..)`, so we'd need to specifically import `Chapel` to bring in the user-provided implementations of the special functions, everywhere we'd want to use them.

An incomplete list of possible special characters:

- `@` doesn't occur anywhere else, but might be a bit noisy.
- `~` is bitwise negation
- `::` is an option.
- `$` is an option, but was previously used for sync/atomic variables.
- `*` but could mean any / all
- `#` but too commenty

Do regular (not-special) functions allow the special characters?

Comparison Table

Note that all of the contenders below -- except the current `hash` -- solve the issue of preventing code from breaking when new special methods are introduced.

In the following table, I give `hash` as an example method; however, this applies to any "special" methods that we would want to add, such as `enterThis` and `exitThis`, `writeThis`, etc.

Properties:

See the details block below for more information on what the marks in each column represent.
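
| Approach | Invocation | Notes |
|---|---|---|
| `hash` | `x.hash()` | |
| `~hash~`, `@hash@`, `--hash--`, `$hash$` | `x.~hash~()` | |
| `-hash-`, `<hash>`, `:hash:`, `*hash*`, `#hash#`, `!hash!` | `x-hash-()` | |
| `chpl_hash` | `x.chpl_hash()` | generic access via `hash(x)`. `chpl_` is typically internal Chapel. |
| `__hash` | `x.__hash()` | |
| `__hash__` | `x.__hash__()` | |
| `operator hash` | `hash(x)` | `operator bool` |
| […] | `x.hash()`, `(x:Hashable).hash()` | |
| […] | `Chapel.hash(x)` | |
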
Column Details
### Parsing Approach

Changes to the parser / lexer come in two flavors, which provide different answers to the following question: Should we specially reserve tokens for each function on a case-by-case basis, or just create a general rule?

* __Broader Lexing Approach__: Create an "identifier*"-rule, no further need to reserve tokens or modify the parser.
  * Some approaches require this, like `x.hash()` => `x-hash-()`
  * Michael summarizes lexing approaches here: https://github.com/chapel-lang/chapel/issues/19050#issuecomment-1262245757
* __Incremental Lexing Approach__: Reserve `hash*` when needed, then `somethingElse*` when `somethingElse` is needed.
  * But then, stuff like `bla.somethingElse*(x+y)` will suddenly change behavior, unless...
  * You use different punctuation to make cases like the above unambiguous.

### Does not Reserve Identifier

Some have expressed concern that using approaches like Python's dunder methods, such as `__hash__`, removes valid identifiers that the users could previously use for their code. If there are other ways to mitigate the problem of adding special methods that _don't_ take away any options, this is better, because users have more freedom in picking what they want their procedures to be called. Thus, this column has a green check mark (✅) if no previously-valid identifiers are taken away, and a red "x" (❌) if they are.

### User Code Works As Is

Some approaches presented here will require modifications to existing user code to make it work. For instance, `x-hash-(1+2)` could previously be the arithmetic expression `x - hash - (1+2)`, but under one of the approaches presented here would be interpreted as a method call. Thus, some approaches will require users to modify their code. This will only need to be done once, after which new special methods would be reservable without breakage. Approaches that do not break user code at all get a green check mark (✅), while those that require changes are marked with a red "x" (❌).
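
To make the `x-hash-(1+2)` breakage concrete, here is a minimal sketch in present-day Chapel. The variables `x` and `hash` are hypothetical names chosen only to collide with the proposed syntax; nothing here comes from real user code. Today the expression is plain subtraction, whereas under a `-hash-` style rule the same characters would presumably lex as a method call.

```chapel
// Hypothetical names, picked to collide with the proposed
// `x-hash-()` special-method syntax.
proc main() {
  var x = 10;
  var hash = 4;

  // Under current rules this is ordinary left-associative subtraction:
  // (x - hash) - (1+2)
  writeln(x-hash-(1+2));   // prints 3
}
```

Adding spaces around the operators (`x - hash - (1+2)`) would be the kind of one-time edit this column is concerned with, assuming the new lexing rule only fires when the punctuation is adjacent to the identifier.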

### Makes Special Method Clear

This column indicates whether a given approach makes it clear to the user that what they're invoking is a reserved / special method in Chapel, as opposed to just any other method. Approaches where the use of special methods is distinct from the use of "garden variety" methods are marked with a green check mark (✅), while those where calling a special method looks similar or identical to "usual" Chapel code are marked with a red "x" (❌).

### Precedent Language

This column describes other languages that have solved the problem of reserving methods with special meaning in the same way as a particular approach. For instance, the `__hash__` approach is associated with Python, since Python uses dunder methods for "special" functionality.

### Callable Directly

Another concern raised during design discussions is that of being able to call a special method directly. If the user writes a particular method on their data structure, it seems to make sense to make it possible for them to call that special method without jumping through any hoops. Approaches where the special method implementation can be called directly are marked with a green check mark (✅), while those where only the compiler can invoke the "special" method -- or where additional work is required to invoke it -- are marked with a red "x" (❌).

[^1]: Only in certain contexts like constrained generics; other times might require a cast.
[^2]: It would be accessed via a Chapel stdlib `hash`, so you'd know it's special; but it does look a lot like user code providing a library function instead of a method.
[^3]: User code probably shouldn't be using the `chpl_` prefix; however, some projects might (Arkouda?).