chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Demarcating "special methods" like `readThis`, `enterThis`, and `hash` #22618

Closed DanilaFe closed 1 year ago

DanilaFe commented 1 year ago

Certain methods in Chapel have special meaning. Broadly speaking, there are methods such as init, this, and these, which are language-ey and play a crucial role in Chapel. However, there are also other methods that aren't as fundamentally ingrained in Chapel, but to which we would like to give some special meaning: hash, enterThis, and exitThis.

The main issue is that we don't want to take method names like hash away from the user; furthermore, more generally, we don't want to make migration difficult for users if we are to introduce a new special method. Currently, if we just add hash or enterThis, users with methods with the same name would need to rename their methods and refactor their code.

The main issue that covers this subject is https://github.com/chapel-lang/chapel/issues/19038. It also lists an additional concern about special methods. Quoting:

How can I be alerted that I'm opting into a special method rather than happening to use a name that Chapel gives special meaning to?

This issue is all about finding ways to reserve special methods like hash, enterThis, and exitThis, without making programming in Chapel and migrating to newer versions harder for the user, even if new special methods are added in the future.

Approaches in Other Languages

| Language | Approach |
|---|---|
| Python | "dunder" methods, like `__hash__` |
| C | Reserves `__` for compiler-specific things |
| C++ | Reserves names (e.g. `value_type`), adds operators (`operator bool`). Concepts |
| Rust | Traits (`Hash` trait) |
| Java | All objects have some base methods (hash). Interfaces for other things. |
| Swift | Interfaces / Traits with compiler-generated implementations (`Hashable`) |
| C# | Just reserves names (e.g. `Add`), uses interfaces (`ICollection`) |
| JavaScript | Prototype objects have a `Symbol.something` property (hard to translate to Chapel) |
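For reference, here is a small runnable Python sketch of the dunder precedent listed in the table. The `Point` class and its ordinary `hash` method are made up for illustration. The point is only that `__hash__` is what the language calls for dictionary lookups, while a user method that happens to be named `hash` stays unrelated.

```python
class Point:
    """A small value type with both a special method and an ordinary 'hash' method."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Special method: Python calls this for dict and set lookups.
        return hash((self.x, self.y))

    def hash(self):
        # Ordinary user method: the language never calls this implicitly.
        return "user-defined"


p = Point(1, 2)
lookup = {p: "found"}  # uses __hash__ and __eq__, not the user's hash()
```

An equal `Point(1, 2)` finds the same dictionary entry, and `p.hash()` remains free for whatever the user wants it to mean.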

The Contenders

Below is a brief list of all of the approaches suggested to address this issue.

Note that all of the contenders below solve the issue of preventing code from breaking when new special methods are introduced.

An incomplete list of possible special characters:

Do regular (not-special) functions allow the special characters?

Comparison Table

Note that all of the contenders below -- except the current hash -- solve the issue of preventing code from breaking when new special methods are introduced.

In the following table, I give hash as an example method; however, this applies to any "special" methods that we would want to add, such as enterThis and exitThis, writeThis, etc.

Properties:

See the details block below for more information of what marks in each column represent.

Approach Parsing Approach Does not Reserve Identifiers User Code Works As Is Makes Special Method Clear Precedent Language Callable Directly? + How Ease of Use Notes
hash Works Already C#, C++ x.hash() Included for comparison
~hash~, @hash@ --hash-- $hash$ Either x.~hash~()
-hash- <hash> :hash: *hash* #hash# !hash! Broad x-hash-()
chpl_hash Works Already 🤔[^3] x.chpl_hash()
generic access via hash(x).
chpl_ is typically internal Chapel.
__hash Works Already PHP, C x.__hash()
__hash__ Works Already Python x.__hash__()
operator hash Either 🤔[^2] C++ operator bool ❌ accessed via hash(x) 🤔 Special methods aren't really operators.
interfaces Language Feature Rust, Java, Swift, Haskell 🤔[^1]x.hash(). (x:Hashable).hash() Requires proto-interfaces and the associated design
type methods Works Already Chapel.hash(x) 🤔 Might require special casing for class hierarchies
Column Details

### Parsing Approach

Changes to the parser / lexer come in two flavors, which provide different answers to the following question: Should we specially reserve tokens for each function on a case-by-case basis, or just create a general rule?

* __Broader Lexing Approach__: Create an "identifier*"-rule, no further need to reserve tokens or modify the parser.
  * Some approaches require this, like `x.hash()` => `x-hash-()`
  * Michael summarizes lexing approaches here: https://github.com/chapel-lang/chapel/issues/19050#issuecomment-1262245757
* __Incremental Lexing Approach__: Reserve `hash*` when needed, then `somethingElse*` when `somethingElse` is needed.
  * But then, stuff like `bla.somethingElse*(x+y)` will suddenly change behavior, unless...
  * You use different punctuation to make cases like the above unambiguous.

### Does not Reserve Identifier

Some have expressed concern that using approaches like Python's dunder methods, such as `__hash__`, removes valid identifiers that the users could previously use for their code. If there are other ways to mitigate the problem of adding special methods that _don't_ take away any options, this is better, because users have more freedom in picking what their procedures are called. Thus, this column has a green check mark (✅) if no previously-valid identifiers are taken away, and a red "x" (❌) if they are.

### User Code Works As Is

Some approaches presented here will require modifications to existing user code to make it work. For instance, `x-hash-(1+2)` could previously be the arithmetic expression `x - hash - (1+2)`, but under one of the approaches presented here would be interpreted as a method call. Thus, some approaches will require users to modify their code. This will only need to be done once, after which new special methods would be reservable without breakage. Approaches that do not break user code at all get a green check mark (✅), while those that require changes are marked with a red "x" (❌).

### Makes Special Method Clear

This column indicates whether a given approach makes it clear to the user that what they're invoking is a reserved / special method in Chapel, as opposed to just any other method. Approaches where the use of special methods is distinct from the use of "garden variety" methods are marked with a green check mark (✅), while those where calling a special method looks similar or identical to "usual" Chapel code are marked with a red "x" (❌).

### Precedent Language

This column describes other languages that have solved the problem of reserving methods with special meaning in the same way as a particular approach. For instance, the `__hash__` approach is associated with Python, since Python uses dunder methods for "special" functionality.

### Callable Directly

Another concern raised during design discussions is that of being able to call a special method directly. If the user writes a particular method on their data structure, it seems to make sense to make it possible for them to call that special method without jumping through any hoops. Approaches where the special method implementation can be called directly are marked with a green check mark (✅), while those where only the compiler can invoke the "special" method -- or where additional work is required to invoke it -- are marked with a red "x" (❌).
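To make the "User Code Works As Is" concern concrete, here is a toy Python tokenizer sketch. Both regular expressions are hypothetical stand-ins and are not Chapel's actual lexer. `PLAIN` models today's rule, and `BROAD` models a general rule that also accepts an identifier wrapped in dashes.

```python
import re

# Hypothetical token patterns for illustration only.
PLAIN = re.compile(r"[A-Za-z_]\w*|\d+|[-+().]")
# Same as PLAIN, but first tries an identifier wrapped in dashes, e.g. -hash-
BROAD = re.compile(r"-[A-Za-z_]\w*-|[A-Za-z_]\w*|\d+|[-+().]")


def tokenize(pattern, source):
    """Split source into tokens using the given pattern."""
    return pattern.findall(source)


expr = "x-hash-(1+2)"
plain_tokens = tokenize(PLAIN, expr)  # reads as the subtraction x - hash - (1+2)
broad_tokens = tokenize(BROAD, expr)  # reads -hash- as a single method-name token
```

The same characters yield two different token streams, which is why existing code containing such expressions would need a one-time change under the broader lexing rule.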

[^1]: Only in certain contexts like constrained generics; other times might require a cast.

[^2]: It would be accessed via a Chapel stdlib hash, so you'd know it's special; but it does look a lot like if user code provided a library function instead of a method.

[^3]: User code probably shouldn't be using the chpl_ prefix; however, some projects might (Arkouda?).

DanilaFe commented 1 year ago

The subteam for this issue is @mppf, myself, @dlongnecke-cray , @benharsh , @bradcray and @e-kayrakli. @vasslitvinov will not be participating, but posted the following message with his opinion:

We have had several interesting proposals like https://github.com/chapel-lang/chapel/issues/21431 . My reaction to them is “do we really need all that complexity?” By contrast, using the chpl_ prefix, as in chpl_hash, while not fancy or breakthrough in language design, is easy to explain and use and gets the job done.

benharsh commented 1 year ago

Here is a link to the PR for IO Serializers which describes the currently-stabilized interface: https://github.com/chapel-lang/chapel/pull/22437

The parts that users would implement on their types are:

// writing, as with writeThis
proc MyType.serialize(writer: fileWriter(?), ref serializer : writer.serializerType) throws

// For reading into existing values, as with readThis
proc MyType.deserialize(reader: fileReader(?), ref deserializer: reader.deserializerType) throws

// For reading in types
proc MyType.init(reader: fileReader(?), ref deserializer: reader.deserializerType) throws

We also compiler-generate these methods today whenever possible.

I think the most relevant methods are serialize and deserialize. These would both tend to focus on working with fields, so if we were to add private field support later, then the Chapel.hash and operator hash approaches would be more difficult to work with, and might result in users writing their own methods anyways.

On the namespacing issue, I think it's plausible that we would consider supporting multiple interfaces that both want methods named "serialize" and "deserialize". For example, IOSerializable and CommSerializable.

Lastly I'll add that these methods are meant to be invokable by implementors of Serializers and Deserializers.

dlongnecke-cray commented 1 year ago

@bradcray Because you wanted code comparing Python dunder methods and interfaces.

// Approach 1: Python style dunder-methods...

record rec {
  var x: int;
}

// Define '__hash__' for 'rec'.
proc rec.__hash__(): uint(64) { return hash(x); }

// User defines their own hash method!
proc rec.hash() { return 8; }

proc main() {
  var r = new rec();
  var hx1 = r.__hash__();   // Call directly.
  var hx2 = r.hash();       // This is the user 'hash' method, unrelated...
  assert(hx1 != hx2);
}

// Approach 2a: Interfaces...

record rec {
  var x: int;
}

// This interface lives in ChapelBase or an auto-module...
interface Hashable {
  proc hash(): uint(64);
}

// This implements lives in ChapelBase too...
int(64) implements Hashable {
  proc hash() { return hash_impl(self); }
}

// This implements for 'rec' lives in our source code.
rec implements Hashable {
  // Calls Hashable<int(64)>.hash()...
  proc hash() { return hash(x); }
}

// User defines their own hash method!
proc rec.hash() { return 8; }

proc main() {
  var r = new rec();

  // Compile-time reinterpret as hashable and call Hashable(rec).hash();
  var hx1 = r.Hashable.hash(); // Syntax TBD, see #21343

  // This calls the user's rec.hash() method.
  var hx2 = r.hash();

  assert(hx1 != hx2);
}

// Approach 2b: One option for interface "auto derivation".
// Consider the code in 2a, but adjust...

// This pragma indicates the compiler should automatically generate the
// Hashable interface. It will emit an error if it fails to do so, which
// would be if any field of 'rec' does not implement Hashable.
@autoderive("Hashable")
record rec {
  var x: int;       // The int types implement Hashable in ChapelBase.
  var y: real;      // Ditto...
}

// Default implementation of 'chpl_hashable' using generics + reflection.
proc chpl_hashable(const ref x) {
  use Reflection;
  var ret: uint(64) = 0;
  for name in fields(x) do ret = chpl_hash_combine(ret, field(x, name));
  return ret;
}

// COMPILER GENERATED!
rec implements Hashable {
  // Pass the whole record, not just its field 'x', so every field is hashed.
  proc hash(): uint(64) { return chpl_hashable(this); }
}

// Approach 2c: Another option for interface "auto derivation".
// This is just like '2b', but here 'rec' automatically derives the
// 'Hashable' interface. The user can turn that off by attaching...

// Do not auto-generate the Hashable interface...
@noautoderive("Hashable")
record rec {
  var x: int;
}

// Or further, if the compiler detects a user-written...
rec implements Hashable { /** ... **/ }

// Then it will not attempt to auto-derive.

FWIW, I think Python dunder methods lend themselves very nicely towards transitioning to interfaces in the future. We could simply have the compiler do something like "if an interface Hashable is explicitly found for type T, then Hashable(T) will be used. Otherwise, if a dunder method named __hash__ with a valid signature is found for T, then that method will be used. Otherwise, the Hashable interface will be automatically derived for T, unless auto-derivation is explicitly turned off via use of @noautoderive."
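The lookup order described above can be sketched in Python as a plain decision function. `TypeInfo` and its fields are hypothetical names invented for this sketch and do not correspond to any compiler data structure. The "valid signature" check is left out.

```python
from dataclasses import dataclass, field


@dataclass
class TypeInfo:
    """Hypothetical summary of what the compiler knows about a type."""
    name: str
    explicit_interfaces: set = field(default_factory=set)  # e.g. {"Hashable"}
    methods: set = field(default_factory=set)               # e.g. {"__hash__"}
    no_autoderive: set = field(default_factory=set)         # from @noautoderive


def resolve_hash(info):
    """Return which hash implementation would be chosen, or None if there is none."""
    if "Hashable" in info.explicit_interfaces:
        return "explicit Hashable implementation"
    if "__hash__" in info.methods:
        return "__hash__ dunder method"
    if "Hashable" not in info.no_autoderive:
        return "auto-derived Hashable"
    return None
```

For example, a type with only a `__hash__` method resolves to the dunder method, and a type marked `@noautoderive("Hashable")` with neither an explicit implementation nor a dunder method ends up with no hash at all.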

mppf commented 1 year ago

@dlongnecke-cray - I don't think the proc hash in ChapelBase is necessary for choosing between __hash__ and the interface idea; indeed either could work without it. I suspect some of us would prefer it not to exist (but I don't personally have a strong opinion on this point). Point is, I think it's something we can make an independent choice on, so it might be worth updating your comment to indicate it is optional in these proposals.

Here is a version of 1 and 2a with slightly different editorial choices, to be even more boiled down to the difference between the two:

Approach 1: Python style dunder-methods

// this is what the record / type author would write

record rec { }

// Define '__hash__' for 'rec'.
proc rec.__hash__(): uint(64) { ... }
// the standard library hashtable would call hash functions like this:
private proc hashSomeKey(x) {
  return x.__hash__();
}

Approach 2a: Interfaces

// this is what the record / type author would write

record rec { }

// indicating that 'rec' is Hashable and implementing the relevant 'hash':
rec implements Hashable {
  proc hash() { ... }
}
// the standard library defines Hashable somewhere
interface Hashable {
  proc hash(): uint(64);
}
// the standard library hashtable would call hash functions like this:
private proc hashSomeKey(x) {
  return (x:Hashable).hash(); // exact syntax TBD; see issue #21343
}

I think it's both an advantage and a disadvantage of the Interfaces approach that implementing the method requires one to use two names: Hashable and hash:

mppf commented 1 year ago

I find the "__hash__ would be shorthand for implementing an interface" idea intriguing. It led me to thinking of another thing.

Anyway, you know how we have this.super.init() or even this.super.someMethod() ? Well the .super is not really a field as much as it is a way to reinterpret a class. What if we had the same mechanism available to reinterpret something as an interface it implements? This has been proposed before on #21343.

There are two key points to this comment:

  1. Such a functionality could serve as a namespacing strategy for special methods even before we have interfaces built (just like __hash__ could).
  2. This idea could extend to a way to declare such methods.

Following along with the boiled-down examples from my previous comment, here is what it would look like.

Approach 2k: interface-y rec.Hashable

// this is what the record / type author would write

record rec { }

// indicating that 'rec' is Hashable and implementing the relevant 'hash':
proc rec.Hashable.hash() { ... }

// Note that the compiler could check that such a 'hash' function
// meets the required signature whether or not we do that with
// the prototype constrained generics logic.
// the standard library defines Hashable somewhere
// For this proposal, the main point here is that the standard library defines
// the name Hashable. It could be handled directly in the compiler at first.
interface Hashable {
  // Open Question: do we need it to have 'proc hash' at all at first?
  // proc hash(): uint(64);
}
// the standard library hashtable would call hash functions like this:
private proc hashSomeKey(x) {
  return x.Hashable.hash();
}

In terms of implementation, the compiler can just think of Hashable.hash as a method name. It would translate it to something else, say, Hashable_hash, by the time we get to C/LLVM IR. Likewise the call x.Hashable.hash() would be translated (say, to x.Hashable_hash()).
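As a rough illustration of that translation step, here is a Python sketch that rewrites `Interface.method` to `Interface_method` in source text. `KNOWN_INTERFACES`, `mangle`, and `rewrite` are hypothetical names. A real compiler would do this on resolved names rather than with a regular expression.

```python
import re

# Assumption: the set of interface names the compiler knows about.
KNOWN_INTERFACES = {"Hashable"}


def mangle(interface, method):
    """Combine an interface name and a method name into one flat method name."""
    return f"{interface}_{method}"


def rewrite(source):
    """Replace every 'Interface.method' with its mangled form."""
    names = "|".join(sorted(KNOWN_INTERFACES))
    pattern = re.compile(r"\b(" + names + r")\.([A-Za-z_]\w*)")
    return pattern.sub(lambda m: mangle(m.group(1), m.group(2)), source)


declaration = rewrite("proc rec.Hashable.hash() { ... }")
call = rewrite("x.Hashable.hash()")
plain_call = rewrite("x.hash()")  # not qualified by an interface, so left alone
```

Both the declaration and the call site end up naming `Hashable_hash`, while a user's plain `hash` method is untouched.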

dlongnecke-cray commented 1 year ago

If we allowed rec.Hashable.hash, that would be committing to the ability to implement an interface for a type one method at a time, right?

Rather than say interface Hashable can be empty, I think it would just be better to have the interfaces be hidden in the compiler rather than written out in module code. I think we still have to have a notion of proc hash(): uint(64) stored somewhere so that the compiler can check against the signature of ImplementingType.hash. We already do something similar for special methods today.

mppf commented 1 year ago

If we allowed rec.Hashable.hash, that would be committing to the ability to implement an interface for a type one method at a time, right?

I would expect that if you try to implement any of those (proc myType.Hashable.anything()), and the interface has multiple method requirements, then the compiler would check that all of the required methods are implemented by your type.
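A minimal Python sketch of that completeness check follows. The `REQUIRED_METHODS` table is an assumption for illustration, using the interface and method names floated in this thread. The function only reports interfaces that a type has started to implement.

```python
# Assumption: required method names per interface, as floated in this thread.
REQUIRED_METHODS = {
    "Hashable": {"hash"},
    "ContextManager": {"enterThis", "exitThis"},
}


def missing_methods(defined):
    """Given (interface, method) pairs defined on a type, report what is missing.

    Returns a dict mapping each partially implemented interface to the sorted
    list of required methods the type has not defined.
    """
    seen = {}
    for interface, method in defined:
        seen.setdefault(interface, set()).add(method)
    return {
        interface: sorted(REQUIRED_METHODS[interface] - methods)
        for interface, methods in seen.items()
        if REQUIRED_METHODS[interface] - methods
    }
```

Defining only `ContextManager.enterThis` would be reported as missing `exitThis`, whereas a lone `Hashable.hash` satisfies its interface.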

dlongnecke-cray commented 1 year ago

The only things that this would require us to commit to are:

To implement this idea, we'd have to:

Is there anything I'm missing? Because implementation-wise, this seems like an actually achievable lift. We don't even have to commit syntax for interface declarations if we just wave our hands and have all these proto interfaces stored in the compiler for now.

mppf commented 1 year ago

Following up to https://github.com/chapel-lang/chapel/issues/22618#issuecomment-1610080643 and @bradcray's request for an example comparing dunder vs interface approaches for the I/O methods.

Here is a comparison.

dunder

// this is what the record / type author would write

record rec { ... }

proc rec.__serialize__(writer: fileWriter(?), ref serializer : writer.serializerType) throws { ... }

proc rec.__deserialize__(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }

// I'm not so sure about this one...
proc rec.__init__(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }
// this is in the standard library somewhere
...
someRecord.__serialize__(writer, serializer);
...
someRecord.__deserialize__(reader, deserializer);
...
// I'm not so sure about this one...
var x = new someRecordType.__init__(reader=reader, deserializer=deserializer);
}

interfaces

// this is what the record / type author would write

record rec { ... }

rec implements Serializable {
  proc rec.serialize(writer: fileWriter(?), ref serializer : writer.serializerType) throws { ... }
}
rec implements Deserializable {
  proc rec.deserialize(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }
}
rec implements DeserializeInitializable {
  proc rec.init(reader: fileReader(?), ref deserializer: reader.deserializerType) throws { ... }
}

Open questions:

In the near term, the standard library would call these like this:

// this is in the standard library somewhere, in the near term
// (in the long term, these would be unnecessary, because they
//  can be invoked from constrained generic functions in the natural way)
...
(someRecord:Serializable).serialize(writer, serializer); // see #21343 for options here
...
(someRecord:Deserializable).deserialize(reader, deserializer); // see #21343 for options here
...
// I'm not so sure about this one...
var x = new (someRecordType:DeserializeInitializable)(reader=reader, deserializer=deserializer);

In the long term, it would use constrained generics to do it, which would look like this:

proc doSerialize(arg: Serializable, writer: fileWriter(?), ref serializer : writer.serializerType) throws {
  arg.serialize(writer, serializer);
}
// or with this alternative way of writing a constrained generic:
proc doSerialize(arg, writer: fileWriter(?), ref serializer : writer.serializerType) throws where arg implements Serializable {
  arg.serialize(writer, serializer);
}

The others are similar with constrained generics:

proc doDeserialize(arg: Deserializable, reader: fileReader(?), ref deserializer: reader.deserializerType) throws {
  arg.deserialize(reader, deserializer);
}
proc doDeserializeInitialize(type t: DeserializeInitializable, reader: fileReader(?), ref deserializer: reader.deserializerType) throws {
  return new t(reader=reader, deserializer=deserializer);
}

Conclusion

It is interesting to note that invoking the special initializer isn't smooth sailing with either proposal. But, at least with the interfaces approach, the strategy of writing a constrained generic function to call that initializer will work smoothly once we are ready to lean on constrained generics.

Also, I think we might be able to say that invoking these special methods is completely unstable for now:

Nonetheless I think it's important to keep in mind how they might be invoked in terms of long-term design direction.

In terms of fundamental differences between the two (ignoring naming and syntactical choices that can vary within each proposal), I think there are two things:

  1. The interface approach uses 2 names (Hashable and hash) where the dunder approach uses one (__hash__)
  2. The idea that we can have multiple methods with the same name implementing different interfaces does not seem to exist in many languages with interfaces/constrained generics. It exists in Rust, but not Swift, for example [1] [2]. So, if we were uncertain if it is reasonable for Chapel to have that feature, we might want to avoid using interfaces here, because it assumes we have that feature in order to solve the main problem.

mppf commented 1 year ago

In an off-issue discussion, we are thinking:

  1. We can leave the way to invoke the special method unstable for now & we should focus on how it can be defined.
  2. We can leave the issue of indicating a method call from a particular interface for later, once we want to add a new special method. (But using interfaces to declare them will help keep things consistent in the future). Edit: It is important to note that we can add this as a non-breaking change as long as we generate a compilation error any time such a duplicate definition appears.

So that leads me towards thinking: can we arrive at the simplest way to write that a particular type implements an interface, one that is most likely to remain satisfying in the long term?

I can think of two candidates, both based upon https://github.com/chapel-lang/chapel/blob/main/doc/rst/developer/chips/2.rst#implements-statements:

implements Option A

A. In the near term, I think it would be acceptable to require the interface implemented be described at the record declaration:

record rec implements Hashable {
  proc hash(): uint(64) { ... }
}

However, this form does not currently parse.

implements Option B

B. Use a separate implements statement:

record rec {
  proc hash(): uint(64) { ... }
}
rec implements Hashable;

This has the advantage of being implemented today.

other notes

For the methods that are compiler-generated by default (hash, serialize, deserialize, and the deserialize initializer), we will need a way to opt out of generating these. For that, I propose we have an empty interface Unhashable; e.g., record rec implements Unhashable means that the record should not get a compiler-generated hash function. We can also insist (for now) that if a proc hash is present, then implements Hashable is also present.

Note that the details of how to hash the type must be available looking only at the module defining the type. (I.e., we can't have a tertiary method proc hash; otherwise bad things can happen.) IMO requiring it at the type declaration point makes sense in the near term.
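The rules in these notes can be sketched in Python as a small decision function. `hash_plan` and its return values are hypothetical. The thread does not say what happens if a type declares both Hashable and Unhashable, so this sketch simply lets a user-written hash win in that case.

```python
def hash_plan(declared_interfaces, methods):
    """Decide where a type's hash comes from under the rules proposed above.

    declared_interfaces: interface names written at the type declaration.
    methods: names of methods defined on the type.
    Returns "user", "generated", or "none".
    """
    has_hash = "hash" in methods
    if has_hash and "Hashable" not in declared_interfaces:
        # Proposed near-term rule: 'proc hash' requires 'implements Hashable'.
        raise ValueError("'proc hash' is present but 'implements Hashable' is not")
    if has_hash:
        return "user"
    if "Unhashable" in declared_interfaces:
        # Opt-out: suppress the compiler-generated hash.
        return "none"
    return "generated"
```

A record with no declarations gets the generated hash, `implements Unhashable` suppresses it, and a stray `proc hash` without the interface is rejected rather than silently treated as special.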

dlongnecke-cray commented 1 year ago

I would propose that we use attributes to indicate that a type should not generate a built-in interface. I have been calling the attribute @noautoderive. I do not think this would be a big lift, as Ahmad has already done a ton of work for attributes. Also it avoids us having to commit to "negative interface" names. To not auto-generate Hashable you would just write @noautoderive("Hashable").

dlongnecke-cray commented 1 year ago

implements Option A

[Edit: From reading the meeting minutes I can see that Michael has already emphasized everything I'm about to say as being important to the namespacing, but what's not clear to me is why we've decided to abandon that aspect of the proposal.]

I am worried that this approach does not solve conflicts in the event that a user wants to write both Hashable.hash and their own hash:

record rec implements Hashable {
  proc hash(): uint(64);
  proc hash(): uint(64); // I can't have my own hash I'm using elsewhere?!
}

It seems like the user's choice is to opt in and lose use of the name hash for other purposes, or opt out and not get Hashable.

Why not require:

record rec implements Hashable {
  proc Hashable.hash(): uint(64) { return 0; }
  proc hash(): uint(64) { return 8; }
}

Instead? This would avoid any name conflict. We don't even have to change anything in the parser to be able to write Hashable.hash() as a primary method.

As an argument that the semantics are roughly consistent with implements blocks, if we restrict things so that proc Hashable.hash() can only be defined in the record's primary scope, then how is the above any different than:

record rec implements Hashable { ... }

rec implements Hashable {
  proc hash(): uint(64) { return 0; }
}

In terms of functionality?

implements Option B

record rec {}
rec implements Hashable;

I do not feel like this syntax is appropriate to use. I'm pretty sure it was added with the intention of auto-implementing interfaces by examining surrounding primary/secondary methods. While that might be a nice feature to explore in the future, I don't think it solves the namespacing issue.

mppf commented 1 year ago

what's not clear to me is why we've decided to abandon that aspect of the proposal

we have not, please reach out to me off-issue.

benharsh commented 1 year ago

Also, I think we might be able to say that invoking these special methods is completely unstable for now:

I think if we go down this path it means that there won't be a stable way to implement Serializers and Deserializers. Just wanted to point that out in case it wasn't clear.

Users can still use them in a stable way, and implement the relevant serialize/deserialize methods, but adding new formats wouldn't be stable.

DanilaFe commented 1 year ago

I think if we go down this path it means that there won't be a stable way to implement Serializers and Deserializers. Just wanted to point that out in case it wasn't clear.

David pointed this out in the Slack thread, but I will type out the sentiment here: in the near term, the users of the Serializer / Deserializer type will not be affected if we don't have a specific way to call a method from an interface. The reason for this is that the special interface call syntax is only necessary if disambiguation is needed between rec.writeThis and rec.Serializable.writeThis; however, since writeThis has previously been considered a special method (and since we're not introducing a way to define a separate interface method of the same name as a regular method), users won't have code in which the ambiguity is possible. Therefore, they'd be able to invoke the Serializer/Deserializer methods on the type directly; the implements Serializable etc. would only serve to allow the standard library to treat the methods specially.

Here's @dlongnecke-cray's message verbatim, in case I paraphrased incorrectly.

I think the current proposal should handle that because you would just invoke the special methods as you would any other method.

There’s two poles: on the left you have the “interfaces are auto-fulfilled by looking at primary/secondary methods”, which is great for convenience but doesn’t give us the namespace shielding we need. On the right you have “interfaces are explicitly fulfilled within a namespace somewhere (e.g., a implements block or proc Hashable.hash(). We will need the latter to have namespace shielding. However we’re not ready to make the jump by RC-1.

So in this release we won’t have a way to explicitly invoke an interface method, but we also don’t need it until we add the ability to explicitly implement an interface. That’s because interfaces for now are just “auto-fulfilled/auto-implemented” by a user’s primary methods. We’re effectively grandfathering in the special methods, but only for a single release candidate. By RC-2 we’ll have had more than enough time to deliberate on what syntax/semantics we need to get us the namespace shielding we need, and when we add explicit fulfillment we’ll also add the explicit invocation at the same time.

We also avoid the possibility of having collisions by requiring that you must implement the interface if you have a matching method signature (just for RC-1).

DanilaFe commented 1 year ago

On Thursday, we are going to continue our discussion of the best path forward with respect to special method naming -- including discussing the proposed interface-based approach. If we're all in agreement as to the approach, there are still a few open questions about how to proceed.

One such question concerns the names of the interfaces, as well as their methods. To help with such decisions, below are tables consisting of the names of interfaces similar to ours in other languages.

Notably, do we still want to keep enterThis and exitThis as including the word "this"? This may have been a strategy of dealing with special method naming, which would be made obsolete by any decision coming out of this subteam. Thus -- what do we think of enter and exit? Something else?

Hashing

Proposed interface name: Hashable

| Language | Hashing Method | Hashing Interface |
|---|---|---|
| Python | `__hash__` | Hashable |
| Rust | hash | Hash |
| Swift | hash | Hashable |
| Java | N/A | N/A |
| C# | N/A | N/A |
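As a runnable reference for the Python row, here is a short sketch using `collections.abc.Hashable` from the standard library. The `Key` and `NoHash` classes are made up for illustration. Note that Python recognizes `Hashable` structurally from the presence of `__hash__`, and that defining `__eq__` without `__hash__` sets `__hash__` to `None`, which acts as an opt-out.

```python
from collections.abc import Hashable


class Key:
    """Defines __hash__, so Python treats it as Hashable."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


class NoHash:
    """Defines __eq__ only, so Python sets __hash__ to None for this class."""

    def __eq__(self, other):
        return isinstance(other, NoHash)


key_is_hashable = isinstance(Key(1), Hashable)
nohash_is_hashable = isinstance(NoHash(), Hashable)
```

Here `key_is_hashable` is true and `nohash_is_hashable` is false, without either class naming the interface explicitly.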
C# N/A N/A

Context Managers

Proposed interface name: ContextManager

| Language | Enter Method | Exit Method | Context Manager Interface |
|---|---|---|---|
| Python | `__enter__` | `__exit__` | ContextManager |
| Rust | N/A | N/A | N/A |
| Swift | N/A | N/A | N/A |
| Java | close | N/A | AutoCloseable |
| C# | Dispose | N/A | IDisposable |

Note: the closest thing to context managers in C#/Java are "try-with-resources", which I document in this table.
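For comparison with enterThis and exitThis, here is what the Python row looks like in practice. The `Resource` class and the `events` list are made up for illustration. The `with` statement calls `__enter__` before the body and `__exit__` after it.

```python
class Resource:
    """Records when the context is entered and exited."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self  # bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")
        return False  # do not suppress exceptions raised in the body


events = []
with Resource(events) as r:
    events.append("body")
```

After the block, `events` holds the three entries in the order enter, body, exit.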

Serialization

Note that unlike the other languages used for reference, Chapel actually has need for two deserialization interfaces: one for deserializing into an existing object (e.g. created via default initialization), and one for deserializing into a new object (for cases where the object cannot be default-initialized, for example).

Proposed interface names: Serializable, Deserializable, DeserializeInitializable.

| Language | Serialize Method | Deserialize Method | Serialization Interface | Deserialization Interface |
|---|---|---|---|---|
| Python | N/A | N/A | N/A | N/A |
| Rust | serialize | deserialize | Serialize | Deserialize |
| Swift | encode | init(Decoder) | Encodable 🐟 | Decodable 🐟 |
| Java | writeObject | readObject | Serializable | Serializable (combined) |
| C# | N/A | N/A | ISerializable | ISerializable (combined) |

DanilaFe commented 1 year ago

Another question we might want to discuss within this group (even if we don't officially come to a decision on it on behalf of the Chapel team at large), is the syntax we'd want for marking that a type implements an interface. This aspect is crucial to the efficacy of the interface-based approach: we want users to explicitly opt-in to the methods' specialness, so that user methods that happen to be named after a special method don't end up being used by the language.

For the time being, I propose that we do not consider how one can define methods in an interface's namespace only (addressed, for instance, by Michael's suggestion here). Thus, let us only consider ways of marking a type as one that opts in to special methods.

We have three major candidates so far:

Approach 1a: record implements Interface

This one mirrors Java's approach of using the implements keyword when defining the type.

record rec implements Hashable {
  proc hash() {
    // ...
  }
}

Approach 1b: record : Interface

This one mirrors C# and Swift's approaches of using :, and also seems in the spirit of what we currently do with class inheritance. One downside might be that implementing an interface and extending a class are quite different, and we don't want to create confusion; we might also want to work out how this approach would work for a class that inherits from a parent and implements an interface.

record rec : Hashable {
  proc hash() {
    // ...
  }
}

Approach 2: record implements Interface;

This approach uses the existing standalone implements statements that I believe are part of the interfaces design right now. No other language I've found has a similar feature; we would be doing something new.

record rec {
  proc hash() {
    // ...
  }
}
rec implements Hashable;
e-kayrakli commented 1 year ago

While this is probably a bigger discussion than the ad hoc team set out to have, among these, record rec : Hashable is the most appealing to me.

> One downside might be that implementing an interface and extending a class are quite different

I can probably be on the other side of this argument. I think they are similar enough that the language should look similar when extending/implementing. So, I see the similarity as a plus here. An obvious note, but this is also symmetrical to proc foo(x: Hashable).

If we want to avoid using the same syntax as extending, one alternative I can think of is preceding class/record with the interfaces it implements:

Hashable class MyClass: BaseClass {

}

This reads nicely. It may get cluttered if MyClass implements many interfaces, but that can probably be addressed stylistically:

Hashable, Serializable, Palatable, Personable
class MyClass: BaseClass {

}

I still find putting everything after : to be the better alternative, though.


> Proposed interface names: Serializable, Deserializable, DeserializeInitializable

Sheepish to ask, but: the implication here is that we want to support a type having serialize but no deserialize, is that right? My reflex is certainly based on "serialization" as a data movement concept, but I am a little afraid that we will see this triple together so often that we will wish for a combined interface.

mppf commented 1 year ago

> Another question we might want to discuss within this group (even if we don't officially come to a decision on it on behalf of the Chapel team at large) is the syntax we'd want for marking that a type implements an interface.

Note that I've created issue #22652 specifically to focus on this question.

> we might also want to work out how this approach would work for a class that inherits from a parent and implements an interface.

That is a concern for both of the record-declaration forms and I showed an example and proposal for each in #22652.

> Proposed interface names: Serializable, Deserializable, DeserializeInitializable
>
> Sheepish to ask, but: the implication here is that we want to support a type having serialize but no deserialize, is that right? My reflex is certainly based on "serialization" as a data movement concept, but I am a little afraid that we will see this triple together so often that we will wish for a combined interface.

Yes, but we can also add interfaces for the combination of these. For example, Swift has Codable (🐟 !) to mean the combination of Encodable and Decodable.

@benharsh and I did some brainstorming on interface names here and we liked:

I think it's interesting that Java combines both reading and writing into Serializable. I think it's actually somewhat common for Chapel types to be Serializable but not Deserializable. Of course we could seek to convey such things in a different way (such as throwing an error) but I'd expect we will be better off if we can have the module code react to implementing the Serializable / Deserializable interface (or not).

I think it's super interesting that Rust interfaces don't seem to use the Bla...able style. It would make the names a bit less of a mouthful if we followed Rust in this regard. (Edit: apparently Rust followed Haskell in this regard).

vasslitvinov commented 1 year ago

Regardless of the route we take, we want to continue allowing standalone rec implements Interface; declarations.

For example, when using a library type that is not hashable as-is, I may have a need to hash it and a way to define the hash method for it. We want to allow this using tertiary methods and tertiary implements declarations.
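
As a rough sketch of that scenario, written in the standalone-statement form from Approach 2 above: the record libPoint, its fields, and the hash formula are hypothetical stand-ins for a library type, and Hashable is the interface name used in this thread rather than a settled one.

```chapel
// Hypothetical type from a library we cannot edit; it neither defines
// hash() nor opts in to any hashing interface itself.
record libPoint {
  var x, y: int;
}

// Tertiary method: hash() is added from user code, outside both the
// record declaration and the module that defines it.
proc libPoint.hash(): uint {
  // Illustrative combination of the fields, not a recommended hash.
  return (x * 31 + y): uint;
}

// Tertiary implements declaration, in the Approach 2 syntax.
// "Hashable" is the name proposed in this thread, not a final one.
libPoint implements Hashable;
```

With only the declaration-site forms (Approaches 1a and 1b), there would be no place to write the opt-in for a type whose declaration lives in someone else's module.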

DanilaFe commented 1 year ago

The design subteam has reached consensus on large portions of this topic, though we explicitly leave some things for further discussion.

The main decision is that we will use interfaces to reserve "special" methods. Concretely:

Users will need to opt in to the specially-named methods being treated in a special way by having their type implement the respective interface (e.g. a record would need to explicitly implement Hash for that to be automatically used by the standard library).

We will allow the compiler to automatically generate implementations of certain methods and implement the related interfaces, so that things like writeln-by-default for user-defined types continue to work. (The compiler would continue to automatically generate implementations for hash, serialize, deserialize, the deprecated deserialize init, and also readThis and writeThis. Note that readThis and writeThis are expected to be deprecated but aren't yet).

Transitionally, in 1.31 (and perhaps a few releases after that), the compiler will emit a warning for code defining a specially-named method that doesn't implement the respective interface.
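
To make the transitional warning concrete, here is a hedged sketch. Both records are hypothetical, the Hashable name and the standalone implements statement are borrowed from earlier in this thread rather than from a final design, and the wording of the warning is not specified here.

```chapel
// Defines hash() without opting in. Under the plan above, the 1.31
// compiler would be expected to warn about this method.
record legacyKey {
  var id: int;
  proc hash(): uint { return id: uint; }
}

// Same method plus an explicit opt-in, so no warning is expected.
record optedInKey {
  var id: int;
  proc hash(): uint { return id: uint; }
}
optedInKey implements Hashable;  // interface name is an assumption
```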

Things we didn't decide, but need to for 2.0

We did not make all the necessary related decisions. Some of these are arguably out of scope here, and others we just ran out of time for. So, decisions still need to be made for:

Although we didn't decide, in our discussions, we are tending towards the following names / syntaxes:

Things we decided not to stabilize by 2.0

Because interfaces are a major implementation effort and language feature, our approach has been to minimize the aspects of interfaces that we want to stabilize for 2.0. Therefore, we will not be stabilizing:

Next steps

  1. Settle the open questions from "Things we didn't decide, but need to for 2.0".
  2. Implement the proposal.
DanilaFe commented 1 year ago

Closing this because the discussion itself has been settled.