[Feature Idea] Contextual method modules

rossberg commented 3 years ago

Contextual method modules

[This is a half-baked idea meant as food for discussion, not a finalised proposal. Let me know what you think, or if you see ways to make it more powerful.]

Motivation

There is a schism in Motoko between object-oriented (using classes and methods) vs procedural/functional (using types and functions) patterns for defining abstractions. Both have pros and cons that cannot be reconciled in the language as is.
We have special rules for treating arrays, blobs, and text as pseudo objects when used with dot notation. Besides being super ad-hoc, the special status of these functions has repeatedly confused users who tried to find them, e.g., in the Array module.
Functional notation, as required for most of our library functions, has a well-known UX disadvantage in that e.g. code completion does not apply.
We have repeatedly had the desire for a type-class like mechanism, e.g., for printing or serialisation. (This proposal does not have comparable expressiveness, but works for basic cases.)

Idea

This is a moderate proposal to essentially generalise the special mechanism for arrays and text and make pseudo methods similar to theirs user-definable.

It does so without any syntactic extension and with only a local change to the type system (though making it syntactically more explicit would be an option).

The basic idea is to generalise the typing rule for the dot operator. Given the expression e.x:

If e does not have object type, or it is an object with no field x, then the field is searched for in the environment.
The search looks for a module that (1) has a (possibly parameteterised) type member named Self that (can be instantiated such that it) is a supertype of e's type, and (2) has a value member x.
In case there are multiple modules in the context satisfying these two conditions, the innermost (most recent) binding takes precedence. (This is to maintain compositionality.)
If a suitable module, say M, is found, then the expression is typed as if it was the application M.x(e) (including possible inference for omitted type arguments).

Note: If the type of e is None, then by subsumption it has any field x, so this fallback does not apply and it is typed the same as always.

Examples

The existing magic array, blob, and text methods could be defined by their respective modules Array, Blob, and Text as follows (assuming we include appropriate sugar for curried function definitions):

module Array {
  public type Self<T> = [T];

  public func get<T>(a : [T])(i : Nat) : T { a[i] };
  public func size<T>(a : [T])() : Nat { ... };
  public func keys<T>(a : [T])() : Iter<Nat> { ... };
  public func vals<T>(a : [T])() : Iter<T> { ... };
  ...
}
// ...and something analogue for mutable arrays

module Blob {
  public type Self = Blob;

  public func size(b : Blob)() : Nat { ... };
  public func vals(b : Blob)() : Iter<Nat8> { ... };
  ...
}

module Text {
  public type Self = Text;

  public func size(t : Text)() : Nat { ... };
  public func chars(t : Text)() : Iter<Char> { ... };
  ...
}

Generic interfaces could be defined as module types:

type Printable<T> = module {
  type Self = T;
  print : T -> () -> ();
}

A user-defined type could e.g. implement this interface simply as part of their own module (taking advantage of structural subtyping):

module Complex {
  public type Complex = (Float, Float);
  public func add((x1, y1) : Complex, (x2, y2) : Complex) : Complex { (x1+x2, y1+y2) };
  // ...
  public type Self = Complex;
  public func print((x, y) : Complex)() { x.print(); y.print() };  // assuming Float defines print
}

This would also work for polymorphic functions, although unlike with type classes, the "dictionary" parameter would remain explicit:

func log<T>(PrintT : Printable<T>, x : T) {
  ... x.print() ...
}

Similarly, the pattern could be used for generic types:

module Stack {
  public type Stack<T> = [T];
  // ...
  public type Self<T> = Stack<T>;
  public func print<T>(s : Stack<T>)(PT : Printable<T>) { ... };
}

(Note: In this case, the module wouldn't actually match the Printable interface. That would require "functorising" the module. Would that be preferable?)

Possible Variations

Constrain syntax to calls: Evoke this fallback only as part of an application e.x(e1,...eN). Pro: prevents possible side effects from hiding behind innocent projection syntax. Con: loses the ability to treat these methods as first-class, like proper ones (i.e., e.x itself wouldn't work).
Avoid currying: Special-case not just typing of e.x, but composition with application such as e.x(e1,...,eN), which gets turned into M.x(e,e1,...,eN). Pro: same as above, plus avoids overhead of currying and allows simpler method definitions. Con: same as above, plus more ad-hoc, less compositional semantics; special rules for non-tuple arguments.
Separate selection syntax: We could distinguish pseudo field projection syntactically from proper projection by using a different token. Pro: conceptually cleaner distinction. Con: more syntax to learn; would not be backwards-compatible with current array/text syntax.
Explicit candidate syntax: We could require a new form of explicit declaration to turn a module into a possible candidate for lookup (instead of just relying on its type in form of the presence of a Self member). Pro: probably increases readability by making the modules to consider in scope more apparent. Con: more extra tedium when module is imported or received as a parameter. (A simpler, though rather hacky middle grounds would be to depend on certain naming conventions for such modules.)

Possible Extensions

Object coercion operator: instead of just implicitly producing a single method, we could introduce an operator that wraps a value into an object by picking up all required fields in the same way. This would be somewhat similar to the interface mechanism in Go, and allows closing over contextual methods in a convenient manner.
Chaining: Enable modules to somehow encode prerequisite constraints, such that chaining works properly for the parameters of parameterized types. Is there a decent way to support this in the suggested framework?

crusso commented 3 years ago

Noted - will take a look later on...

The search looks for a module that (1) has a (possibly parameteterised) type member named Self that (can be instantiated such that it) is a subtype of e's type, and (2) has a value member x.

I think that should be supertype...

Your example:

 type Printable<T> = module {
  type Self = T;
  print : T -> () -> ();
}

would require separate support for open type definitions, which we still don't have.

nomeata commented 3 years ago

Quick read only, but is very promising! The first two variations may be worth considering.

rossberg commented 3 years ago

I think that should be supertype...

Ah yes, of course. Fixed.

crusso commented 3 years ago

Something like this would be good, but:

type Self = T seems a bit ad-hoc and restrictive.
C# extension methods use a special parameter keyword to indicate the this parameter. That would avoid restricting to just one self type per module.
It's also not clear to me what innermost means when bindings are recursive and modules can nest:
- Do we only consider the outermost level of a module?
- And if we consider nested modules in the search, what's the order of precedence - alphabetic based on module field name? Both seem ad-hoc.

crusso commented 3 years ago

Another way to skin this cat might be to use a bound named Self.

func bytes<Self <: Nat>(x : Self) : Iter.Iter<Nat8> { .... }

But maybe that's too awkward as a way of opting in to the syntax.

rossberg commented 3 years ago

type Self = T seems a bit ad-hoc and restrictive.

Is it all that different from the usual ML type t convention?

C# extension methods use a special parameter name to indicate the this parameter. That would avoid restricting to just one self type per module.

Hm, I'd prefer to avoid hacks around assigning special meaning to otherwise alpha-convertible names (like value or type parameters).

Is it important to support more than one self type in a single extension module? I mean, this is a kludge for OO-like abstractions, so having just one type is not entirely surprising.

It's also not clear to me what innermost means when bindings are recursive and modules can nest:

Do we only consider the outermost level of a module?

And if we consider nested modules in the search, what's the order of precedence - alphabetic based on module field name? Both seem ad-hoc.

I wouldn't take nested modules into consideration for lookup, that would be very surprising and difficult to control. That is, an applicable extension module must be directly in scope. If you were to combine modules with type classes, that is the same semantics I would expect around instance definitions, so I don't think it's ad-hoc.

Good question about recursive bindings. A plausible rule would be that within the same scope, the last one always takes precedence, even if it is recursive with an earlier one.

crusso commented 3 years ago

type Self = T seems a bit ad-hoc and restrictive.

Is it all that different from the usual ML type t convention?

Well, it's different in that the ML type t convention doesn't affect typing; this does.

Your examples seem to imply that you can only extend by fully generic methods. C# mechanism is more generally useful as you can write

public static int Sum(this List<int> l) { ... } 

public static Option<B> Find<A,B>(this List<Tuple<A, B> ; A a) { ... }

where the receiver isn't fully generic but a specialized instance. This is actually very useful and lets you define things as extension methods that you cannot define as instance methods (which must be fully generic).

C# extension methods use a special parameter name to indicate the this parameter. That would avoid restricting to just one self type per module.

Hm, I'd prefer to avoid hacks around assigning special meaning to otherwise alpha-convertible names (like value or type parameters).

Well, this is just assigning special meaning to non-alpha convertible field names. Not really sure that's much better is it? But I sympathize that both are ad-hoc and gross.

But in any case, I was actually wrong. C# uses a separate this keyword to mark the special parameter, which is still arbitrarily named:

  public static class StringExtension
    {
        // This is the extension method.
        // The first parameter takes the "this" modifier
        // and specifies the type for which the method is defined.
        public static int WordCount(this String str)
        {
            return str.Split(new char[] {' ', '.','?'}, StringSplitOptions.RemoveEmptyEntries).Length;
        }
    }

Why do we need the marker at all, could we just say that any function f can be invoked with exp.f syntax rather than having declare it as such. I can't immediately think what goes wrong, so might be worth considering as a KISS solution.

or we introduce symmetric syntax like (my preference)

func l.find<A,B>(eq : (A,A) -> Bool)  : ?B where l : List<(A, B)> { ...
}

or, if we can tolerate the scope extrusion familiar to C#:

func (l : List<(A,B)>).find<A,B>(eq : (A,A) -> Bool)  : ?B  { ...
}

or, if not, and perhaps more Java like:

func <A,B> (l : List<(A,B)>).find(eq : (A,A) -> Bool)  : ?B { ...
}

Is it important to support more than one self type in a single extension module? I mean, this is a kludge for OO-like abstractions, so having just one type is not entirely surprising.

(see above, C# extension methods are not that restrictive)

Good question about recursive bindings. A plausible rule would be that within the same scope, the last one always takes precedence, even if it is recursive with an earlier one.

Or we just allow imports to bring extensions into scope, not arbitrary bindings, but that would go against your dictionary passing example.

(edited the examples once again - my brain is fried)

rossberg commented 3 years ago

Your examples seem to imply that you can only extend by fully generic methods. C# mechanism is more generally useful as you can write
public static int Sum(this List<int> l) { ... } 

public static Option<B> Find<A,B>(this List<Tuple<A, B> ; A a) { ... } 
where the receiver isn't fully generic but a specialized instance. This is actually very useful and lets you define things as extension methods that you cannot define as instance methods (which must be fully generic).

Yes, but that limitation was rather intentional. I'm sceptical of creating such an impedance mismatch. For example, a potential object coercion operator could not handle such methods, and there would be no way to describe the overall pseudo-type of a thus extended object. If we wanted the ability to define such methods, then it should be possible both internally and externally, i.e., the language should enable a kind of GADT-style constraint mechanism. But honestly, that doesn't seem worth the complexity for Motoko.

C# extension methods use a special parameter name to indicate the this parameter. That would avoid restricting to just one self type per module.

Hm, I'd prefer to avoid hacks around assigning special meaning to otherwise alpha-convertible names (like value or type parameters).

Well, this is just assigning special meaning to non-alpha convertible field names. Not really sure that's much better is it? But I sympathize that both are ad-hoc and gross.

Hm, I think these are very different classes of names. One occurs in types and affects well-typedness, the other doesn't.

Why do we need the marker at all, could we just say that any function f can be invoked with exp.f syntax rather than having declare it as such. I can't immediately think what goes wrong, so might be worth considering as a KISS solution.

In such a setup, having a module in scope would no longer suffice to bring extension methods into scope. E.g, when you import a lib, you'd also need to "open" it somehow. But that is a mechanism we don't have (for good reasons), and if we did, this interaction would encourage overusing it even more.

or we introduce symmetric syntax like (my preference)
func l.find<A,B>(eq : (A,A) -> Bool)  : ?B where l : List<(A, B)> { ...
}
or, if we can tolerate the scope extrusion familiar to C#:
func (l : List<(A,B)>).find<A,B>(eq : (A,A) -> Bool)  : ?B  { ...
}
or, if not, and perhaps more Java like:
func <A,B> (l : List<(A,B)>).find(eq : (A,A) -> Bool)  : ?B { ...
}

Well, yeah, but either of these requires heavyweight new syntax constructs, heavyweight new sorts of types (what is the type of find?), and both probably with significant repercussions. And it has the scoping problem I mentioned. My idea was that all of that could be avoided in favour of a simple and semantically localised convenience mechanism that nicely reuses the module system.

cristianoc commented 3 years ago

Wondering if the values are enough to suggest answers to these design questions: https://sdk.dfinity.org/docs/language-guide/about-this-guide.html#_engineering_values_and_goals

crusso commented 3 years ago

Ha!

Well, if expressiveness is secondary then that would favour @rossberg's proposal, but I still dislike the idea of encoding this information is some orthogonal language feature (i.e. modules and type components) in a way that doesn't even handle some of the examples that would be useful to handle and also restricts us to defining separate modules for related functions.

Another syntax occurs to me - declaration with a prefix dot

func . find<A,B>(l : List<(A,B)(eq : (A,A) -> Bool)  : ?B  { ...
}

And we just encode dottiness in the name of the field/identifier.

.find : <A,B>List<A,B>->((A,A)->Bool) -> ? B

Resolution could choose the nearest applicable identifier, for some equally arbitrarily defined notion of nearest.

rossberg commented 3 years ago

FWIW, one of my motivations here was that we can provide and document pseudo methods like those on arrays along with all the other array functionality in their dedicated module. E.g., because that's where people look for them, as is evident from the repeated complaints / bug reports we get about array .size and friends missing in the docs. That makes a module-based solution a natural choice, I think. I'd also prefer it to work without fuss via ordinary imports of the respective library modules.

nomeata commented 3 years ago

Right now we have the property that an IDE can link occurrences to definitions without having to do a typecheck. This property would be lost in all of these variants, right? Would this bother @kritzcreek?

crusso commented 3 years ago

FWIW, one of my motivations here was that we can provide and document pseudo methods like those on arrays along with all the other array functionality in their dedicated module. E.g., because that's where people look for them, as is evident from the repeated complaints / bug reports we get about array .size and friends missing in the docs. That makes a module-based solution a natural choice, I think. I'd also prefer it to work without fuss via ordinary imports of the respective library modules.

Sure, but that's just a mode of use what I'm proposing. There's no need to bake it in.

cristianoc commented 3 years ago

Would it make sense to break up the proposal into a core part, and extensions. This assumes that extensions can be implemented later without introducing breaking changes or cognitive overhead. Then go ahead with the core part, learn from what people build with it, then proceed with extensions later on with more confidence.

dfinity / motoko