carbon-language / carbon-lang

Carbon Language's main repository: documents, design, implementation, and related tools. (NOTE: Carbon Language is experimental; see README)
http://docs.carbon-lang.dev/
Other
32.42k stars 1.48k forks source link

custom pattern matching proposal #1796

Open h3har opened 2 years ago

h3har commented 2 years ago

The specialization section of the generics doc implies that Optional is (or could be implemented as) a class type and demonstrates how specialization can enable developers to implement optimizations for how Optional is stored for their own types. Optional is (as an example) roughly defined like this:

class Optional(T:! Movable) {
  fn None() -> Self {
    return {.storage = T.(OptionalStorage.MakeNone)()};
  }
  fn Some(x: T) -> Self {
    return {.storage = T.(OptionalStorage.Make)(x)};
  }
  ...
  private var storage: T.(OptionalStorage.Storage);
}

While this will work to allow developers to optimize how their types are stored, it will mean that Optional(T) is not a choice type (as claimed by the error handling doc), and so would require a more sophisticated design for pattern matching to enable it to be used just like choice types, which I'll try to give a possible approach for.

Option: user-defined patterns as class members

In this case, we provide the user a way to define a user defined pattern as a member of a class. The idea is to enable the user to make a class behave exactly like a choice type. Each custom pattern accepts a single argument (whatever thing is being matched) and optionally returns a tuple of zero or more values that the user can destructure if the match succeeds.

I struggled to come up with a good syntax for this, or a way to do it without new syntax. Below is a possible example using some new syntax:

class Optional(T:! Movable) {
  fn None() -> Self {
    return {.storage = T.(OptionalStorage.MakeNone)()}; // as before
  }
  fn Some(x: T) -> Self {
    return {.storage = T.(OptionalStorage.Make)(x)}; // as before
  }

  // defines a custom match for the identifier 'None' when accessed as a field within a match context
  match None(y: Self) {
    if (T.(OptionalStorage.IsNone)(y.storage)) {
      match return; // match succeeded, but we return nothing
    }
    // if we return normally, the match failed
  }

  // defines a custom match for the identifier 'Some' when accessed as a field within a match context
  match Some(y: Self) -> (T,) { // match returns a single-item tuple of T
    if (!T.(OptionalStorage.IsNone)(y.storage)) {
      match return (T.(OptionalStorage.Unwrap)(),); // match succeeded, return a value
    }
    // if we return normally, the match failed
  }

  ...
  private var storage: T.(OptionalStorage.Storage);
}

The above possible syntax repurposes the match keyword to both define a custom pattern for a class member and as part of a special match return statement that indicates the match succeeded. I initially considered return match, but that could potentially become ambiguous if match statements become valid expressions (like in Rust). Another option could be do match?

This still differs from a choice type in that the programmer must supply parens after None when creating a None variant - however, this could be avoided by making it a class constant instead or by somehow making fn None and fn Some special.

Problems and further issues/questions

geoffromer commented 2 years ago

...it will mean that Optional(T) is not a choice type (as claimed by the error handling doc), and so would require a more sophisticated design for pattern matching to enable it to be used just like choice types, which I'll try to give a possible approach for.

It's not yet covered in docs/design (see #1805), but p0157 is the current plan-of-record for solving this problem.

h3har commented 2 years ago

@geoffromer Thanks. That proposal looks much better and more comprehensive than what I thought of :-). I like the use of a Matchable interface instead of magic new syntax. I suppose the MatchContinuation interface is somewhat magical in the sense that it its methods appear to have no implementation anywhere(?) and that the interface syntax is essentially used as an alternative to a choice type to describe the possible choices as a way to "bootstrap" the declaration of a sum type.

One thought is that it would be nice for library developers to be able to specify their generic sum types with the choice sugar without having to worry about locking out library users from optimizing the storage of their sum types for their own types, e.g., by somehow specializing the sum type (declared with choice sugar) so that it is implemented as a class type with custom storage... But I'm not sure the current design for generics would allow this.

Otherwise, library developers who cannot anticipate all of the use cases might feel the need to pessimistically design all or many of their sum types in the same way as Optional is suggested in the docs, to leave the window open for specialized storage.

geoffromer commented 2 years ago

I suppose the MatchContinuation interface is somewhat magical in the sense that it its methods appear to have no implementation anywhere(?)

The intent is that the compiler generates an implementation of MatchInterface, whose method implementations basically call back into the pattern-matching logic at the callsite. In principle you could hand-write an implementation of that interface, but I can't think of any reason you'd want to.

One thought is that it would be nice for library developers to be able to specify their generic sum types with the choice sugar without having to worry about locking out library users from optimizing the storage of their sum types for their own types, e.g., by somehow specializing the sum type (declared with choice sugar) so that it is implemented as a class type with custom storage... But I'm not sure the current design for generics would allow this.

Otherwise, library developers who cannot anticipate all of the use cases might feel the need to pessimistically design all or many of their sum types in the same way as Optional is suggested in the docs, to leave the window open for specialized storage.

That's an interesting point; I agree that seems desirable. I think we might be able to pull it off with some fairly minor tweaks to the current design. Basically, we'd need to say that a choice type behave as if they were syntactic sugar for a class type, and specify the structure of that class type in enough detail that user code can specialize it.

h3har commented 2 years ago

The intent is that the compiler generates an implementation of MatchInterface, whose method implementations basically call back into the pattern-matching logic at the callsite.

Ah okay.

In principle you could hand-write an implementation of that interface, but I can't think of any reason you'd want to.

How could you? What would the ReturnType be? It would have to be able to represent multiple different types in effectively the same way as choice types are intended - a sort of bootstrapping problem. I suppose you could in theory hand-write it with a union (which I assume are intended to be supported) and a discriminant, but the generated code shouldn't have to look anything like that.

I'm also trying to think about how the proposed interface Match method would cooperate with code generation. Match would have to be inlined presumably to get reasonable results (and if ReturnType is a magical compiler type with no runtime memory representation, then maybe it has to be inlined before you even get to code generation). What if someone tries to invoke Match themselves? What do they get? It would be interesting if the compiler's chosen representation for ReturnType is a union plus a discriminant, since that may be exactly what a custom Matchable implementation might be trying to avoid.

geoffromer commented 2 years ago

Just to clear up one possible point of confusion: the var Type:$$ ReturnType; declaration inside MatchContinuation (sorry I called it MatchInterface earlier) would now be written let ReturnType:! Type;. It declares an associated type of the interface, which means that in order to define an implementation of that interface, you have to define your implementation's ReturnType, in the same way that you have to define your implementation's method bodies.

How could you? What would the ReturnType be?

ReturnType can be whatever you want it to be, because you're writing all the code that interacts with it. Keep in mind, the methods of MatchContinuation are callbacks: they define what code you want to execute if the corresponding alternative matches. For example, if I was trying to generate a string representation of a Result, I could make ReturnType be String and then have each alternative method return some string representation of that alternative. Or I could make ReturnType be (), and have the methods produce a string value by mutating a String object I created beforehand.

What if someone tries to invoke Match themselves? What do they get?

If they want to invoke Match themselves, they have to pass an argument of some type that implements MatchContinuation, and that implementation's associated type ReturnType will be the type returned by the Match call.

Does that make sense?

h3har commented 2 years ago

Thanks for your explanation. I clearly misunderstood what the purpose of MatchContinuation was - I thought of it as somehow calling back into the compiler itself, but this makes much more sense. If I understand correctly, what you are saying is that (assume that match as expressions are supported):

let return_value: R = match (result) {
  case .Success(value: T) => { code_for_success_evaluating_to_R }
  case .Failure(error: E) => { code_for_failure_evaluating_to_R }
  case .Cancelled => { code_for_cancellation_evaluating_to_R }
};

will generate an implementation of MatchContinuation by the compiler that looks like this:

class InternalCompilerType { }
impl InternalCompilerType as MatchContinuation where .ReturnType = R {
  fn Success(value: T) -> R { code_for_success_evaluating_to_R }
  fn Failure(error: E) -> R { code_for_failure_evaluating_to_R }
  fn Cancelled() -> R { code_for_cancellation_evaluating_to_R }
}

and the original match statement will be transformed to:

let return_value: R = result.(Matchable(Result.MatchContinuation).Match)(InternalCompilerType{});

Hopefully the compiler can inline the virtual method call (perhaps even inlining the continuations themselves). What happens if Result implemented Matchable for multiple different continuation interfaces? Is that allowed, or is this "interface" parameter like an associated type that can only be set once for each type that implements Matchable?

h3har commented 2 years ago

Hopefully the compiler can inline the virtual method call (perhaps even inlining the continuations themselves).

Never mind, the proposal seems to suggest that Match is itself a generic function. Does this mean there is in theory a version of Match for every place where the object is pattern matched? (i.e., for each match continuation). If that's the case, is the pointer argument to the match continuation necessary? (Couldn't you just access the continuation functions via the continuation type parameter? - it's all known at compile time anyway.)

geoffromer commented 2 years ago

Never mind, the proposal seems to suggest that Match is itself a generic function. Does this mean there is in theory a version of Match for every place where the object is pattern matched? (i.e., for each match continuation).

It depends on what you count as a "version". The owner of a given sum type should only need to write the code for Match once, so in that sense there's only one version. If the implementation uses the static specialization strategy, then multiple copies of Match may be generated during compilation, one for each argument type that Match is called with. And the compiler might or might not reuse the types that it calls Match with, but the difference shouldn't be visible to the user in any event.

If that's the case, is the pointer argument to the match continuation necessary? (Couldn't you just access the continuation functions via the continuation type parameter? - it's all known at compile time anyway.)

I think it's necessary, because the MatchContinuation object might have run-time state. For example, if code_for_success_evaluating_to_R assigns a value to some variable that's outside the scope of the match statement, the MatchContinuation will need something like a pointer to that variable.

github-actions[bot] commented 2 years ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please comment or remove the inactive label. The long term label can also be added for issues which are expected to take time. This issue is labeled inactive because the last activity was over 90 days ago.