WebAssembly / component-model

Repository for design and specification of the Component Model
Other
897 stars 75 forks source link

The output of a function in the component model is itself not a valid component model type #349

Open jdegoes opened 2 months ago

jdegoes commented 2 months ago

The output of a function can be:

  1. Any type in the component model
  2. Nothing at all
  3. Multiple named components

Note that (2) and (3) are not valid types in the component model, in the sense that it is not possible to type the parameter of a function as having types (2) or (3), because they are not valid types.

We have a scripting language called Rib that is designed to natively work on component model types (similar to Claw), and the existence of (2) and (3) causes some complication.

How?

Because it means that when you invoke a function (a natural thing to want to do within a scripting language), you cannot take the result and store it into a variable that has a component model type.

In essence, the possible return types of a function are "bigger" than the possible input types to the function. This results in the lack of a mathematical property called "closure", and we could say that component model types are not closed under the operation of function application.

It may be that it is too late to effect any change here (in which case, feel free to close the ticket). But also, (2) is very similar to (), and (3) is vary similar to record—so similar, I would argue, that perhaps from a semantic perspective, (2) could be regarded as "syntax sugar" for (), and (3) as "syntax sugar" for an anonymous (?) record.

With these interpretations, it would be possible to state that the return of a function is a valid component model type, which can be utilized as the type of any parameter of any function, thus achieving closure.

lukewagner commented 2 months ago

That's an interesting point and indeed this is an aspect of the design where we've oscillated over time between having a unit type vs. having possibly-empty-lists-of-types. IIRC, the most recent switch to the current design was discussed in #41 which has some of the reasons to move away from a unit type. Ultimately, I think there are tradeoffs in either case. In the 0.2.*/0.3 timeframe, we're trying not to make any incompatible changes to the C-M, but this is something we could reconsider (based on way more experience) after that when we consider make breaking changes.

Having a component-integrated scripting language with component-level values as first-class values sounds really cool, btw. In the short term, to represent the return value of a function call as a single value, I think you're right that, for case 3, record is the way to go. I see the problem that there's not a component-level unit type to represent case 2 but, since this value will never actually be used (b/c nothing takes a unit), you could perhaps use the "no value" sentinel that many dynamic languages end up having for uninitialized variables (e.g., undefined in JS) or, in the absence of that, the none value of option (e.g., None in Python).

rossberg commented 2 months ago

In essence, the possible return types of a function are "bigger" than the possible input types to the function. This results in the lack of a mathematical property called "closure", and we could say that component model types are not closed under the operation of function application.

Functions have both n-ary parameters and n-ary results in the current design, so at least they are symmetric: both are not first-class values (tuples). The form of closure you are referring to would require that they both are (like in some functional languages). Right now, the language is still closed under (well-typed) function composition fโˆ˜g, which it wouldn't be anymore if you broke that symmetry and changed only the structure of results. Hence, that would in fact make matters worse from the closure perspective.

jdegoes commented 2 months ago

@lukewagner Thanks, we will approach it as you say for now. We do hope to make Rib a standalone crate, embeddable for the many diverse applications of scripting (extension points in databases, proxies, etc.).

@rossberg That's a great point! However, it still doesn't change the fact that the output (nor, as you point out, the full input) can be given a type in the component model.

So in a scripting language, what do you do with let x = invoke(a, b, c);? What is the type of x? In all programming languages I can think of, you can ascribe a type to the result of invoking any (non-void-returning) function.

Of course, we can solve this with let x, y = invoke(a, b, c), i.e. mandatory destructuring assignment for multiple return values. Yet this is somehow unsatisfying because tuples and records can already be used to model functions returning more than one value.

In our ideal world, we could ascribe a component model type to the return value of a (non-void-returning) function. However, I think we have enough to workaround the issue for now, so feel free to close if this is only interesting to us.

(As a Haskell programmer, of course, I'd love for complete inputs of a function to be typeable, but that's extremely rare in any programming language and I don't think the lack of this will confuse any developers; in fact, the opposite.)

rossberg commented 2 months ago

So in a scripting language, what do you do with let x = invoke(a, b, c);? What is the type of x? In all programming languages I can think of, you can ascribe a type to the result of invoking any (non-void-returning) function.

The classic counter-example would be Scheme and friends, with their multiple values. You can only destructure them or compose with another function taking as many arguments. You get a runtime error if you e.g. try to bind them to a single variable.

There is no problem with typing either, it's just not a value type. For example, in Typed Racket you can define a function

(define (f) (values 1 2))

and it will have a type (-> (Values Natural Natural)). But Values is not a type you can assign to a variable or value, only to expressions. Consequently, the following is rejected by the type checker:

(let ((x (f))) x)  ;; type error: Expression should produce 1 values, but produces 2 values

(As a Haskell programmer, of course, I'd love for complete inputs of a function to be typeable, but that's extremely rare in any programming language and I don't think the lack of this will confuse any developers; in fact, the opposite.)

I'd say it is pretty much the norm in the camp of typed functional languages, see e.g. Haskell, OCaml, SML, Agda, Idris.

jdegoes commented 2 months ago

@rossberg In most formal language semantics I have looked at, typing rules apply to type constructors and terms. No doubt, one could give typing rules for things which are not terms (e.g. multiple parameters or multiple return values). However, that would be a separate context than that of terms, which is the only one we're concerned about for Rib.

I didn't know Typed Racket has a precedent for disallowing variable binding on multi-return function invocation. I think we'll probably follow Typed Racket's lead here, and force destructuring on any assignment or attempt to use the result of a multi-return function as an expression. However, in my view, it still leads to some non-uniformity. Ordinarily, you can write f(g(x)). But in the case of multi-return value functions, because what they return is not a term, but rather, multiple terms; and because they do not have a type (at least, in the context of terms), they cannot be passed as values.

It's not the end of the world, for sure, but if no one had ever brought up these issues before, I at least wanted to kick-start a discussion here. Thanks for your input!

rossberg commented 2 months ago

@jdegoes, just a nit: typing rules apply to all expressions, terms = expressions, so these are terms and they have types. There are merely no variables of these types, as their "values" live in a domain distinct from plain values.

It is true that you cannot write f(g(x)) for multi-value functions in Scheme/Racket. But there is no deep reason why they do not allow that to make it symmetric. Their multi-values behave almost like a monad, you have to bind results explicitly or use a suitable combinator (e.g., (call-with-values g f), which is the analogue to >>=).

jdegoes commented 2 months ago

@rossberg Thank you! And since you're giving a free lesson (๐Ÿ™ ๐Ÿ˜‰ ), should the proper way to express what I am looking for be something along the lines of:

The return type of a function is not a value and does not have a type in the domain of types of expressions.

???

rossberg commented 2 months ago

It does have a type in the domain of expressions, but not values, so I'd rather say something like

The result of a function is not a plain value and consequently, does not have a value type.

jdegoes commented 2 months ago

@rossberg I do not see how multi-values qualify as expressions. They can be eliminated through destructuring but there are no operators on them. One could say that import statements are 'expressions' if one works hard enough, but it seems not to match the common meaning of the term. I think multi-values are not values (in the sense that every expression must evaluate to a value), and while they could be ascribed a type (though it's hard to imagine any typing rules or judgements), that "type" exists in a different domain than the types that are ascribed to expressions.

rossberg commented 2 months ago

There is no law that requires all expressions to have types from the same domain (or operators on them). Many languages have multiple type domains. For example, in C++ we have array expressions with array types yet no first-class array values that you could e.g. return from a function.

Here is a little toy language with multi-values:

๐‘’ ::= ๐‘ฅ | ๐‘› | ๐‘’+๐‘’ | ฮป(๐‘ฅ,...,๐‘ฅ).๐‘’ | ๐‘’ ๐‘’ | (๐‘’,...,๐‘’) | let (๐‘ฅ,...๐‘ฅ) = ๐‘’ in ๐‘’

and a type grammar for it:

๐‘ก ::= ๐‘ฃ๐‘ก | ๐‘š๐‘ก ย  ย ย ย ย ย ย ย ย ย  (expression types) ๐‘ฃ๐‘ก ::= โ„• | ๐‘š๐‘ก โ†’ ๐‘ก ย  ย  (value types) ๐‘š๐‘ก ::= (๐‘ฃ๐‘ก,...,๐‘ฃ๐‘ก) ย  ย ย  (multi-value types)

Here are possible typing rules, which assign a type ๐‘ก to each expression ๐‘’:

๐‘ฅ : ๐‘ฃ๐‘กย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย iff ฮ“(๐‘ฅ) = ๐‘ฃ๐‘ก ๐‘› : โ„• ๐‘’โ‚+๐‘’โ‚‚ : โ„•ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย iff ๐‘’โ‚ : โ„• โˆง ๐‘’โ‚‚ : โ„• ฮป(๐‘ฅ,...,๐‘ฅ).๐‘’ : (๐‘ฃ๐‘กโ‚,...,๐‘ฃ๐‘กโ‚™) โ†’ ๐‘ก ย ย ย ย ย iff ๐‘’ : ๐‘ก with ฮ“(๐‘ฅแตข) = ๐‘ฃ๐‘กแตข ๐‘’โ‚ ๐‘’โ‚‚ : ๐‘ก ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย ย iff ๐‘’โ‚ : ๐‘š๐‘ก โ†’ ๐‘ก โˆง ๐‘’โ‚‚ : ๐‘š๐‘ก (๐‘’โ‚,...,๐‘’โ‚™) : (๐‘ฃ๐‘กโ‚,...,๐‘ฃ๐‘กโ‚™)ย ย ย ย  ย ย ย ย ย ย ย ย ย ย ย iff ๐‘’โ‚ : ๐‘ฃ๐‘กโ‚ โˆง ... โˆง ๐‘’โ‚™ : ๐‘ฃ๐‘กโ‚™ let (๐‘ฅโ‚,...๐‘ฅโ‚™) = ๐‘’โ‚ in ๐‘’โ‚‚ : ๐‘กย ย ย ย ย ย ย ย ย ย ย ย iff ๐‘’โ‚ : (๐‘ฃ๐‘กโ‚,...,๐‘ฃ๐‘กโ‚™)ย โˆง ๐‘’โ‚‚ : ๐‘ก with ฮ“(๐‘ฅแตข) = ๐‘ฃ๐‘กแตข

Expression forms like let or application can produce both a single or multiple values, others only either one of them. This roughly corresponds to Scheme / Typed Racket, only slightly generalised in that it treats function arguments as multi-values as well, hence allowing f(g(x)). Many variations of this are possible.