WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0
11.4k stars 693 forks source link

[sample implementation] simple Abstract Types extension #1394

Open awendland opened 3 years ago

awendland commented 3 years ago

I don't expect this to be particularly useful to anyone, especially with the exciting work going on with the GC proposal and Type Imports proposal, but I wanted to share this abstract types language extension I implemented in case it is relevant to someone.

_You can launch a Jupyter notebook with several examples and a modified Wasm interpreter implementing the following features by clicking launch Binder._

Overview

I've implemented a small extension to the WebAssembly language that introduces abstract types (also known as "existential types" or "abstract data types" or "opaque references" or "nominal types"). I built this in the context of my undergraduate thesis which explored using WebAssembly as a multi-language platform (by enabling the various features needed for secure compilation).

This implementation of abstract types is based on OCaml's abstract types and loosely conforms with the ideas expressed by @rossberg and @RossTate in https://github.com/WebAssembly/proposal-type-imports/issues/7:

These features allow WebAssembly to enforce higher-level abstractions such as:

Usage

To work with these new abstract types, I've introduced 4 operators to the language. The syntax is verbose in order to ensure clarity in the operations being performed. It's likely that a more ergonomic syntax would be adopted, such as merging the abstype_new and abstype_sealed namespaces and referring to them with a single operator, as well as overloading the existing type instruction to support abstract types. For now, the syntax is as follows:

abstype_new abstype_new [IDENTIFIER] value_type Create a new abstract type around a given value_type (which can be another abstract type via abstype_sealed_ref)
abstype_sealed abstype_sealed [IDENTIFIER] Import a foreign abstract type. Always used within an import instruction, i.e. (import "mod" "id" (abstype_sealed [IDENTIFIER]))
abstype_new_ref abstype_new_ref IDENTIFIER Reference a local abstract type (i.e. one locally declared using abstype_new)
abstype_sealed_ref abstype_sealed_ref IDENTIFIER Reference an imported foreign abstract type (i.e. one imported via abstype_sealed)

Abstract types manifest in two ways:

Examples

You can see simple syntax/usage examples in the test file for this feature at test/core/abstract-types.wast.

For something more interesting, I've configured my awendland/2020-thesis repository to be runnable via Binder so that you can jump right into a web-based Jupyter notebook with this webassembly-spec-abstypes interpreter already available and the code in samples/ all runnable. Try it out with: launch Binder

Implementation Details

I've created a PR (https://github.com/awendland/webassembly-spec-abstypes/pull/4) showcasing the implementation on top of the reference interpreter's Reference Types Proposal branch.

The core changes for stack type checking were in interpreter/syntax/types.ml:

In both of these instances, the i32 value represents a unique identifier for the abstract type, since the types are "nominal", as in, even if they have the same underlying type they're unique if they are from different definitions (i.e. different abstype_new operations). See the test suite where this distinction is demonstrated.

The core changes for type checking module linking were implemented in a new module called interpreter/runtime/extern_types.ml that:

All uses of value_type (should) have been expanded to support abstract types.

Work in Progress

Several aspects of this implementation are unfinished (and likely to remain so given the exciting development going on with the GC Proposal and my interests moving elsewhere). I documented the issues I was aware of in the test suite; they are:

Additionally, I only implemented abstract types for the textual representation of WebAssembly, not the binary variant or the JS API. The binary variant should be a straightforward addition. The JS API will require various decisions to be made around how to enforce the abstraction across the JS boundary; hard problems that are getting compelling answers in the GC Proposal.

Related Discussions

This small language extension is related to much more exciting discussions going on elsewhere, such as:

It's also loosely related to much grander proposals that are in the works: GC, Type Imports, Interface Types.

Thanks for such great work on WebAssembly! I'm excited to see where the language goes.

RossTate commented 3 years ago

Thanks for the great write-up, @awendland! Since you mentioned me, I wanted to clarify that I am actually of the opinion that, if WebAssembly were to follow standard practice, such an extension would be unnecessary. Most module systems ensure data abstraction by default. For example, Java/C# modules do not let outsiders access private fields (though modules might provide indirect access through the reflection data it provides, which is controlled by various security settings). That said, OCaml does come to mind as an outlier due to its unsafe Obj.magic function.

However, such standard practice is incompatible with rtt.canon, and as such @rossberg has been opposed to it. So instead we are stuck with non-standard dynamic abstraction techniques, which are strictly less efficient and less expressive. (With static data-abstraction techniques, you can abstract any type within the signature of any module without changing behavior. With only dynamic data-abstraction techniques, this is impossible for many signatures.)

By the way, this proposal seems to be very similar to the private types that @rossberg added in WebAssembly/proposal-type-imports#8. Unfortunately, both of these seem to be too inexpressive to accommodate common data-abstraction use cases, as discussed in WebAssembly/gc#178.

rossberg commented 3 years ago

Nice! As @RossTate points out, this looks similar to the private types in the type import proposal, except that that treats private types as reference types, in order to allow hiding any type, not just i32. However, looking at the patch, I'm not sure I fully follow the type factorisation you use, e.g., how different sealed types are distinguished, especially at module boundaries -- there doesn't seem to be any link between SealedAbsType value type and an actual type definition?

@RossTate:

if WebAssembly were to follow standard practice, such an extension would be unnecessary. Most module systems ensure data abstraction by default.

That's nonsense. All reasonable module systems support both public (alias) and private (abstract) type definitions, so you need to distinguish both forms somehow. Doesn't matter what the default is, though public is more natural given the preexisting Wasm semantics. That's also what other advanced module systems default to if you do not filter exports explicitly, e.g. ML or Haskell.

However, such standard practice is incompatible with rtt.canon, and as such @rossberg has been opposed to it. So instead we are stuck with non-standard dynamic abstraction techniques, which are strictly less efficient and less expressive.

Sigh. The abstraction mechanism proposed is pretty much how class-based abstraction works in OO languages, how newtype works in Haskell, how abstype works in SML, and so on and so forth. It is the most wide-spread form of value-based type abstraction. None of this has much to do with RTTs either.

RossTate commented 3 years ago

The abstraction mechanism proposed is pretty much how class-based abstraction works in OO languages, how newtype works in Haskell, how abstype works in SML, and so on and so forth. It is the most wide-spread form of value-based type abstraction. None of this has much to do with RTTs either.

Haskell's newtype erases at run time. Quoting from the Haskell 98 report (4.3.2): "These coercions [to and from the newtype] may be implemented without execution time overhead; newtype does not change the underlying representation of an object."

So newtype could be implemented through the proposed abstraction mechanism, but that is not how Haskell is expected to be implemented. Instead, the coercions are erased at run time, and Haskell libraries are designed around this performance consideration. The difference in implementations is unobservable in Haskell because, Haskell does not have casts on arbitrary values. But because WebAssembly does have RTTs and ref.cast, you cannot implement the proposed abstraction mechanism in the same way—it has to actually allocate a new reference with a different RTT from the value being wrapped (if the value has an RTT). So, contrary to what suggested, RTTs are critical here and make the performance of the proposed abstraction mechanism much worse than that of Haskell's newtype.

The same applies to SML's abstype. But, quoting from here: "There is no reason to use abstype in today's SML. Consider it deprecated. It is a relict of pre-module times. You can achieve the same effect of hiding the constructors of a type with structures, signatures and sealing (the :> operator), but in a more flexible and consistent manner." The reason why modules are more flexible is that, whereas abstype could be implemented through low-level coercions, module sealing is erased and takes advantage of that fact to be more expressive. Here is an example from SML:

signature FOO =
sig
  type foo
  val reinterpret_array : int array -> foo array
end

structure INT_FOO :> FOO =
struct
  type foo = int
  val reinterpret_array = fn a => a
end

This example shows that it's possible to reinterpret an array with a public type as an array with a private type. In particular, the two values share the same state. This is impossible with any dynamic abstraction mechanism. Yet the ability to seal the type of values without casts is useful because a module can then can be implemented using some library designed for integer arrays but only expose its values as arrays of an abstract type (e.g. to prevent forging). Because SML ensures data abstraction by default, even though these values are not dynamically sealed there is no way for another module to cast them to or from integers.

That's nonsense.

This response seems to be due to a misunderstanding of what I was describing. Hopefully the above clarifies what I meant. Hopefully now it's clear that standard practice is that values exported via an opaque/sealed type are not dynamically wrapped—instead, the type/module system ensures that values exported via an opaque/sealed type can only be used according to what the various module exports enable. However, that standard practice is incompatible with rtt.canon.

So we are sacrificing performance and standard guarantees in order to support rtt.canon.