[sample implementation] simple Abstract Types extension

awendland commented 3 years ago

I don't expect this to be particularly useful to anyone, especially with the exciting work going on with the GC proposal and Type Imports proposal, but I wanted to share this abstract types language extension I implemented in case it is relevant to someone.

_You can launch a Jupyter notebook with several examples and a modified Wasm interpreter implementing the following features by clicking ._

Overview

I've implemented a small extension to the WebAssembly language that introduces abstract types (also known as "existential types" or "abstract data types" or "opaque references" or "nominal types"). I built this in the context of my undergraduate thesis which explored using WebAssembly as a multi-language platform (by enabling the various features needed for secure compilation).

This implementation of abstract types is based on OCaml's abstract types and loosely conforms with the ideas expressed by @rossberg and @RossTate in https://github.com/WebAssembly/proposal-type-imports/issues/7:

Abstract types are owned by modules.
The module that creates the abstract types can manipulate them directly as their underlying types.
Abstract types become "sealed" once exported from the module, meaning that importing modules can't manipulate their values directly.
There is no sub-typing.

These features allow WebAssembly to enforce higher-level abstractions such as:

Unforgeable file handles (e.g. in WASI)
Object types (i.e. allowing functions to only operate on a given Object)
Object references (i.e. unforgeable addresses, e.g. referring to a in let a = new Date(); a.getYear())

Usage

To work with these new abstract types, I've introduced 4 operators to the language. The syntax is verbose in order to ensure clarity in the operations being performed. It's likely that a more ergonomic syntax would be adopted, such as merging the abstype_new and abstype_sealed namespaces and referring to them with a single operator, as well as overloading the existing type instruction to support abstract types. For now, the syntax is as follows:


`abstype_new`	`abstype_new [IDENTIFIER] value_type`	Create a new abstract type around a given value_type (which can be another abstract type via `abstype_sealed_ref`)
`abstype_sealed`	`abstype_sealed [IDENTIFIER]`	Import a foreign abstract type. Always used within an `import` instruction, i.e. `(import "mod" "id" (abstype_sealed [IDENTIFIER]))`
`abstype_new_ref`	`abstype_new_ref IDENTIFIER`	Reference a local abstract type (i.e. one locally declared using `abstype_new`)
`abstype_sealed_ref`	`abstype_sealed_ref IDENTIFIER`	Reference an imported foreign abstract type (i.e. one imported via `abstype_sealed`)

Abstract types manifest in two ways:

Local - Local abstract types are at play when abstype_new* instructions are used within a given module. These abstract types are "unwrapped" within the module, and are treated as their underlying value_types. In this way, local abstract types are more like type aliases. This allows abstract types to be constructed, and only take on their abstract nature when used in a separate module.
Foreign / Sealed - Foreign, or sealed, abstract types are present when abstype_sealed* instructions are being used. These abstract types are treated as opaque identifiers referencing the source module instance and the export statement. These abstract types are only treated as their underlying values upon program execution (i.e. after validation). Additionally, they do not have default values, so trying to immediately use a local with a sealed abstract type will fail, instead, the local must be populated with a value provided by the sealed abstract type's source module.

Examples

You can see simple syntax/usage examples in the test file for this feature at test/core/abstract-types.wast.

For something more interesting, I've configured my awendland/2020-thesis repository to be runnable via Binder so that you can jump right into a web-based Jupyter notebook with this webassembly-spec-abstypes interpreter already available and the code in samples/ all runnable. Try it out with:

Implementation Details

I've created a PR (https://github.com/awendland/webassembly-spec-abstypes/pull/4) showcasing the implementation on top of the reference interpreter's Reference Types Proposal branch.

The core changes for stack type checking were in interpreter/syntax/types.ml:

adding SealedAbsType of int32 to value_type
introducing a new layer on top of value_type called wrapped_value_types that could be a NewAbsType of value_type * int32 in addition to a straight value_type.

In both of these instances, the i32 value represents a unique identifier for the abstract type, since the types are "nominal", as in, even if they have the same underlying type they're unique if they are from different definitions (i.e. different abstype_new operations). See the test suite where this distinction is demonstrated.

The core changes for type checking module linking were implemented in a new module called interpreter/runtime/extern_types.ml that:

replaces extern_type_of in runtime/instance.ml
handles comparing abstract types, which may either be "resolved" or "unresolved" since, fundamentally, abstract types are based on Module (or Host) instances, not module definitions.
- There are additional constructors (NameModuleRef, LocalModuleRef) which exist to assist with representing abstract types in contexts were modules haven't been instantiated yet, but these representations of abstract types have to be resolved before they can be properly compared.

All uses of value_type (should) have been expanded to support abstract types.

Work in Progress

Several aspects of this implementation are unfinished (and likely to remain so given the exciting development going on with the GC Proposal and my interests moving elsewhere). I documented the issues I was aware of in the test suite; they are:

Enforcement of type sealing when a sealed/foreign abstract type is used as the value for another new abstract type. 3rd party modules are able to use these double-sealed abstract types as if they were the original sealed abstract type and visa-versa.
Enforcement of abstract types for globals. You can import a global as its underlying type and the module linking process will succeed just fine (it should have thrown a linking error saying "incompatible import type").

Additionally, I only implemented abstract types for the textual representation of WebAssembly, not the binary variant or the JS API. The binary variant should be a straightforward addition. The JS API will require various decisions to be made around how to enforce the abstraction across the JS boundary; hard problems that are getting compelling answers in the GC Proposal.

Related Discussions

This small language extension is related to much more exciting discussions going on elsewhere, such as:

https://github.com/WebAssembly/design/issues/1343 - "call_indirect versus abstraction"
https://github.com/WebAssembly/proposal-type-imports/issues/7 - "Instantiating abstract types"
https://github.com/WebAssembly/design/issues/1375 - "Thoughts on improving Compilation Model"
https://github.com/WebAssembly/interface-types/issues/128 - "Guaranteeing additional soundness properties"
https://github.com/WebAssembly/gc/issues/148 - "Nominal types vs. inter-module interaction"
https://github.com/WebAssembly/gc/issues/156 - "Supporting Basic Generics before Advanced Generics"

It's also loosely related to much grander proposals that are in the works: GC, Type Imports, Interface Types.

Thanks for such great work on WebAssembly! I'm excited to see where the language goes.

RossTate commented 3 years ago

Thanks for the great write-up, @awendland! Since you mentioned me, I wanted to clarify that I am actually of the opinion that, if WebAssembly were to follow standard practice, such an extension would be unnecessary. Most module systems ensure data abstraction by default. For example, Java/C# modules do not let outsiders access private fields (though modules might provide indirect access through the reflection data it provides, which is controlled by various security settings). That said, OCaml does come to mind as an outlier due to its unsafe Obj.magic function.

However, such standard practice is incompatible with rtt.canon, and as such @rossberg has been opposed to it. So instead we are stuck with non-standard dynamic abstraction techniques, which are strictly less efficient and less expressive. (With static data-abstraction techniques, you can abstract any type within the signature of any module without changing behavior. With only dynamic data-abstraction techniques, this is impossible for many signatures.)

By the way, this proposal seems to be very similar to the private types that @rossberg added in WebAssembly/proposal-type-imports#8. Unfortunately, both of these seem to be too inexpressive to accommodate common data-abstraction use cases, as discussed in WebAssembly/gc#178.

rossberg commented 3 years ago

Nice! As @RossTate points out, this looks similar to the private types in the type import proposal, except that that treats private types as reference types, in order to allow hiding any type, not just i32. However, looking at the patch, I'm not sure I fully follow the type factorisation you use, e.g., how different sealed types are distinguished, especially at module boundaries -- there doesn't seem to be any link between SealedAbsType value type and an actual type definition?

@RossTate:

if WebAssembly were to follow standard practice, such an extension would be unnecessary. Most module systems ensure data abstraction by default.

That's nonsense. All reasonable module systems support both public (alias) and private (abstract) type definitions, so you need to distinguish both forms somehow. Doesn't matter what the default is, though public is more natural given the preexisting Wasm semantics. That's also what other advanced module systems default to if you do not filter exports explicitly, e.g. ML or Haskell.

However, such standard practice is incompatible with rtt.canon, and as such @rossberg has been opposed to it. So instead we are stuck with non-standard dynamic abstraction techniques, which are strictly less efficient and less expressive.

Sigh. The abstraction mechanism proposed is pretty much how class-based abstraction works in OO languages, how newtype works in Haskell, how abstype works in SML, and so on and so forth. It is the most wide-spread form of value-based type abstraction. None of this has much to do with RTTs either.

RossTate commented 3 years ago

The abstraction mechanism proposed is pretty much how class-based abstraction works in OO languages, how newtype works in Haskell, how abstype works in SML, and so on and so forth. It is the most wide-spread form of value-based type abstraction. None of this has much to do with RTTs either.

Haskell's newtype erases at run time. Quoting from the Haskell 98 report (4.3.2): "These coercions [to and from the newtype] may be implemented without execution time overhead; newtype does not change the underlying representation of an object."

So newtype could be implemented through the proposed abstraction mechanism, but that is not how Haskell is expected to be implemented. Instead, the coercions are erased at run time, and Haskell libraries are designed around this performance consideration. The difference in implementations is unobservable in Haskell because, Haskell does not have casts on arbitrary values. But because WebAssembly does have RTTs and ref.cast, you cannot implement the proposed abstraction mechanism in the same way—it has to actually allocate a new reference with a different RTT from the value being wrapped (if the value has an RTT). So, contrary to what suggested, RTTs are critical here and make the performance of the proposed abstraction mechanism much worse than that of Haskell's newtype.

The same applies to SML's abstype. But, quoting from here: "There is no reason to use abstype in today's SML. Consider it deprecated. It is a relict of pre-module times. You can achieve the same effect of hiding the constructors of a type with structures, signatures and sealing (the :> operator), but in a more flexible and consistent manner." The reason why modules are more flexible is that, whereas abstype could be implemented through low-level coercions, module sealing is erased and takes advantage of that fact to be more expressive. Here is an example from SML:

signature FOO =
sig
  type foo
  val reinterpret_array : int array -> foo array
end

structure INT_FOO :> FOO =
struct
  type foo = int
  val reinterpret_array = fn a => a
end

This example shows that it's possible to reinterpret an array with a public type as an array with a private type. In particular, the two values share the same state. This is impossible with any dynamic abstraction mechanism. Yet the ability to seal the type of values without casts is useful because a module can then can be implemented using some library designed for integer arrays but only expose its values as arrays of an abstract type (e.g. to prevent forging). Because SML ensures data abstraction by default, even though these values are not dynamically sealed there is no way for another module to cast them to or from integers.

That's nonsense.

This response seems to be due to a misunderstanding of what I was describing. Hopefully the above clarifies what I meant. Hopefully now it's clear that standard practice is that values exported via an opaque/sealed type are not dynamically wrapped—instead, the type/module system ensures that values exported via an opaque/sealed type can only be used according to what the various module exports enable. However, that standard practice is incompatible with rtt.canon.

So we are sacrificing performance and standard guarantees in order to support rtt.canon.

WebAssembly / design