Whiley / WhileyCompiler

The Whiley Compiler (WyC)
http://whiley.org
Apache License 2.0
218 stars 36 forks source link

Whiley Low-Level Language #803

Closed DavePearce closed 6 years ago

DavePearce commented 7 years ago

These thoughts come as I'm knee deep in redeveloping both the Java and JavaScript backends.

Problem Statement

The basic issue is that there are a lot of repeating problems across the different backends for Whiley. In principle, I don't mind some repetition. But, some of these problems are complex and really deserve to be solved once properly. A rough breakdown of the various components required by a backend is:

  1. Statements and Expressions. In general, the translation of statements and expressions is straightforward and usually platform dependent. There's not much we can do here.

  2. Data Representation. The representation of data is also relatively platform dependent. For example, the range of fixed-width integers supported. Likewise, whether we have true structs (JavaScript, C) or are stuck with classes and objects (Java). Some of this has implications for the treatment of value semantics (see below).

  3. Runtime Coecions. The point at which coercions must actually be inserted is, in principle, a platform dependent issue. However, identifying when there is a change of representation is platform independent. For example, changing between different fixed-with integer types. Or, changing from a readable array type to an actual physical type. Also, deciding how to implement a coercion is a complex issue.

  4. Runtime Type Tests. These are mostly platform independent and are a big source of complexity in backends. The platform determines whether any runtime type information is available and, if so, what the primitive operations are (i.e. tests for atomic types, such as int). In some platforms (e.g. JavaScript), there is always atomic type information available. In others (e.g. Java), there is sometimes atomic type information available (i.e. not for primitives). Finally, yet others (i.e. C) have no information available. A secondary issue is the potential for using type tags to handle finite types. This requires complex reasoning about types. For example, what is the fininte set of tags for (int|null)&!nat? See RFC for more.

  5. Runtime Assertion Checks. The issues here are platform independent because they can generally be expressed using simple imperative constructs. For example, introducing shadow variables is done in a pretty similar fashion across backends (copy into dummy variables). Likewise, the insertion of type invariant tests is pretty generic (i.e. insert assert statements and/or additional helper methods).

  6. Value Semantics. The main issue here is where and when we need to clone values. Some of this decision process is platform dependent. For example, which things actually need to be copied (i.e. which are references, which are immutable, etc). However, much of this is not and is already encoded in MoveAnalysis.

  7. Function/Method Overloading. The main issue here is that no backend will support overloading out-of-the-box. We need some mechanism for type mangling and this is likely to be the same across platforms.

  8. Multiple Returns. The handling of multiple returns is a relatively minor, though commonly occurring, problem. The general is either to use an array (Java, JavaScript) or a struct (C).

  9. Comment Preservation. The general approach to the preservation of comments and other non-semantic information from the original Whiley source file would appear to be largely platform independent.

Options

The obvious option is to provide some form of "intermediate language" on which the different backends can build. Given that I already had such a thing, it would be good to avoid the same mistakes. Some thoughts:

  1. WyIL. We still need to keep this as the main intermediate language. This is necessary because it encodes important package-level information, such as signatures for functions/methods we want to link against. This will remain the distribution mechanism for compiled Whiley packages.
  2. Structured Control Flow. One of the mistakes made originally with WyIL was the loss of structured control-flow. This was simply difficult to implement on some platforms (e.g. JavaScript) and also resulted in rather ugly generated code.
  3. Ephemeral Nature. Realistically this "low-level intermediate language" does not need to persist on disk, though the ability to pretty print it could be important. Potentially, providing some incremental support could also be helpful eventually.
  4. Extensibility. Obviously, it will need some degree of extensibility to provide for the various differences between target platforms.
  5. Interpreted. I suppose having an interpreter would actually be quite helpful for debugging purposes.

It seems realistically, some kind of imperative language with generic structs, taggable unions and explicit coercions makes a lot of sense. It should ideally provide mechanisms for preserving comments. In many ways, it's not that far from C or the JavaFile AST we already have and similar to thoughts I was having there. Potentially, we can generate Java/JavaScript/C directly from it rather than having yet another intermediate form (i.e. like JavaFile).

DavePearce commented 7 years ago

An interesting question is what the complete set of permitted coercions looks like.

DavePearce commented 7 years ago

Current list of outstanding issues:

  1. Type Representations (see #815 ). How should arbitrarily complex types be represented? What about recursive types? intersections?
  2. Expression Types (see #811). Should every expression have a type? What should this type mean? For example, should it be the result type or the operation type? Probably the latter. How to handle nominals? How to handle intersections?
  3. Runtime Type Tests (see #812). How should these be implemented? Should we assume a default underlying type test operator? Perhaps for primitives + mechanism for distinguishing array from records, etc.
  4. Runtime Coercions (see #814). What kinds of coercion are permitted?
  5. Multiple Returns (see #813). How to deal with these? Should we assume targets don't have option for multiple returns? Yes, and use structs in the obvious way.
DavePearce commented 7 years ago

Current problems I'm pondering:

Ideas: