anachronauts / jeff65

a compiler targeting the Commodore 64 with gold-syntax
GNU General Public License v3.0
6 stars 0 forks source link

IL Checker #47

Open jdpage opened 6 years ago

jdpage commented 6 years ago

In A Nanopass Framework for Compiler Education (subsequently NFCE), sections 2.1 and 2.2 discuss a system for defining intermediate languages. These provide a way of checking whether a pass behaved as expected, i.e. all of the relevant nodes were transformed.

They also provide a way of establishing the appropriate order of transforms. Currently this is hard-coded based on knowledge of what each transform does, but this doesn't scale. In particular, it isn't obvious where new transforms can/should be inserted. Finally, they provide a way of establishing a stable interface between passes.

NFCE defines five types of compiler pass:

  1. Simplification "reduces the complexity of subsequent passes by translating its input into a simpler intermediate language," i.e. desugaring,
  2. Verification "checks compiler invariants that are not easily expressed within the grammar" for debugging purposes,
  3. Conversion "makes explicit an abstraction that is not directly supported by the low-level target language," i.e. lowering,
  4. Analysis "collects information from the input program and records that information with annotations in the output program," and
  5. Improvement "attempts to decrease the run time or resource utilization of the program," i.e. optimization.

Simplification and conversion passes each transform from one IL to a new IL, analysis and improvement transform from one IL to the same IL, and verification traverses but does not transform.

During debugging, verification passes should be turned on, and conformance against the IL definition should be checked after each pass. During normal operation, some of all of them may be turned off, or selectively enabled using compiler flags.

jdpage commented 6 years ago

As far as I can tell, NFCE just has the passes run in a predefined order, but we can probably make things more complicated than that.

For starters, the fact that simplification and conversion passes change the IR implies a kind of sequential grouping. "Phase" is kind of an overloaded term, so let's call the set of all operations to be done on a particular IR an "era". Each era ends with a simplification pass or conversion pass (or outputting to disk).

Now, if we consider a DAG where each era is a node, and each conversion/simplification pass is an edge, we can, given a target and a starting point, automatically determine an appropriate sequence of conversion/simplification passes. The compiler should (at least in debug mode) verify that only one such path is possible. We then filter out any passes which are unused in this path, including verification, analysis, and improvement passes associated with off-path eras.

Now we construct a new DAG, this time including all passes. We assume that any additional inter-pass dependencies have been annotated appropriately. The resulting DAG can then be topologically sorted, and the resulting sequence used to execute the compilation.

This can be extended, incidentally, to handle automatic recursive compilation, and provides us with a convenient method of handling multiple targets too. Such usage will require that, rather than simply constructing a sequence to be executed, we also consider that passes will be run multiple times on different inputs.