go-interpreter / proposal

Go-interpreter Project Design Documents
57 stars 0 forks source link

consider defining Golang in the K framework for auto-generation of interpreter and model checker #3

Closed glycerine closed 7 years ago

glycerine commented 7 years ago

A proposal to accelerate the development of a Golang interpreter and model checker: use the K framework from UIUC/UAIC Romania.

best introduction: http://www.kframework.org/images/e/eb/K-Overview.pdf

source code, latest release is 4.0.0 as of 2016 July 27. It has seen over 15K commits and 285 releases; it has been under development since 2003 and appears to be actively used for teaching programming language concepts currently. https://github.com/kframework/k

homepage, but not many updates since 2013 -- but contains references to tutorials and 87 academic publications written about K, the theory of its operations, and its applications. http://www.kframework.org/index.php/Main_Page

K is framework for defining programming languages using rewrite rules and context semantics. Once defined, the K framework generates a parser, an interpreter, and a model checker for formal checking of programs written in your language. It has been used to define semantics for 99% of C, for Java1.4, and for Verilog.

What I like about this approach: If we defined the semantics of Go using the K framework, then we would get a correct-by-specification (and easy to change rules if need be) interpreter. Plus we would get model checking tools for free. And any future formal methods tools that Gregor Rosu at University of Illinois Urbana-Champaign adds to the K framework would immediately benefit all languages defined using K semantics. K is written in Java, and the framework's tools integrate into one of the Java IDEs for help in defining new language semantics. However one does not write Java to define new languages. So writing a language semantics for Go would not involve any Java programming.

Very little implementation would would need to be done this way. The C language spec written in K apparently uses 1200 rules, which is not a small number, but is not excessively large. Many of these rules could probably be re-used as a starting point at least for defining Go's semantics.

Open questions incluce: if/how it would integrate with compiled Go code. And questions around concurrency support and integration with the Go runtime/GC.

glycerine commented 7 years ago

C is notoriously under defined, but the semantics of 99% of it, as described in the K framework, are here: https://github.com/kframework/c-semantics

glycerine commented 7 years ago

This quote gives the author's description of the K framework, doing it much more justice than my summary above could. It is from http://fsl.cs.illinois.edu/index.php/CS322_-_Programming_Language_Design:_Lecture_Notes / December 2007. The concurrency definition features may make it particularly attractive.

Abstract: "K is a definitional framework based on term rewriting, in which programming languages, calculi, as well as type systems or formal analysis tools can be defined making use of special list and/or set structures, called cells, which can be potentially nested. In addition to cells, K definitions contain equations capturing structural equivalences that do not count as computational steps, and rewrite rules capturing computational steps or irreversible transitions. Rewrite rules in K are unconditional, i.e., they need no computational premises (they are rule schemata and may have ordinary side conditions, though), and they are context-insensitive, so in K rewrite rules apply concurrently as soon as they match, without any contextual delay or restrictions.

"The distinctive feature of K compared to other term rewriting approaches in general and to rewriting logic in particular, is that K allows rewrite rules to apply concurrently even in cases when they overlap, provided that they do not change the overlapped portion of the term. This allows for truly concurrent semantics to programming languages and calculi. For example, two threads that read the same location of memory can do that concurrently, even though the corresponding rules overlap on the store location being read. The distinctive feature of K compared to other frameworks for true concurrency, like chemical abstract machines (Chams) or membrane systems (P-systems), is that equations and rewrite rules can match across multiple cells and thus perform changes many places at the same time, in one step.

"K provides special support for list cells that carry computational meaning, called computations. Computations are special \curvearrowright-separated lists T_1 \curvearrowright T_2 \curvearrowright \dots \curvearrowright T_n comprising computational tasks, thought of as having to be processed sequentially. Computation (structural) equations or heating/cooling equations, which technically are ordinary equations but which practically tend to have a special computational intuition (reason for which we use the equality symbol \rightleftharpoons instead of = for them), allow to regard computations (finitely) many different, but completely equivalent ways. For example, in a language with an addition operation whose arguments can be evaluated in non-deterministic order, a computation a_1 + a_2 may also be regarded as a_1 \curvearrowright \square + a_2, with the intuition schedule a_1 for processing and freeze a2 in freezer \square + , but also as a_2 \curvearrowright a_1 + \square. Computations can be handled like any other terms in a rewriting environment, that is, they can be matched, moved from one place to another in the original term, modified, or even deleted. A term may contain an arbitrary number of computations, so K can be naturally used to define concurrent languages or calculi. Structural equations can rearrange computations, so that rewrite rules can match and apply. Equations and rules can apply anywhere, in particular in the middle of computations, not only at their top. However, rules corresponding to inherently sequential operations (such as reads/writes of variables in the same thread) must be designed with care, to ensure that they are applied only at the top of computations.

"K achieves, in one uniform framework, the benefits of both Chams and reduction semantics with evaluation contexts (context reduction), at the same time avoiding what may be called the rigidity to chemistry of the former and the rigidity to syntax of the latter. Any Cham and any context reduction definition can be captured in K with minimal (in our view zero) representational distance. K can support concurrent language definitions with either an interleaving or a true concurrency semantics. K definitions can be efficiently executed on existing rewrite engines, thus providing interpreters for free directly from formal language definitions. Additionally, general-purpose formal analysis techniques and tools developed for rewrite logic, such as state space exploration for safety violations or model-checking, give us corresponding techniques and tools for the defined languages, at no additional development cost."

glycerine commented 7 years ago

a shorter blurb from the same page as above....

Abstract. K is an algebraic framework for defining programming languages. It consists of a technique and of a specialized and highly optimized notation. The K-technique, which can be best explained in terms of rewriting modulo equations or in terms of rewriting logic, is based on a first-order representation of continuations with intensive use of matching modulo associativity, commutativity and identity. The K-notation consists of a series of high-level conventions that make the programming language definitions intuitive, easy to understand, to read and to teach, compact, modular and scalable. One important notational convention is based on context transformers, allowing one to automatically synthesize concrete rewrite rules from more abstract definitions once the concrete structure of the state is provided, by “completing” the contexts in which the rules should apply. The K framework is introduced by defining FUN, a concurrent higher-order programming language with parametric exceptions. A rewrite logic definition of a programming language can be executed on rewrite engines, thus providing an interpreter for the language for free, but also gives an initial model semantics, amenable to formal analysis such as model checking and inductive theorem proving. Rewrite logic definitions in K can lead to automatic, correct-by-construction generation of interpreters, compilers and analysis tools. (emphasis mine)

glycerine commented 7 years ago

I tried out K. It is painfully slow. Far too slow to generate a useful interpreter. Deep in the docs they admit as much, too. Closing this idea out.

elliott5 commented 7 years ago

Thank you @glycerine for researching the idea so thoroughly, doing some experiments, and then being honest enough to share your conclusions. But mostly for expanding our minds in the process.