proposal: design of a bytecode interpreter for Go

sbinet commented 8 years ago

Let's do actual work on https://docs.google.com/document/d/1Hvxf6NMPaCUd-1iqm_968SuHN1Vf8dLZQyHjvPyVE0Q/edit

There are currently a few Go interpreters on the market:

llgoi, a JIT-enabled interpreter built on top of LLVM and llgo
ssainterp, an interpreter built on top of golang.org/x/tools/go/ssa
sbinet/go-eval, a package salvaged from pre-Go1
various REPLs that compile on the fly Go statements and execute them,
etc...

The high-level view of this proposal is to describe a naive interactive interpreter for Go, loosely based on the design and implementation of Dis [1], [2], the virtual machine of the Inferno OS. Knowing Go lineage, then, why not?

Dis is a register-based interpreter. It's not alone in the register-based space: one of the most known register-based VM is Lua. See: here and there for inspirations.

Alternatively, we could base the design on a stack-based VM such as CPython's. See here and there for some inspirations. Stack-based VMs are somewhat easier to implement but literature would seem to indicate register-based VMs are more performance friendly.

Working at the bytecode level, with program counters, instructions and frames, should allow us to reuse a fair amount of the cmd/compile infrastructure of the official Go compiler, and to also provide a non-interactive interpreter (i.e.: interpreting whole programs.)

PS: design proposal is here: #2 PPS: additional reddit comments at https://www.reddit.com/r/golang/comments/4zu6bc/gointerpreter_design_proposal_of_a_vm_for_go

other VMs:

elliott5 commented 8 years ago

Thank you Seb, for starting this conversation again, especially in so structured a way. Re-reading the interp-ideas document from 10 months ago, I still think the ideas hold good.

We certainly need a VM to target and, to be self-hosting on all Go platforms, that target VM needs a pure Go implementation.

The simplest route has us inventing a Go-specific VM that exactly implements what is required for Go, and then writing two components a Go->VMbitcode compiler and a VMbitcode interpreter. Aside from the VMbitcode serialization/deserialization part, this is essentially what ssainterp already does. For a list of the Go-specific instructions see the table at: https://godoc.org/golang.org/x/tools/go/ssa

As we would control the format of our own VM, extending it in whatever ways were required to enable a REPL user interface would be relatively straightforward.

But there are two strong attractions to using an existing VM:

Reusing the existing really fast implementations of that VM, maybe even for targets that Go does not currently support; and
Reusing existing software targeting that VM by creating a 'cgo' type calling interface.

The downside of using an existing VM is the sheer scale of the work involved:

We have to model all of the Go-specific features into the VM's feature-set at "compile" time, which may not be a good fit, including making a 'cgo' type calling interface; and
We have to write a Go interpreter for the target VM bitcode, which must include not just the features that Go requires, but also every other feature, in order to allow interworking with existing software modules.

Looking at some of the existing VMs we could choose:

Dis is really interesting, not only for the circularity of Rob Pike having worked on it and it therefore containing some Go-like features, but also the possibility of hosting an entire OS in the VM. However, aside from any technical issues, the LGPL licence raises a red flag in my mind in terms of future commercial use.
lua has the big advantage of an existing interpreter for it written in Go.
cPython has a huge user-base.
llvm already has the llgoi project as part of llgo.
JVM, seen as the antithesis of Go, has a Go interpreter at https://github.com/zxh0/jvm.go (and TARDISgo generated code for it via Java).
...and there are many other VMs too.

But my personal choice for a VM for this project to target would be wasm because I believe this relatively new project has sufficitent support from the major players to create the de-facto standard VM - both in browsers and in servers. So a pure-Go interpreter for it is likely to get some use, as is a Go compiler and REPL built around it. Once the wasm platform is mature, I think it likely that the core compiler will target it too. In the short-term we could compile to it using cgo calls to Binaryen, which is designed to be a compiler back-end, and execute it using V8, which already has a wasm JIT-engine built-in.

sbinet commented 8 years ago

yes wasm and its nascent ecosystem has been on my radar. I'll integrate your feedback in the actual design document I am still writing :) thanks.

sbinet commented 8 years ago

@elliott5 PTAL

elliott5 commented 8 years ago

I've only just started my investigations @sbinet, but the part we are interested in emu "the Inferno emulator" includes a NOTICE that does not have LGPL restrictions but does proclaim copyright for one Russ Cox.

sbinet commented 8 years ago

@4ad: is this tentative proposal vaguely in line with what you had in mind wrt GOARCH=vm ?

themihai commented 8 years ago

Package reflect has some support for this (StructOf, ArrayOf, ...) but it currently has no support for defining new interface types nor any new named types.

It's worth to note that both named types and interface types[0] are on the roadmap. If I would have a vote I would put it on wasm because it's more rewarding. You don't get only a VM but also instant and free support to other environments(i.e. web, IoT)

Unfortunately, there is only a work in progress C/C++ project at the moment (August 2016), so it is probably a bit early to write code to target it.

The specs for MVP (i.e. the format) should be done by the end of this year. Unfortunately other features such GC support will start only after the MVP so I'm wondering if we can workaround these limitations until native support is provided.

[0] https://github.com/golang/go/issues/4146 https://github.com/golang/go/issues/16522

glycerine commented 8 years ago

I think it may be worth pro-actively addressing security concerns. Dynamic code generation at runtime always broadens the security attack surface, and for internet facing programs, this does need to be managed. We need a good story here.

One thought would be to do something like the unsafe package does. To wit, unsafe can be prohibited in contexts where it needs to be.

4ad commented 8 years ago

@4ad: is this tentative proposal vaguely in line with what you had in mind wrt GOARCH=vm ?

I don't think so, my use case and needs are very different from yours.

I want to build an low-latency, real-time, low-throughput, non-parallel interpreter (not JIT) specifically tailored to Go and written in ANSI/POSIX C (perhaps with some assembly, but not Go), small enough that can run on today's (bigger) microcontrollers, and with special support for debugging Go, and for creating dynamic analysis tools. This interpreter will be able to function as a shared library (strange concept, coming from me), and will be able to run old code compiled by older versions of my compiler as well (binaries being forward-compatible is an explicit goal).

The instruction set will be something that is very easy to compile Go to, and something that is also very easy to implement in a performant (and portable) manner without a JIT (threaded code is still under consideration, especially for microcontrollers).

I will then port Go to this virtual machine, rewriting all the runtime, and not only writing a new compiler target, but most likely writing a new middle-end as well.

I keep using the world portable here, that doesn't mean I necessarily want my code restricted to ANSI C. It's much easier to implement performant interpreters in assembly. I just mean that porting is literally easy. It's very easy to port a threaded code interpreter to a new architectures, even if it's written in assembly, because it is very straightforward and contains very few lines of code. The inner loop is literally just a few machine instructions. That being said, I still haven't excluded the option of a pure C interpreter.

I have three main use cases for this.

Even though I keep porting Go to new systems and architectures, the gc implementation still doesn't run on many systems, and it will never run on microcontrollers. As portable the gc implementation is (and as I retargeted it a few times, I know quite a bit about this; it's getting harder and harder to do a port, the gc implementation is becoming less portable), I need a Go implementation that can be ported in maximum one day of work, and preferably less than a couple of hours of work.

I need a fully deterministic implementation of Go. This is one of the reasons I won't do a parallel implementation of Go (there are other reasons for that too).

I need hard real-time execution behavior with real-time GC. Throughput is of no concern of mine, but I plan to use Go to fly planes and rockets. Perhaps surprisingly to some, it's easier to do real-time with an interpreter.

Debugging Go code sucks. My VM will export debugging interfaces that can be targetted by gdb and other debuggers, plus it will implement a debugger itself, with full knowledge about Go internals. Very advanced (not really) features will easily be supported in an interpreter, like reliable watchpoints, always being able to find the origin of allocations, and using copy-on-write memory to find memory corruption bugs.

Sometimes I need to do dynamic analysis over Go programs. This is easy to do in an interpreter, including things like race detection.

The runtime needs to be completely rearchitected for what I want to do. The compiler is deeply entangled with the runtime. It's probably much easier to rewrite the middle-end.

There's also a deeper research goal in all this endeavour. I want to study if it's possible to have a much smaller, and much portable Go implementation, that is still competitively performant, and has an edge over the gc implementation in some niche areas. So even if I can make the gc compiler generate code for my VM, or even if I am able to reuse part of the GC, I still want to write a new compiler and a new GC just to see if I can make a smaller, more maintainable system that is good enough.

I will keep an eye on what's happening in your project, and I hope you'll keep an eye on mine.

kardianos commented 8 years ago

@4ad So similar to running Instruction List or Structured Text on a PLC? Would you do single thread per process and require pre-allocating slices and structures? Would you allow dynamic reloading of modules for online editing?

@sbinet Implementation options aside, what feature sets do you want to target for an interpreter?

I see five classes of use cases for an interpreter:

A stand alone scripting env like you might expect with bash.
A REPL for go.
A bytecode interpreter similar to Java or C# (executes from main())
An extension execution interpreter for existing Go programs, similar to how text/template is executed.
A real time interpreter for control systems

Each on of these could be executed in either:

A trusted environment where the environment (Go packages, operating system) should be exposed.
An untrusted environment where the environment should be mostly isolated.

In each environment the following could all be features of the interpreter:

Breakpoints
Value inspection
Real-time reloading of sections of code
Line by line interpretation without full compile (REPL)
Compile up front into bytecode that ensures no malformed input

sbinet commented 8 years ago

@themihai thanks for the pointers. I had an old (go-1.5) CL for InterfaceOf but didn't know David was working on it.

sbinet commented 8 years ago

@glycerine good point about security. I am not very well versed in such an area, though. However, I suppose @elliott5 is.

sbinet commented 8 years ago

@4ad thanks for the input. sounds very ambitious :) (and way above my limited skill set)

I will definitely keep an eye on your project (feel free to ping me when it's open sourced) -- if only to see how you did it and take inspirations on your runtime and GOARCH-related modifications!

sbinet commented 8 years ago

@kardianos : coming from the scientific crowd, I am interested in a REPL in Go and the ability to extend/script a (Go) program via an interpreter API. using a bytecode interpreter seemed like the most efficient way to achieve this.

for science work, trusting the environment isn't a showstopper but as @glycerine mentioned, we should have the hooks (?) to provide execution in an untrusted environment, preventing execution of code importing "unsafe".

as for the features, I guess the last 2 are the ones I definitely want to provide. the first 2 should come as a by-product of the interpreter (and its API). Real-time reloading of sections of code would be great, but that definitely sounds like something I wouldn't know how to tackle ATM.

elliott5 commented 8 years ago

The generic instructions that our vm will need to work from are defined at https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/gen/genericOps.go

The simplifications that apply to all backend architectures are at https://github.com/golang/go/blob/master/src/cmd/compile/internal/ssa/gen/generic.rules These are used to generate rewritegeneric.go in the directory above.

Thus a generic Go VM design already exists, which is used internally by the core Go compiler to generate code for other architectures ... it is just that no-one has written an interpreter for it yet.

glycerine commented 8 years ago

@elliott5 : That is exactly what I was thinking(!) All the pieces appear to be there in the gc compiler, they just need to be refactored into a pipeline of incrementally updatable elements.

The only trivial change would need to assign function definitions to actual variables, implicitly. This would prevent the compiler from inlining function calls and thus allow the functions to be updated after being defined.

Also the type checker would need to be restartable in the sense that if a definition of a type changes, then it would need to recheck... a hopefully minimal set of dependents. Probably just need to keep a tree (graph) of type dependencies. It seems likely that guru or go/types already does this.

elliott5 commented 8 years ago

Nice article on making the go1.7 SSA form visible at https://pauladamsmith.com/blog/2016/08/go-1.7-ssa.html

Using the first test code from @sbinet

func add(i, j int) int {
    return i+j
}

The last non-architecture-specific version of the generated SSA code (before "pass lower") is

add <nil>
  b1:
    v1 = InitMem <mem>
    v2 = SP <uintptr>
    v6 = Addr <*int> {~r2} v2
    v8 = Arg <int> {i}
    v9 = Arg <int> {j}
    v10 = Add64 <int> v8 v9
    v11 = VarDef <mem> {~r2} v1
    v12 = Store <mem> [8] v6 v10 v11
    Ret v12
name i[int]: [v8]
name j[int]: [v9]

mewmew commented 8 years ago

Hi everyone,

I'm excited to see this discussion taking place.

In response to @sbinet's comment in #2

We should note though there exists a pure-Go project to interact with the LLVM IR: llir/llvm. This project is still a work in progress at this time of writing (August 2016).

The development of llir/llvm is progressing steadily and should be ready for initial use by some applications within the months to come. It is currently used in a compiler project to translate a subset of C to LLVM IR. Within the near future, the intention is to use it within a decompilation pipeline translating LLVM IR to Go.

The compiler project stress tests the write support for LLVM IR, while the decompiler stress tests the read support for LLVM IR.

I'd be very interested in hearing if there are any specific requirements that the interpreter project may have, which are not yet covered by the aforementioned projects.

So, brief status update from the llir/llvm project as of 2016-09-02:

[x] LLVM IR assembly write support: done
[ ] LLVM IR assembly read support: work in progress (tracked by llir/llvm#15)
[x] LLVM IR in-memory representation: partial. A subset of the LLVM IR instructions are supported. Support for additional instructions are added as the need arises.

Cheers /u

lologarithm commented 8 years ago

I participated on the reddit thread about this but I thought having a more 'official' response could be useful -- I wanted to cast a vote for wasm.

It seems like most UIs for Go are just webpages and being able to execute Go in a browser would very useful.

mewmew commented 8 years ago

I wanted to cast a vote for wasm.

Just to add to the discussion related to wasm.

From https://github.com/WebAssembly/design/blob/master/TextFormat.md#official-text-format:

Here are some of these prototypes. Keep in mind that these aren't official, and the final official format may look entirely different:

...

LLVM backend (the CHECK: parts of these tests) emits compatible s-expressions.

...

Thus it may be feasible to still target LLVM IR, which in turn may be translated to wasm.

One benefit of this approach is that it lifts the requirement of a browser environment for execution, and gives support for natively running applications; with potential performance gains.

lologarithm commented 8 years ago

Browser env isn't required to execute wasm: https://github.com/WebAssembly/design/blob/master/NonWeb.md

tiborvass commented 8 years ago

Browser env isn't required to execute wasm: https://github.com/WebAssembly/design/blob/master/NonWeb.md

What @lologarithm said. Lot of people don't know that, I believe even in the Go team. I think it would be extremely valuable to have the Go team/community's input while wasm is not set in stone. If we wait until the design is frozen, we might not be able to benefit from it.

glycerine commented 8 years ago

Two thoughts -

For the effort it takes to do a byte code interpreter, it would be about the same work to target a JIT-able environment. And the resulting code would be much faster.
I just realized the mono runtime is open source and offers a JIT framework. It is MIT/BSD licensed. https://github.com/mono/mono. Just another option.

sbinet commented 8 years ago

@glycerine wrt mono... how's the community behind it? I suppose that it would make more sense to target the LLVM/CLang JIT framework than the mono one, community wise (at least).

sbinet commented 8 years ago

@tiborvass you should probably raise this issue on either golang-nuts or on gophers.slack #general :)

sbinet commented 8 years ago

wrt wasm or LLVM-IR... it's hard for me to actually find "scientific" arguments for one or the other. wasm is probably poised to take over the web. but LLVM-IR has already been battlefield tested. and it would seem that one can do at least a correct llir->wasm translation.

(also: gopherJS will probably provide a wasm backend at some point in the (not too distant?) future...)

dmitshur commented 8 years ago

GopherJS itself is unlikely to provide a wasm backend, there's almost nothing in common between the current approach and wasm. It'd be an entirely new compiler. The main thing I can see being shared are the lessons learned and the API/design of the js package to access the JavaScript world.

(Source: see https://github.com/gopherjs/gopherjs/issues/432#issuecomment-200323535.)

sbinet commented 8 years ago

thanks for the input. I'd thought that your ( @shurcooL ) work towards implementing a GOARCH=js backend for the Go gc compiler (by first refactoring the gopherjs command into a toolchain-based command) would be the stepping stone for GOARCH=wasm.

dmitshur commented 8 years ago

If you're talking about https://github.com/gopherjs/gopherjs/issues/388, the goal of that work is to use cmd/go as the build tool, adding -compiler=gopherjs support to it (which would compile with GOARCH=js). There are no changes to the gc compiler planned as part of that work.

To support compiling to wasm, you would need an actual compiler that does that. :) That's quite different.

sbinet commented 8 years ago

yes, I was talking about that.

glycerine commented 8 years ago

@sbinet

wrt mono... how's the community behind it? I suppose that it would make more sense to target the LLVM/CLang JIT framework than the mono one, community wise (at least).

It first I thought llvm didn't have windows support, but then I checked and apparently llvm now supports building on windows as well as osx and linux. And with the llgo project, much work has already been done there. So I agree, llvm is probably a stronger choice than mono.

A concrete step would be to implement the plugin interface that Ian defined, one that would let go programs load shared objects -- and then after loading, all share the same go runtime (scheduler and garbage collector). That's a pretty good sized chunk of work by itself, but would open up new vistas in terms of being able to run jit-ed and aot-compiled code on the same footing.

I heard an unconfirmed rumor that someone on the Go team may be working on plugin loading at present. Has anyone heard more?

sbinet commented 8 years ago

@crawshaw is working on it: https://go-review.googlesource.com/c/27823/ https://golang.org/s/execmodes

(most reasonably, only linux support for go-1.8)

crawshaw commented 8 years ago

The golang-codereviews mailing list is a great source of rumors.

The plugin work is a 20% time hobby for me. I'd like to do Darwin too, but there me groundwork needed there and I'm not sure I'll find the time.

mewmew commented 8 years ago

Status update from the llir/llvm project as of 2016-11-26.

An extensive cleanup/rewrite of the ir package has now been completed. The API has been redesigned to make it more pleasant to work with. The basic idea behind the new API is that entire modules are constructed before running semantic analysis, rather than checking for errors during construction of individual instructions and values. This reduces the need for sprinkled if err != nil { ... } and facilitates the workflow, as instructions producing values may now be used directly as operands to create new instructions.

But enough talk, lets look at an example (code on play.golang.org).

package main

import (
    "fmt"

    "github.com/llir/llvm/ir"
    "github.com/llir/llvm/ir/constant"
    "github.com/llir/llvm/ir/types"
)

func main() {
    // This example produces LLVM IR code equivalent to the following C code,
    // which implements a pseudo-random number generator.
    //
    //    int abs(int x);
    //
    //    int seed = 0;
    //
    //    // ref: https://en.wikipedia.org/wiki/Linear_congruential_generator
    //    //    a = 0x15A4E35
    //    //    c = 1
    //    int rand(void) {
    //       seed = seed*0x15A4E35 + 1;
    //       return abs(seed);
    //    }

    // Create convenience types and constants.
    i32 := types.I32
    zero := constant.NewInt(0, i32)
    a := constant.NewInt(0x15A4E35, i32) // multiplier of the PRNG.
    c := constant.NewInt(1, i32)         // increment of the PRNG.

    // Create a new LLVM IR module.
    m := ir.NewModule()

    // Create an external function declaration and append it to the module.
    //
    //    int abs(int x);
    abs := m.NewFunction("abs", i32, ir.NewParam("x", i32))

    // Create a global variable definition and append it to the module.
    //
    //    int seed = 0;
    seed := m.NewGlobalDef("seed", zero)

    // Create a function definition and append it to the module.
    //
    //    int rand(void) { ... }
    rand := m.NewFunction("rand", i32)

    // Create an unnamed entry basic block and append it to the `rand` function.
    entry := rand.NewBlock("")

    // Create instructions and append them to the entry basic block.
    tmp1 := entry.NewLoad(seed)
    tmp2 := entry.NewMul(tmp1, a)
    tmp3 := entry.NewAdd(tmp2, c)
    entry.NewStore(tmp3, seed)
    tmp4 := entry.NewCall(abs, tmp3)
    entry.NewRet(tmp4)

    // NOTE: The sem checker is yet to be implemented.
    //
    // Perform static semantic analysis on the LLVM IR module.
    //if err := sem.Check(m); err != nil {
    //  log.Fatal(err)
    //}

    // Print the LLVM IR assembly of the module.
    fmt.Println(m)

    // Output:
    // @seed = global i32 0
    // declare i32 @abs(i32 %x)
    // define i32 @rand() {
    // ; <label>:0
    //  %1 = load i32, i32* @seed
    //  %2 = mul i32 %1, 22695477
    //  %3 = add i32 %2, 1
    //  store i32 %3, i32* @seed
    //  %4 = call i32 @abs(i32 %3)
    //  ret i32 %4
    // }
}

The API of the ir package may be considered feature complete. The few missing instructions may be added as the need arise.

I'd be very interested in any feedback on the API so we can make improvements before declaring a stable version.

@sbinet Are there any specific requirements of the go-interpreter project which are not yet covered by llir/llvm? For reference, the uC compiler has been updated to use the latest version of the ir package and is capable of generating valid LLVM IR for a defined subset of the C language.

@sangisos How do you feel about the new API? Any concerns, ideas or feedback in general? What may still be improved? What feels weird? Does it feel better than the last one?

It would be great to move forward and stabilize the API of the ir package within the next couple of months. For this, I really need some beta-testers!

Anyone up for the challenge?

Cheers /u

go-interpreter / proposal

proposal: design of a bytecode interpreter for Go #1