go-interpreter / proposal

Go-interpreter Project Design Documents
57 stars 0 forks source link

discoveries and a gofront proposal #4

Closed glycerine closed 6 years ago

glycerine commented 6 years ago

A couple of new discoveries, and a proposal for a stand-alone Go REPL front-end.

On the news front:

a) David Crawshaw's https://github.com/neugram/ng project revealed an import design point. This was probably obvious already to everyone but me. The Go specification semantics won't work without modification at the REPL. Go assumes that all definitions are entered "at once". Whereas, as the REPL, definitions are entered sequentially, and repeated upon re-definition (while working with the code). Hence some modification of the exact Go rules will be necessary to make a REPL work. Hopefully these will be minor. The := exactly-one-use rule will obviously have to be omitted to allow re-definitions at the REPL. One obvious goal of defining a subset would be to require minimal changes when going from interpreted code to compile-ready code.

b) While normally I avoid grammar generators, some comments on ANTLR4 have made me reconsider and look into it. They now do generate code for Go, and even have a full grammar for most popular languages, including Go, available in their repo. I'm exploring this further, but could be a great springboard towards completeness and correctness of the front end. Relevant links:

https://github.com/antlr/antlr4/blob/master/doc/go-target.md

https://github.com/antlr/antlr4/pull/1321

https://github.com/antlr/grammars-v4/tree/master/golang

c) Chez scheme is now open source under an Apache 2 license. Chez represents a viable backend; it provides an extremely mature and high performance incremental native code generator. It doesn't give the bloated feeling of llvm, and it is portable to windows, OSX, linux. Kent Dybvig (a CS professor with Scheme focus and a leading Scheme proponent) is now working for Cisco and Cisco released Chez open source, so I assume that they bought his company. If you look at scheme benchmarks, Chez is quite impressive.

https://cisco.github.io/ChezScheme/#intro

https://www.infoq.com/news/2016/05/chez-scheme-open-sourced

https://news.ycombinator.com/item?id=13656943 the comment there: "Wow, Chez mostly beats out Stalin." Stalin is a whole-program aggressively optimizing numerically-oriented scheme compiler, almost the gold standard for approaching-C-performance in the scheme world. So this is saying something.

https://ecraven.github.io/r7rs-benchmarks/

d) a proposal for Gofront

I expect many back end technologies will be useful (Sebastien's current work on wagon, Elliott's haxe work, possibly the future work on Chez, and others), but they will all need a front end--the input part of reading the next line from the human (or a source file)--that has reasonable semantics and provides usable human experience. The front end should be interactive, provide the REPL, with at least some command history.

Towards the end of providing an interactive frontend for the go-interpreter project, I propose a "gofront" sub-project. The idea is to be the front end, that parses a (modified for interactive use) subset of the Go language, and produces ast fragments that can the be translated incrementally by various backends. It would be ideal some day, to utilize gc's own codegen capabilities too, but that would require a somewhat stable API into the gc compiler, and I don't see that on the foreseeable horizon. The name being similar to "C-front" should not imply that it has the same purpose as Stroustrup's cfront. gofront is simply aimed at being the front-end for a GO REPL that is useful for doing exploratory data analysis at scale.

My zygomys project (https://github.com/glycerine/zygomys) used a Pratt parser to achieve some 20% coverage of Go syntax (basically just lexing of the same tokens, comments, if-then-else and simple math expressions involving *,/,+,-; there is much left to handle) at the zygo REPL, and could probably be incrementally extended to obtain better coverage. The ANTLR4 approach would start with almost complete coverage and need to be whittled back to obtain usability. A meeting in the middle or a blend of these might prove fruitful.

Feedback welcome.

Jason

glycerine commented 6 years ago

@sbinet @elliott5 Let me know your thoughts!

glycerine commented 6 years ago

To better delineate scope, I see gofront as stopping after type checking. In essence, gofront would do the same job as the loader (https://godoc.org/golang.org/x/tools/go/loader) or the go/types package: guaranteeing that the input is syntactically correct and type checked to the extent possible at compile time.

https://github.com/golang/example/tree/master/gotypes outlines the tasks to be performed:

The Go type checker does three main things. 
First, for every name in the program, it determines
 which declaration the name refers to; this is known as
 identifier resolution. Second, for every expression in 
the program, it determines what type that expression 
has, or reports an error if the expression has no type,
 or has an inappropriate type for its context; this is 
known as type deduction. Third, for every constant 
expression in the program, it determines the value of
 that constant; this is known as constant evaluation.

Superficially, it appears that these three processes
 could be done sequentially, in the order above, but 
perhaps surprisingly, they must be done together.
 For example, the value of a constant may depend
 on the type of an expression due to operators like 
unsafe.Sizeof. Conversely, the type of an expression 
may depend on the value of a constant, since array 
types contain constants. As a result, type deduction
 and constant evaluation must be done together.

As another example, we cannot resolve the identifier 
k in the composite literal T{k: 0} until we know whether
 T is a struct type. If it is, then k must be found among
 T's fields. If not, then k is an ordinary reference to a 
constant or variable in the lexical environment. 
Consequently, identifier resolution and type deduction
 are also inseparable in the general case.

Nonetheless, the three processes of identifier resolution, 
type deduction, and constant evaluation can be separated
 for the purpose of explanation.
elliott5 commented 6 years ago

This is a lovely new idea to wake up to on Christmas morning Jason, and fits well with the wonderful "wagon" present that Sebastien has been building for the world all year! You have both given me hope for the future on Christmas Day morning, what more can anyone ask!

The creation of an intermediate form, feeding multiple back-ends, is a very attractive idea. The form would need to allow the steady addition of new data definitions as the user adds each new statement to the REPL. As ever, tracking the changing content of that data to pass it on to the next iteration will be a key challenge to make portable between back-ends. Could this intermediate form, with data, be serialized to save state?

glycerine commented 6 years ago

Thanks for the encouragement, Elliott. Supporting re-definition of functions and methods would be a prime responsibility of the front end. So gofront would need to know function callers/callees at all times, so as to provide for minimal yet correct re-generation of transitive dependencies. Serialization even with shared pointers is fairy easy these days (e.g. with greenpack), so no reason not to enable that.

sbinet commented 6 years ago

I have only skimmed through (and still out of reliable internet connection). I believe a fair amount of ng can be reused for what you call gofront.

that's why I've worked on ng for the last month or so. there are still a few things missing in ng (defer+recover for e.g.) but things are coming together. my plan (and I apologize for not communicating it there or on the go-interpreter list) is to plug wagon in ng so when the gc toolchain gains GOARCH=wasm we'd get the full experience.

glycerine commented 6 years ago

@sbinet very exciting! I'm surprised and delighted to hear it. I didn't realized ng could be leveraged thus. I'll have to look into it more.

Aside: I hadn't hear about the gc work. Is someone working on the gc toolchain wasm target currently?

sbinet commented 6 years ago

ng is quite versatile and I have used it for my Go-HEP package together with its Jupyter front-end:

https://go-hep.org/news/release-0.8/

Also:

glycerine commented 6 years ago

@elliott5 @sbinet I've made a start. I renamed it from gofront to gi for "go interactive" or "go interpreter"; it is just less to type as well(!) The repo is here:

https://github.com/go-interpreter/gi

gi is based heavily on gopherjs and on the go/types library, and currently uses luajit as the backend.

It is currently able to handle these simple commands at the REPL (see the README.md (https://github.com/go-interpreter/gi/blob/master/README.md) for install instructions).

$ gi
gi> a := 10
10
gi> b := 12
12
gi> func addem(x, y int) int { return x + y }
gi> addem(a, b)
22
gi>

there will currently be lots of debug prints after each command, but focus on the last line which should be, eventually, all the user sees.

There is plenty to do, but if you would like to jump in and help, I think this can be polished off pretty quickly. Building on the shoulders of giants (gopherjs and luajit mostly)!

glycerine commented 6 years ago

I reoriented a little bit around the goal as well, realizing that I could not very well generalize without having at least one, if not two, examples of backends that were being driven by the front end. So the immediate goal is to get an integrated luajit backend up and going, and then have a go at creating an interface in the middle. Thus avoiding premature (and unworkable) abstraction. I'll create issues on the gi repo to track what I'm working on, and what's left todo.

glycerine commented 6 years ago

A quick status update: gi now does basic numerical for-loops, and keeps top-level statements in order. Because of the proper ordering (statements evaluated in the order they where entered at the REPL), closures now capture their variables correctly.

A quick sample session, with debug prints elided.

$ gi
==================
gi: go interactive (https://github.com/go-interpreter/gi)
==================
  [gi is an interactive Golang environment,
   also known as a REPL or Read-Eval-Print-Loop.]
  [type ctrl-d to exit]
  [type :help for help]
  [gi -h for flag help]
  [gi -q to start quietly]
==================
built: '2018-01-04T12:02:52-0600'
last-git-commit-hash: '83d98b5f3aa2f4b9f034938bfc92bc8b519e47c2'
nearest-git-tag: 'v0.0.3'
git-branch: 'master'
go-version: 'go_version_go1.9_darwin/amd64'
luajit-version: 'LuaJIT_2.1.0-beta3_--_Copyright_(C)_2005-2017_Mike_Pall._http://luajit.org/'
==================
gi> a := 5
gi> func looper() { for i := 0; i < a; i++ { println(i) }}
gi> looper()
0
1
2
3
4
gi> a = 2
gi> looper()
0
1
gi> 

Horray!

glycerine commented 6 years ago

Very light tests of method invocation pass in the latest 0.2.0 release of gi.

And we discovered the LuaR system for exchange of objects between Go and Lua. As usual the only snag is the limitation of reflect and not being able to create objects completely from scratch. Still I like the idea of using actual real go channels. We should just invent a map convention that represents typed Go structs, and use them everywhere internally; then the lack of reflect.NewType won't really matter.

The gofront idea actually evolved. Part of it was successful quite quickly; we achieved incremental type checking very quickly by adapting code from the standard library.

Since gi extends beyond the gofront proposal fairly substantially, I think I'll close out this issue and start updating the https://github.com/go-interpreter/gi/blob/master/README.md with news instead, so as not to spam folks.