CMCDragonkai opened 6 years ago
GHC currently has a type checker plugin capability, which allows people to extend GHC with their own type-checking code in order to type check new things:
http://christiaanb.github.io/posts/type-checker-plugin/
This may or may not be useful, since it's plugging into the GHC ecosystem to typecheck Haskell code (plus new code that you write).
Whereas we will have a target language we want to type check that is an external DSL. If we are converting our external DSL into Haskell ADTs, plugins like these would type check the Haskell code that we write for the compiler/interpreter instead of the target language.
An example of such type checking is units of measure: https://github.com/adamgundry/uom-plugin/
The examples show a Haskell source file with special types added in using a quasiquoter. The quasiquoter seems to still be necessary to make sure the syntax is parsed correctly, since this is a type checker plugin, not a syntax plugin.
There's a plugin system here: https://github.com/stepcut/plugins described by: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.9.7627&rep=rep1&type=pdf
It appears to show how you can load Haskell code at runtime.
The introduction to the paper shows an interesting idea: what if we wrote our type system rules as an extension/plugin to GHC, and simply translated our Architect expressions directly into standard Haskell code?
The only problem would be error reporting: how do we make sure errors make sense to an end user who is interacting at the Architect level, not at the Haskell level?
I'm working off the hnix compiler: https://github.com/haskell-nix/hnix
Nix is a DSL for package building, but it is also pure, lazy, and functional, so I figured it's a good place to start.
After studying it for a couple of weeks, here are my comments.
hnix uses `NExprF` and `NExpr`, which make use of type-level fix data structures: `NExprF` is the base functor, and `NExpr` is the `Fix` applied to it. And when unpacking things, you always need to write out those constructs.
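The pattern looks roughly like this (a minimal sketch with a hypothetical two-constructor functor standing in for the much larger `NExprF`):

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- The base functor: recursive positions are the type parameter r.
data ExprF r
  = ConstF Int
  | AddF r r
  deriving (Functor)

-- Tie the knot with a fixpoint.
newtype Fix f = Fix { unFix :: f (Fix f) }

type Expr = Fix ExprF

-- Constructing a term means wrapping every layer in Fix by hand.
example :: Expr
example = Fix (AddF (Fix (ConstF 1)) (Fix (ConstF 2)))

-- A fold (catamorphism) does the unpacking for us.
cata :: Functor f => (f a -> a) -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix

eval :: Expr -> Int
eval = cata $ \e -> case e of
  ConstF n -> n
  AddF a b -> a + b
```

The upside is that folds and annotations come for free; the downside is exactly the `Fix` wrapping noise shown in `example`.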
`show` no longer works on `NExpr` without some extra work, so you end up needing to derive slightly different classes. But this is also complicated, so we end up using things like the `deriving-compat` package and Template Haskell to perform derivation macros like `$(deriveShow1 ''...)`. Google "deriving typeclass show for Fix types haskell". While I found the correct libraries to achieve this, I'm still not sure why this functionality is in `deriving-compat`.
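Concretely, the problem and the shape of the fix look like this (a sketch that hand-writes the `Show1` instance which `$(deriveShow1 ''ExprF)` from deriving-compat would generate, using a tiny hypothetical functor in place of `NExprF`):

```haskell
import Data.Functor.Classes (Show1 (..), showsBinaryWith, showsPrec1, showsUnaryWith)

data ExprF r = ConstF Int | AddF r r

newtype Fix f = Fix (f (Fix f))

-- A plain `deriving Show` on Fix doesn't work out of the box: GHC would
-- need a Show instance for `f (Fix f)` for an unknown f. The standard
-- workaround is a Show1 instance on the base functor; deriveShow1
-- generates roughly this boilerplate:
instance Show1 ExprF where
  liftShowsPrec _  _ d (ConstF n) = showsUnaryWith showsPrec "ConstF" d n
  liftShowsPrec sp _ d (AddF a b) = showsBinaryWith sp sp "AddF" d a b

-- With Show1 on the functor, Show on the fixpoint follows once and for all.
instance Show1 f => Show (Fix f) where
  showsPrec d (Fix f) = showsPrec1 d f
```

So `show (Fix (AddF (Fix (ConstF 1)) (Fix (ConstF 2))))` renders as `"AddF (ConstF 1) (ConstF 2)"`, without the `Fix` wrappers.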
hnix also wraps `NExprF` in order to create an "annotated" AST, which carries source locations in the AST. To do this, they make use of the `Compose` functor. This adds even more syntactic overhead to a lot of the functionality, and in order to reduce it they use bidirectional patterns (which do not have much documentation at the moment). I don't like using these things when there's no documentation, so in my adaptation I've removed all of this magic.
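The annotation trick can be sketched like this (hypothetical `SrcSpan`, `Ann`, and expression functor; hnix's actual types are richer):

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.Functor.Compose (Compose (..))

newtype Fix f = Fix { unFix :: f (Fix f) }

data ExprF r = ConstF Int | AddF r r
  deriving (Functor)

-- A hypothetical source-location annotation.
data SrcSpan = SrcSpan { spanLine :: Int, spanCol :: Int }

-- A pairing functor that attaches an annotation to a node.
data Ann ann a = Ann ann a
  deriving (Functor)

-- Composing the pairing functor with the base functor *before* taking
-- the fixpoint annotates every node of the tree.
type ExprLoc = Fix (Compose (Ann SrcSpan) ExprF)

-- Stripping annotations is a straightforward recursion over the
-- composed structure; note the extra Compose/Ann layers to unpack.
stripAnn :: ExprLoc -> Fix ExprF
stripAnn (Fix (Compose (Ann _ e))) = Fix (fmap stripAnn e)
```

The `Fix (Compose (Ann _ e))` pattern in `stripAnn` is exactly the syntactic overhead that hnix's bidirectional patterns exist to hide.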
Megaparsec's `makeExprParser` takes just a static list of operators to create an expression parser. From discussing this on the Megaparsec issue list, it may become possible in the future to turn the operator table into state that can be mutated as the parser is parsing, so the parser can be updated when it encounters new operators. However, this is also complicated by the fact that the operator table only contains operators that are binary, unary, prefix... etc., whereas in Haskell operators behave just like normal functions: their infix powers only kick in when they are used without parentheses. Operators can be unary, binary, ternary... etc.; you can easily define operators with multiple arguments, since they just return a function after you apply two arguments to them. Since we are just working on a DSL for orchestration, this is not the most important feature, so I've left it out. Instead I have only one operator, whitespace, which is used for function application.

`makeExprParser` is mainly used to put together the operators and the terms. If we only have function application, then all of our terms are either functions of some sort or some primitive construct.
`makeExprParser` takes a term parser, where the term will refer back to the top-level parser in many cases.

Weird syntax I discovered by investigating the hnix source code (Nix syntax that I didn't know was legal):
```nix
{ a = 3; }.b or 4
# 4

({ a = { a = 3; }; }).a .a
# 3

{ a = { a = 3;}; }.a or { a = 4; } .a
# { a = 3; }
```
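Going back to `makeExprParser` for a moment: the whitespace-as-the-only-operator idea can be sketched like this (a toy grammar with hypothetical `Expr`, `Var`, `Lit`, `App` names; assumes the megaparsec and parser-combinators packages):

```haskell
import Control.Applicative (empty, some, (<|>))
import Control.Monad.Combinators.Expr (Operator (InfixL), makeExprParser)
import Data.Void (Void)
import Text.Megaparsec (Parsec, between, parseMaybe)
import Text.Megaparsec.Char (char, digitChar, lowerChar, space1)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

data Expr = Var String | Lit Int | App Expr Expr
  deriving (Eq, Show)

-- Whitespace consumer (no comments in this toy grammar).
sc :: Parser ()
sc = L.space space1 empty empty

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

-- Terms refer back to the top-level parser via the parenthesised case.
term :: Parser Expr
term = between (lexeme (char '(')) (lexeme (char ')')) expr
   <|> Lit . read <$> lexeme (some digitChar)
   <|> Var <$> lexeme (some lowerChar)

-- The only operator is juxtaposition: left-associative application whose
-- "operator token" consumes nothing, because whitespace is already eaten
-- by the lexemes.
expr :: Parser Expr
expr = makeExprParser term [[InfixL (pure App)]]
```

With this, `parseMaybe expr "f x 42"` parses as `App (App (Var "f") (Var "x")) (Lit 42)`, i.e. application associates to the left like in Haskell and Nix.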
The book Practical Foundations for Programming Languages first starts off with simple languages like T or M. Eventually it reaches chapter 19, where it introduces the language PCF.
PCF stands for "Programming Computable Functions"
https://en.wikipedia.org/wiki/Programming_Computable_Functions
All the other advanced features (exceptions, continuations, etc.) are then built as "extensions" to PCF.
Here's a worked-through example that takes PCF reified as a Haskell data type and builds a compiler targeting C: https://jozefg.bitbucket.io/posts/2015-03-24-pcf.html
It uses a cool little library called c-dsl that builds on top of language-c http://hackage.haskell.org/package/c-dsl and http://hackage.haskell.org/package/language-c for generation of the C code.
The Architect language isn't generating C any time soon, but it's interesting that even the Nix language ultimately shares its origin with these extensions on top of the lambda calculus.
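For a flavour of what PCF reified as a Haskell data type looks like, here is a minimal sketch (my own simplified rendition with named variables and naive substitution, not the linked post's representation):

```haskell
-- PCF: naturals, successor, case analysis on naturals, functions,
-- application, and general recursion via fix.
data Ty = TNat | TArr Ty Ty
  deriving (Eq, Show)

data Term
  = Var String
  | Zero
  | Suc Term
  | IfZ Term Term (String, Term)  -- ifz e { z -> e0 | s x -> e1 }
  | Lam String Ty Term
  | App Term Term
  | FixT String Ty Term           -- fix x : t is e
  deriving Show

-- Naive substitution (fine for closed top-level programs).
subst :: String -> Term -> Term -> Term
subst x s t = case t of
  Var y | y == x    -> s
        | otherwise -> t
  Zero   -> Zero
  Suc e  -> Suc (subst x s e)
  IfZ e e0 (y, e1) ->
    IfZ (subst x s e) (subst x s e0)
        (y, if y == x then e1 else subst x s e1)
  Lam y ty e | y == x    -> t
             | otherwise -> Lam y ty (subst x s e)
  App f a -> App (subst x s f) (subst x s a)
  FixT y ty e | y == x    -> t
              | otherwise -> FixT y ty (subst x s e)

-- Call-by-name evaluation.
eval :: Term -> Term
eval t = case t of
  App f a -> case eval f of
    Lam x _ b -> eval (subst x a b)
    f'        -> App f' a
  IfZ e e0 (x, e1) -> case eval e of
    Zero  -> eval e0
    Suc n -> eval (subst x n e1)
    _     -> t
  FixT x ty e -> eval (subst x (FixT x ty e) e)  -- unroll the fixpoint
  Suc e -> Suc (eval e)
  _     -> t

-- Addition defined with fix, in the style of PFPL.
plus :: Term
plus = FixT "p" (TArr TNat (TArr TNat TNat)) $
  Lam "m" TNat $ Lam "n" TNat $
    IfZ (Var "m") (Var "n")
        ("m'", Suc (App (App (Var "p") (Var "m'")) (Var "n")))

toInt :: Term -> Maybe Int
toInt Zero    = Just 0
toInt (Suc e) = (+ 1) <$> toInt e
toInt _       = Nothing
```

Everything beyond naturals and functions (exceptions, continuations, etc.) really is an extension: new constructors on `Term`, new cases in `eval`.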
I'm currently only working on the frontend of the language (parser and AST). Since this is currently based on Nix, the language is untyped, but I'm going to try to work types into it, while also removing some Nix features that aren't relevant.
Some potential resources for exploring implementing type checkers (with polymorphism):
I want to explore PCF since it's the most basic implementation of a typed lambda calculus that has some extra features. And Nix doesn't have types, so while it's a good way of deriving our requirements for laziness and parsing and functional semantics, there's no type theory involved! So PCF is a good starting point.
Another megaparsec article:
The recursive nature of attribute sets or let-rec declarations, combined with the defaulting syntax, allows you to do some interesting things here:
```nix
let f = { a ? 1, b ? a }: a + b; in f {}
# 2
```
Since we're not writing a full language, we would like to piggyback on Haskell's compilation infrastructure as much as possible.
The good thing is that GHC is really modular, so we can just reuse much of it.
For example there's the GHC as a library: https://hackage.haskell.org/package/ghc
And there's the hint library, which wraps the GHC API in an easier-to-consume interface: https://hackage.haskell.org/package/hint
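For instance, evaluating generated Haskell at runtime with hint is only a few lines; this is roughly what a "translate Architect to Haskell, then run it" pipeline would lean on (a sketch; assumes the hint package is installed):

```haskell
import Language.Haskell.Interpreter (InterpreterError, as, interpret,
                                     runInterpreter, setImports)

-- Interpret a string of Haskell source at runtime. In our setting the
-- string would be the Haskell code emitted for an Architect expression.
evalDoubled :: IO (Either InterpreterError [Int])
evalDoubled = runInterpreter $ do
  setImports ["Prelude"]
  interpret "map (*2) [1,2,3]" (as :: [Int])
```

The `Either InterpreterError` result is where the error-reporting problem shows up: hint hands back GHC's errors, which we would have to translate back into Architect-level terms.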
More details on this here: http://downloads.haskell.org/~ghc/latest/docs/html/users_guide/extending_ghc.html
Discussed with @olligobber