Roadmap - Githubissues

TheMatten commented 3 years ago

Let's put together some roadmap!

As @ssbothwell mentioned, we could start with core IR - it doesn't need any inference and should be much simpler language, while supporting all the relevant type features. I'd like to parse it, so that we can construct and test snippets easily (we could even go as far as to write quasiquoter that lets us interpolate core snippets with custom values, but that isn't really relevant right now :smile:).

What about something like:

Core

[ ] AST
[ ] typecheck
[ ] parse
[ ] testing codegen (e.g. HS to send to GHC, just so we can run something)
[ ] ???

Hask

[x] AST
[ ] parse
[ ] typecheck (we should probably make issue to discuss it's specific roadmap)
[ ] desugar to Core

after that, we would need to decide on rest of the pipeline - do we want STG and LLVM, or we want to put some intermediate stage in between, or we want to simply interpret things for now and have STG and bytecode ran by custom runtime? There's GRIN, which is literally backend for (lazy) FP languages, so maybe we could just use that one after Core, it just doesn't seem to have very extensive documentation right now, so it may need some tinkering.

Some random thoughts:

instead of Alex/Happy used by HS/PS, I would like to use parser combinator library - megaparsec has support for pretty errors, operators and indentation, which could save us a lot of time (I think Idris 2 uses it?)
what about extensions and following spec? I think BlockArguments should be easy with parser combinators, and ExplicitForall literally makes us guess less? I guess we may want to take a look at "Simplify subsumption" proposal - I'm not sure if it's relevant to Rank1 in any way, but maybe it could guide us in some (future?) decisions

solomon-b commented 3 years ago

This is great, thanks for getting the ball rolling! Targeting GRIN initially is a great idea. It would let us focus on Core and Hask and then return to backends at a later date.

I used Megaparsec for my interpreted language HowardLang and yes it gives great errors. I've never used Alex/Happy, and I do like the idea being able to define a valid BNF grammar and just get a parser for free, but I'm down to use Parser Combinators. As a teaching tool it would be nice to use parser combinators and produce an annotated example parser for a complex language like Haskell.

I would be fine with going off spec a bit so long as its behind Language Pragmas so that we are maximally compatible with existing code.

It should be pretty trivial to write an interpreter for Core as well for testing purposes.

solomon-b commented 3 years ago

Do we want to talk about program design yet at this point? I find Recursion Schemes to be a really fantastic tool for doing complex manipulations of trees.

@TheMatten I know you did a lot of work on Polysemy, since we aren't worried about performance are there other high level abstractions you think would be a natural fit for a compiler?

TheMatten commented 3 years ago

Sure :+1:

Yeah, they're nice - other options to consider would be hand-rolled polymoprhic traversals (used by PureScript) and Data/syb (used by GHC). I guess I would be fine with either of those - as long as the result is both readable and nice to work with.

When it comes to effects, I'm sorta inclined to not go fancy and create concrete stacks for different phases - they should usually have pretty well defined scope and polymorphic return types are super inconvenient when you have bunch of local definitons without signatures that use random effects - and from my personal experience they appear a lot in this sort of code. Of course, that doesn't mean hard-coding effects into concrete monads, just wrapping concrete stacks in some newtype and using those instead of (..) => m .. style.

As far as polysemy goes, of course I would like to use it, but at the same time I'm inclined to keeping deps and familiarity overhead low using mtl if we want to make this easy to understand :slightly_smiling_face:

I was thinking a little bit about other infrastructure, and I think we could:

split work into multiple in-repo packages for source, typechecker, core and codegen, so that we can plug any of those into other stuff or investigate multiple approaches easily
cleanly separate stuff like unique identifiers generation, error construction and printing or debugging utilities (personally I would like to provide mechanism for dumping useful information from errors and asserts)

TheMatten commented 3 years ago

BTW, maybe optics could potentially work well for substitution/inspection-like things?

TOTBWF commented 3 years ago

Hey all! I've done some thinking about this in the past, and I think it might make sense to break backward compat in a few ways that dramatically simplify the compiler implementation/make the UX a lot nicer:

Enable ExplicitForAll by default, and require them
Enable ScopedTypeVariables by default.

On the implementation side, I think we should avoid Trees That Grow at all costs. It makes things so much more annoying to read/deal with.

solomon-b commented 3 years ago

I'm open to breaking spec, we just have to be okay with losing compatibility with all of Hackage right away instead of losing most of Hackage with the possibility of regaining it via catching up with GHC (lol).

TheMatten commented 3 years ago

Sounds good! We could have Haskell98 though to enforce compatibility when wanted.

When it comes to ScopedTypeVariables, I sort of got a feeling from following GHC dev that they can be tricky to get right. Is that feeling right, or should I not be worried in absence of more advanced extensions?

solomon-b commented 3 years ago

Also if we are totally cool with going our own direction we can ask ourselves the question "is there anything we can do now--without all the baggage of GHC--to enable use to more easily add real dependent types later?"

TOTBWF commented 3 years ago

IIUC the subtlety starts to sneak in when you have to think about how let-generalization behaves. With that in mind, MonoLocalBinds might also be good to have on by default.

solomon-b commented 3 years ago

What if we made Core dependently typed?

edit: oh wait i totally forgot about laziness.

TheMatten commented 3 years ago

I guess laziness would actually be useful here - forcing type computations just to get their head would be wildly inefficient. Instead, I guess the bigger problem would be the fact that you either need to be able to plug existing interpreter into typechecker, or you have to implement a second one just to evaluate types.

TheMatten commented 3 years ago

@TOTBWF

On the implementation side, I think we should avoid Trees That Grow at all costs. It makes things so much more annoying to read/deal with.

Good point - if we want to have modular AST across phases, we need to find some simple solution (I find TTG to be ugly too) Other than having distinct ASTs or random pieces of ADTs put together, one option is employing pattern synonyms over polymorphic type, other is to unsafeCoerce dummy constructors in Sandy's style :smile:

TheMatten commented 3 years ago

(I'll create issue for deciding on initial set of language extensions to not clutter this one.)

TheMatten commented 3 years ago

Another thing that's bugging my mind when it comes to ASTs - what about annotations like source spans? They're pretty cumbersome to work with in GHC, but at the same time it's hard to think of actually superior solution.

Boarders commented 3 years ago

I would organise things roughly as follows to get the ball rolling:

Hask Frontend:

Lex and Parse: for this I would use Alex and Happy. I have done this before with small toy languages e.g. see here: https://github.com/Boarders/cedille-core/tree/master/Parser and am happy to help with getting started here. I think you shouldn't roll your own grammar with parser combinators as it is hard to get good source positions, good errors and very fickle in my experience. Personally, I think source spans are a good thing to have and there are ways to make them reasonable (e.g. have a look at for how the surface AST can include source spans: https://github.com/AndrasKovacs/elaboration-zoo/blob/5b4373200f0ce7acdefff5c81efd5e7a674d75eb/03-holes/Main.hs#L217)
Surface AST: The AST we get after parsing which may still contain source information and various implicit arguments.
Core AST: The transformation from Surface AST to Core AST is elaboration where we fill in meta variables for implicit arguments and then produce constraints during typechecking which we solve. Ideally here we do what GHC does here which is separate out collecting constraints from solving constraints which could have its own constraint language. After this we will have typechecked and produced a core expression. As @TOTBWF says I think we should not do let-generalization but instead demand a type signature be given for all polymorphic definitions (in practice that is not a heavy demand but if people disagree then I'm happy to do something else). That all fits nicely with a bidirectional typechecker.
As far as core goes there are lots of different options since we aren't necessarily restricted to System FC but I think it is a fine starting point.

I did have the thought that we could do a pass where we target the intermediate language from the Kinds are Calling Convention paper to produce an educational backend but we can see how things go. That would include having some sort of arity analysis. It would also be interesting to see if anything can be done with linearity.

TheMatten commented 3 years ago

@Boarders thanks, sounds reasonable :slightly_smiling_face:

Parser combinators may require more thought when it comes to recursive rules, but I found them to be more convenient when it comes to other aspects. That implementation of spans is great, thanks for pointing it out!
:+1:
So you think we should do typechecking only after getting to Core? And deferring constraint solving would mean emitting type equalities instead of having some sort of unification monad if I understand it right (sounds nice)? MonoLocalBinds sounds reasonable, maybe we could flag it and still provide Haskell98 implementation as mentioned above.
System FC sounds good - we don't need to get exotic, instead it would be nice to choose well explored solutions and let alternatives for further, separate exploration.

Yeah, there's lots of options - personally I would be interested in choosing conservative ones at first and build on top of them / replace them after a little bit of experience with the result.

TheMatten commented 3 years ago

(BTW, just to be clear, I think of Core AST in sense of GHC's Core, not sligthly more explicit version of source AST - that is, I would expect it to e.g. not contain spans)

TOTBWF commented 3 years ago

The other question that needs to be answered is how the compiler ought to operate. A batch mode compiler is simpler, but does make writing tooling much, much harder (see the past 30 odd years of GHC related tooling).

On the other hand a more incremental or query-based compiler is harder to write up front, but makes tooling and incremental compilation (duh!) easier.

solomon-b commented 3 years ago

@Boarders what alternatives to System FC are you thinking of?

TheMatten commented 3 years ago

@TOTBWF when it comes to the architecture, I've found this article to be interesting: https://ollef.github.io/blog/posts/query-based-compilers.html I imagine FRP would fit into this sort of pattern (like in https://mpickering.github.io/posts/2020-03-16-ghcide-reflex.html), though I'm not sure whether that fits "Tiny little Haskell impl" well :smile:

siraben commented 3 years ago

Is self-hosting a goal at all? It would be interesting if pulled off but understandable otherwise.

TheMatten commented 3 years ago

Probably not - it would be nice to use existing libraries instead of building our own infrastructure GHC-style, but we don't even plan fundeps yet, so that would mean no access to e.g. mtl and trying to build boot libraries which depend on GHC-specific things and FFI would probably be a nightmare.

siraben commented 3 years ago

I'd advocate for use of the bound library to handle naming, substitutions and dealing with bound variables in general.

TOTBWF commented 3 years ago

I would avoid using bound tbh. It can be super slow, and if you have a renaming pass most of the problems it solves disapear.

Boarders commented 3 years ago

I would also say to avoid bound and moreover that we should never have to perform substitution even if we want to have beta reduction in an optimizer (as you can use a semantic evaluator).

solomon-b commented 3 years ago

So no Debruijn Indices? Do we just replace all binders with fresh variables?

On Sat, Oct 24, 2020, 11:12 AM Callan McGill notifications@github.com wrote:

I would also say to avoid bound and moreover that we should never have to perform substitution even if we want to have beta reduction in an optimizer (as you can use a semantic evaluator).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/TheMatten/hask/issues/1#issuecomment-716033470, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAF3ZWNR57NTMMPEO2LBQTSMMKHDANCNFSM4S4FJALQ .

TheMatten commented 3 years ago

Sounds reasonable to me - we can take GHC's renamer for an inspiration: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/renamer

TheMatten commented 3 years ago

One thing when it comes to renaming - I would like to learn from GHC's mistakes (and PS's successes) and support qualification of syntax everywhere where possible - so that equivalent of RebindableSyntax or QualifiedDo would be easy to accomplish.

siraben commented 3 years ago

Ah, ok. So looks like we would use a fast version of the Barendregt convention described in https://www.microsoft.com/en-us/research/publication/secrets-of-the-glasgow-haskell-compiler-inliner/

TheMatten commented 3 years ago

@TOTBWF

On the implementation side, I think we should avoid Trees That Grow at all costs. It makes things so much more annoying to read/deal with.

Good point - if we want to have modular AST across phases, we need to find some simple solution (I find TTG to be ugly too) Other than having distinct ASTs or random pieces of ADTs put together, one option is employing pattern synonyms over polymorphic type, other is to unsafeCoerce dummy constructors in Sandy's style smile

What solution do we pick then? I'm sort of inclined to qualified modules with different versions of AST and transformations between them - it's both dumb simple and preserves exhaustivity checking without ugly tricks, and we can try to factor out common parts later where applicable.

solomon-b commented 3 years ago

What solution do we pick then? I'm sort of inclined to qualified modules with different versions of AST and transformations between them - it's both dumb simple and preserves exhaustivity checking without ugly tricks, and we can try to factor out common parts later where applicable.

I'm totally fine with this solution. Lets keep things simple until we cant!

TheMatten commented 3 years ago

@ssbothwell Ok :+1: I'll create draft PR soon for source AST that we can use as a "case study" to create guidelines across repo (spoiler - I want to use Haddocks for notes instead of random comments like in GHC)

TheMatten / hask

Roadmap #1