Implement source parser

TheMatten commented 3 years ago

Once #4 is finished, we want to parse it - let's track related problems here:

Moving relevant question from #1 here - which library do we want to use for parsing? megaparsec and alex+happy sound like a good fit for our purposes - personally I may be more inclined to the former, though both options have their pros and cons.
What about error reporting? I guess we want to be consistent across the compiler, so we'll probably want custom error printer instead of e.g. one in megaparsec, which is specifically tied to ParsecT state. This may actually be question worth a separate issue - I'm interested in style similar to rustc:
```
error[E0391]: cycle detected when computing the supertraits of `main::FirstTrait`
--> src/main.rs:3:20
|
3 | trait FirstTrait : SecondTrait {
|                    ^^^^^^^^^^^
|
note: ...which requires computing the supertraits of `main::SecondTrait`...
--> src/main.rs:7:21
|
7 | trait SecondTrait : FirstTrait {
|                     ^^^^^^^^^^
= note: ...which again requires computing the supertraits of `main::FirstTrait`, completing the cycle
note: cycle used when collecting item types in top-level module
--> src/main.rs:3:1
|
3 | trait FirstTrait : SecondTrait {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error
```
Related to this, we want to somehow print the relevant source in later phases after parsing. Instead of pretty printing AST back, I would propose storing parsed lines of source as they are and storing line indexes along the position in line in spans in AST - we would then print the line itself (maybe truncated if needed), underlined in place of actual error span.

Boarders commented 3 years ago

I'm quite keen generally on separating lexing from parsing and so wondered if we could do alex + megaparsec. The main difficulty is I'm not sure how you lex Haskell style layout in a pleasing fashion (mostly because I have never tried it, I don't think there is a fundamental limitation). Maybe looking at how GHC does it would help.

TheMatten commented 3 years ago

I'm quite keen generally on separating lexing from parsing and so wondered if we could do alex + megaparsec.

When it comes to lexing, I'm worried about whitespace sensitivity - it doesn't really appear in Haskell98, but does a lot with extensions. It's not a fundamental limitation, but it will make things less nice.

The main difficulty is I'm not sure how you lex Haskell style layout in a pleasing fashion (mostly because I have never tried it, I don't think there is a fundamental limitation)

Looking at GHC's parser to see how they handle indentation vs braces, they seem to manually recognize both cases. https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser.y#L95 sounds horrifying BTW :sweat_smile: Wouldn't we avoid equivalent of shift/reduce conflicts when using combinators? (You just have to replace recursion with fold on list from time to time)

TheMatten commented 3 years ago

I wonder what to do about operator parsing - would there be any issues with parsing source that uses them as list of tokens and using separate phase to turn those into trees? It sounds more efficient and elegant than reordering left-associative tree as done in GHC.

solomon-b commented 3 years ago

What is the advantage of doing a separate lexing step if using parser combinators?

TheMatten commented 3 years ago

Possibly less space for mistakes and no need to handle whitespace in parser I guess - but Haskell is space-sensitive, so I wonder whether it won't be more convenient to do together.

solomon-b commented 3 years ago

Is there anything I can do to get the ball rolling on the parser?

TheMatten commented 3 years ago

Feel free to start - I'm slightly busy ATM, so I didn't really start anything myself. BTW, I'm sort of inclined to try multiple approaches (megaparsec/megaparsec+alex/happy+alex), so I guess you can just pick one and we can later sort it out. Nice thing about doing toy project is that there's no reason to hurry :smile:

TheMatten / hask

Implement source parser #5