I was thinking about how haskell-src-exts might serve as an alternative to implementing the parser ourselves. Note that we attempted this before with an external tool, parser-helper, which serialized haskell-src-exts' output to JSON. However, there are a few problems I see with this approach.
From what I recall, the main problem with parser-helper was its poor performance. That might have been improved by implementing it as an RPC server and using a binary serialization protocol instead of JSON (point 2), but the HaskellParser2 class also somehow had to connect our internal lexer with haskell-src-exts' AST (point 1). This seems pretty complicated and error-prone.
While I prefer being able to build off of existing work (haskell-src-exts), it seems that in this case implementing the parser ourselves may still be the best option.
Here's a decent overview of the IntelliJ Psi Parsing API -
From talking with @rahulmutt, it sounds like it may be possible in the near future to compile alex with Eta. That could make it possible to implement an IntelliJ lexer based on GHC's implementation.
If that proves to be workable, then writing an IntelliJ PsiParser by hand on top of that lexer would probably be ideal, particularly since most parser bugs appear to be problems with the layout the lexer produces. Here's a recent example https://github.com/carymrobbins/intellij-haskforce/issues/333 and its fix https://github.com/carymrobbins/intellij-haskforce/pull/334.
A big reason to prefer writing the parser by hand is that it must support error recovery. GHC's parser, however, does not recover, so using it would rule out any source analysis on an invalid source file. That's terrible while you're actively editing code, and all modern IDEs handle it by implementing a recovering parser (just try out the Java or Scala support in IntelliJ).
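To make the recovery idea concrete, here's a minimal sketch with a toy grammar and made-up token/node types (nothing to do with our actual PSI classes): when a declaration fails to parse, it records an error node, skips ahead to a plausible synchronization token, and keeps going, so the rest of the file still produces a usable tree.

-- Toy token and node types, invented purely for this sketch.
data Token = TIdent String | TEquals | TUnknown String
  deriving (Eq, Show)

data Node
  = Decl String        -- a successfully parsed declaration
  | ErrorNode [Token]  -- tokens swallowed while recovering
  deriving Show

-- Parse one declaration of the toy form: ident '=' ident.
parseDecl :: [Token] -> Maybe (Node, [Token])
parseDecl (TIdent name : TEquals : TIdent _ : rest) = Just (Decl name, rest)
parseDecl _ = Nothing

-- Tokens that plausibly start a new declaration act as sync points.
startsDecl :: Token -> Bool
startsDecl (TIdent _) = True
startsDecl _          = False

parseModule :: [Token] -> [Node]
parseModule [] = []
parseModule ts =
  case parseDecl ts of
    Just (node, rest) -> node : parseModule rest
    Nothing ->
      -- Recovery: consume at least one token, skip to the next sync
      -- point, record what we skipped, and keep parsing.
      let (bad, rest) = break startsDecl (drop 1 ts)
      in ErrorNode (take 1 ts ++ bad) : parseModule rest

The real thing would be written against IntelliJ's PsiBuilder and its error markers rather than plain lists, but the shape of the recovery logic would be the same.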
It's probably a good idea to review the layout rules as described in the Haskell 2010 report
https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3
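For reference, here's a rough Haskell transcription of the layout function L from section 10.3. The token constructors are invented for this sketch, the input is assumed to already carry the {n}/<n> pseudo-tokens inserted by the report's preprocessing step, and the parse-error(t) side condition (Note 5) is left out entirely; that side condition is precisely the part that needs feedback from the parser, which is why a lexer alone can't implement the full algorithm.

-- Input pseudo-tokens: the preprocessing step has already inserted {n}
-- after let/where/do/of (when no explicit '{' follows) and <n> before
-- the first token of each new line.
data LexTok
  = Brace Int    -- {n}
  | Angle Int    -- <n>
  | OpenCurly    -- explicit '{'
  | CloseCurly   -- explicit '}'
  | Tok String   -- any other lexeme
  deriving Show

data OutTok = VOpen | VClose | VSemi | Plain String
  deriving Show

-- The function L from the report, minus the parse-error(t) rule.
layoutL :: [LexTok] -> [Int] -> [OutTok]
layoutL (Angle n : ts) (m : ms)
  | m == n = VSemi  : layoutL ts (m : ms)
  | n < m  = VClose : layoutL (Angle n : ts) ms
layoutL (Angle _ : ts) ms = layoutL ts ms
layoutL (Brace n : ts) (m : ms)
  | n > m  = VOpen : layoutL ts (n : m : ms)
layoutL (Brace n : ts) []
  | n > 0  = VOpen : layoutL ts [n]
layoutL (Brace n : ts) ms = VOpen : VClose : layoutL (Angle n : ts) ms
layoutL (CloseCurly : ts) (0 : ms) = VClose : layoutL ts ms
layoutL (CloseCurly : _)  _        = error "layout: unexpected '}'"
layoutL (OpenCurly : ts) ms        = VOpen : layoutL ts (0 : ms)
layoutL (Tok t : ts) ms            = Plain t : layoutL ts ms
layoutL [] []                      = []
layoutL [] (m : ms)
  | m /= 0 = VClose : layoutL [] ms
layoutL [] _ = error "layout: unmatched explicit '{' at end of input"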
From playing with haskell-src-exts' lexer, it seems it doesn't actually produce the layout (i.e. there are no indent/dedent tokens), which leads me to believe that layout is handled entirely by the parser using the source positions reported by the lexer. Given the details and rules in the Haskell 2010 report, this seems like a potentially reasonable approach for us as well. However, it means that integrating HSE's lexer, while possibly useful, won't by itself solve our problems with the layout; that will have to be solved by hand or by patching an existing parser to support recovery. Here's the lexer session (a layout-expanded form of the same snippet follows it):
λ :m + Language.Haskell.Exts.Lexer
λ lexTokenStream "{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}\nmodule Main where\nmain = do\n putStrLn $(foo)\n where\n foo = \"bam\""
ParseOk
[ Loc { loc = SrcSpan "<unknown>.hs" 1 1 1 13 , unLoc = LANGUAGE }
, Loc
{ loc = SrcSpan "<unknown>.hs" 1 14 1 29
, unLoc = ConId "TemplateHaskell"
}
, Loc { loc = SrcSpan "<unknown>.hs" 1 29 1 30 , unLoc = Comma }
, Loc
{ loc = SrcSpan "<unknown>.hs" 1 31 1 42
, unLoc = ConId "QuasiQuotes"
}
, Loc
{ loc = SrcSpan "<unknown>.hs" 1 43 1 46 , unLoc = PragmaEnd }
, Loc { loc = SrcSpan "<unknown>.hs" 2 1 2 7 , unLoc = KW_Module }
, Loc
{ loc = SrcSpan "<unknown>.hs" 2 8 2 12 , unLoc = ConId "Main" }
, Loc { loc = SrcSpan "<unknown>.hs" 2 13 2 18 , unLoc = KW_Where }
, Loc
{ loc = SrcSpan "<unknown>.hs" 3 1 3 5 , unLoc = VarId "main" }
, Loc { loc = SrcSpan "<unknown>.hs" 3 6 3 7 , unLoc = Equals }
, Loc { loc = SrcSpan "<unknown>.hs" 3 8 3 10 , unLoc = KW_Do }
, Loc
{ loc = SrcSpan "<unknown>.hs" 4 3 4 11
, unLoc = VarId "putStrLn"
}
, Loc
{ loc = SrcSpan "<unknown>.hs" 4 12 4 13 , unLoc = VarSym "$" }
, Loc
{ loc = SrcSpan "<unknown>.hs" 4 13 4 14 , unLoc = LeftParen }
, Loc
{ loc = SrcSpan "<unknown>.hs" 4 14 4 17 , unLoc = VarId "foo" }
, Loc
{ loc = SrcSpan "<unknown>.hs" 4 17 4 18 , unLoc = RightParen }
, Loc { loc = SrcSpan "<unknown>.hs" 5 3 5 8 , unLoc = KW_Where }
, Loc
{ loc = SrcSpan "<unknown>.hs" 6 3 6 6 , unLoc = VarId "foo" }
, Loc { loc = SrcSpan "<unknown>.hs" 6 7 6 8 , unLoc = Equals }
, Loc
{ loc = SrcSpan "<unknown>.hs" 6 9 6 14
, unLoc = StringTok ( "bam" , "bam" )
}
]
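For comparison, this is roughly the layout-expanded program the parser ultimately has to see for that snippet, with the virtual braces and semicolons made explicit; note that closing the do block before the trailing where is exactly the part that hinges on the parse-error(t) rule.

{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
module Main where { main = do { putStrLn $(foo) } where { foo = "bam" } }

None of those braces appear in the token stream above, which is the point: the lexer gives us positions, and the layout still has to be computed somewhere.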
As it stands, there are 17 open issues against the current Haskell parser, and counting. Here are my thoughts on how to resolve them -
This will be a big undertaking, but once it's complete I think we'll see a much better product. I plan on releasing it as 0.4-rc1, and I will start work on it after completing #169.