carymrobbins / intellij-haskforce

Haskell plugin for IntelliJ IDEA
http://carymrobbins.github.io/intellij-haskforce/
Apache License 2.0

Haskell Parser #233

Open carymrobbins opened 8 years ago

carymrobbins commented 8 years ago

As it stands, there are 17 open issues against the current Haskell parser, and counting. Here are my thoughts on the resolution -

This will be a big undertaking, but once it's complete I think we'll see a much better product. When it's done, I plan on releasing it as 0.4-rc1. I will start work on this after completing #169.

carymrobbins commented 8 years ago

I was thinking about how haskell-src-exts might serve as an alternative to implementing the parser ourselves. Note that we attempted this before with an external tool, parser-helper, which serialized the haskell-src-exts AST to JSON. However, there are a few problems I see with this.

  1. IntelliJ uses the lexer/parser internally, so it may be difficult (and probably a hack) to use an external parser and then somehow instruct IntelliJ how to connect the source text locations with the nodes returned from the external parser.
  2. We would need to create an RPC server (or the like) which consumes source text, parses it using haskell-src-exts, serializes the result to binary, and returns the binary-encoded data. We'd then need to represent all of the data types in the plugin, probably as Java objects (which don't have the correctness of Scala types), so the result could be deserialized into them; see the sketch after this list. This may or may not be less performant than a proper parser implemented directly in IntelliJ.
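
To make point 2 concrete, here's a rough sketch of what the plugin side of such an approach might look like. All of the names (ParserHelperClient, HsModule, HsDecl, etc.) are hypothetical; the point is only that we'd have to mirror the haskell-src-exts AST with plain Java classes and deserialize the RPC response into them.

import java.util.List;

// Hypothetical sketch only -- none of these classes exist in the plugin today.
// Java mirror of a tiny slice of the haskell-src-exts AST.
final class SrcSpan {
    final int startLine, startCol, endLine, endCol;
    SrcSpan(int startLine, int startCol, int endLine, int endCol) {
        this.startLine = startLine; this.startCol = startCol;
        this.endLine = endLine; this.endCol = endCol;
    }
}

final class HsDecl {
    final SrcSpan span;
    final String name;
    HsDecl(SrcSpan span, String name) { this.span = span; this.name = name; }
}

final class HsModule {
    final String name;
    final List<HsDecl> decls;
    HsModule(String name, List<HsDecl> decls) { this.name = name; this.decls = decls; }
}

interface ParserHelperClient {
    // Sends the source text to the external haskell-src-exts process, blocks for
    // the binary-encoded response, and deserializes it into the mirror classes above.
    HsModule parse(String sourceText);
}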

From what I recall, the problem with parser-helper was poor performance. That might have been improved by implementing it as an RPC server and using a binary serialization protocol instead of JSON (point 2), but the HaskellParser2 class also somehow had to connect our internal lexer with haskell-src-exts' AST (point 1). This seems pretty complicated and error prone.

While I'd prefer to build off of existing work (haskell-src-exts), it seems that in this case implementing the parser ourselves may still be the best option.

carymrobbins commented 8 years ago

Here's a decent overview of the IntelliJ Psi Parsing API -

http://www.jetbrains.org/intellij/sdk/docs/reference_guide/custom_language_support/implementing_parser_and_psi.html
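
For reference, the basic shape of a hand-written parser in that API looks roughly like the following. This is only a sketch of the PsiBuilder workflow; the DECLARATION element type and parseDeclaration rule are placeholders, not actual plugin code.

import com.intellij.lang.ASTNode;
import com.intellij.lang.PsiBuilder;
import com.intellij.lang.PsiParser;
import com.intellij.psi.tree.IElementType;

// Sketch only: real grammar rules would replace parseDeclaration. The point is
// the marker-based workflow PsiBuilder expects: mark, consume tokens, then
// complete the marker with done() (or error()).
public class HaskellPsiParserSketch implements PsiParser {
    // Placeholder element type for the sketch; a real plugin defines its own.
    private static final IElementType DECLARATION = new IElementType("HS_DECLARATION", null);

    @Override
    public ASTNode parse(IElementType root, PsiBuilder builder) {
        PsiBuilder.Marker rootMarker = builder.mark();
        while (!builder.eof()) {
            parseDeclaration(builder);
        }
        rootMarker.done(root);
        return builder.getTreeBuilt();
    }

    private void parseDeclaration(PsiBuilder builder) {
        PsiBuilder.Marker marker = builder.mark();
        builder.advanceLexer(); // real rules would consume an entire declaration
        marker.done(DECLARATION);
    }
}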

carymrobbins commented 7 years ago

From talking with @rahulmutt, it sounds like it may be possible in the near future to compile alex with Eta. This might make it possible to implement an IntelliJ lexer based on GHC's implementation.

If that proves to be workable, then writing an IntelliJ PsiParser by hand on top of that lexer would probably be ideal, particularly since most parser bugs appear to be problems with the layout the lexer produces. Here's a recent one https://github.com/carymrobbins/intellij-haskforce/issues/333 and its fix https://github.com/carymrobbins/intellij-haskforce/pull/334.

A big reason we should prefer to write the parser by hand is that it must support error recovery. GHC's parser, however, does not recover, so using it would prevent any sort of source analysis for an invalid source file. This is terrible when developing code, and all modern IDEs deal with it by implementing a recovering parser (just try out Java or Scala support in IntelliJ).
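
To illustrate what "recovering" means in PsiBuilder terms: when a rule hits an unexpected token, the parser wraps the bad region in an error node and resynchronizes at the next plausible declaration start instead of giving up on the whole file. A rough sketch, with the recovery token set left as a placeholder:

import com.intellij.lang.PsiBuilder;
import com.intellij.psi.tree.TokenSet;

// Sketch of error recovery with PsiBuilder: skip ahead to a token that can
// plausibly start the next top-level declaration, and report everything we
// skipped as a single error element, so the rest of the file still parses.
public class RecoverySketch {
    // Hypothetical set of tokens that may begin a top-level declaration.
    private static final TokenSet DECL_START =
        TokenSet.create(/* e.g. the token types for 'data', 'class', varid, ... */);

    static void recoverToNextDeclaration(PsiBuilder builder, String message) {
        PsiBuilder.Marker errorMarker = builder.mark();
        while (!builder.eof() && !DECL_START.contains(builder.getTokenType())) {
            builder.advanceLexer();
        }
        errorMarker.error(message); // emits a PsiErrorElement covering the skipped tokens
    }
}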

carymrobbins commented 7 years ago

It's probably a good idea to review the layout rules as described in the Haskell 2010 report -

https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3
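
For concreteness, the core of section 10.3 boils down to comparing the column of each line's first token against a stack of enclosing layout contexts and emitting virtual braces and semicolons. Below is a heavily simplified sketch of that idea; it ignores explicit braces and the parse-error rule (which is exactly the part that makes real-world layout handling hard), and the Tok class and names are made up for illustration.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified sketch of the Haskell 2010 layout algorithm.
public class LayoutSketch {
    /** Minimal token: its text plus the 1-based line/column of its first character. */
    public static final class Tok {
        final String text; final int line, col;
        Tok(String text, int line, int col) { this.text = text; this.line = line; this.col = col; }
    }

    private static final List<String> LAYOUT_KEYWORDS = List.of("where", "let", "do", "of");

    public static List<String> layout(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        Deque<Integer> contexts = new ArrayDeque<>(); // columns of enclosing implicit blocks
        int prevLine = -1;
        boolean expectOpen = false; // previous token was a layout keyword

        for (Tok t : toks) {
            if (expectOpen) {
                out.add("{");            // virtual open brace: new implicit block at this column
                contexts.push(t.col);
                expectOpen = false;
            } else if (t.line != prevLine) {
                while (!contexts.isEmpty() && t.col < contexts.peek()) {
                    out.add("}");        // virtual close brace: block ended by dedent
                    contexts.pop();
                }
                if (!contexts.isEmpty() && t.col == contexts.peek()) {
                    out.add(";");        // virtual semicolon: next item in the same block
                }
            }
            out.add(t.text);
            prevLine = t.line;
            if (LAYOUT_KEYWORDS.contains(t.text)) expectOpen = true;
        }
        while (!contexts.isEmpty()) { out.add("}"); contexts.pop(); } // close remaining blocks
        return out;
    }
}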

carymrobbins commented 7 years ago

Upon playing with haskell-src-exts' lexer, it seems it doesn't actually produce layout tokens (i.e. there are no indent/dedent tokens), which leads me to believe that layout is handled entirely by the parser using the source positions reported by the lexer. Given the details and rules in the Haskell 2010 report, this seems like a potentially reasonable approach for us as well. However, it means that integrating HSE's lexer, while possibly useful, won't help us solve problems with the layout. That will have to be solved by hand or by patching an existing parser to support recovery.

λ :m + Language.Haskell.Exts.Lexer
λ lexTokenStream "{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}\nmodule Main where\nmain = do\n  putStrLn $(foo)\n  where\n  foo = \"bam\""
ParseOk
  [ Loc { loc = SrcSpan "<unknown>.hs" 1 1 1 13 , unLoc = LANGUAGE }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 1 14 1 29
      , unLoc = ConId "TemplateHaskell"
      }
  , Loc { loc = SrcSpan "<unknown>.hs" 1 29 1 30 , unLoc = Comma }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 1 31 1 42
      , unLoc = ConId "QuasiQuotes"
      }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 1 43 1 46 , unLoc = PragmaEnd }
  , Loc { loc = SrcSpan "<unknown>.hs" 2 1 2 7 , unLoc = KW_Module }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 2 8 2 12 , unLoc = ConId "Main" }
  , Loc { loc = SrcSpan "<unknown>.hs" 2 13 2 18 , unLoc = KW_Where }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 3 1 3 5 , unLoc = VarId "main" }
  , Loc { loc = SrcSpan "<unknown>.hs" 3 6 3 7 , unLoc = Equals }
  , Loc { loc = SrcSpan "<unknown>.hs" 3 8 3 10 , unLoc = KW_Do }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 4 3 4 11
      , unLoc = VarId "putStrLn"
      }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 4 12 4 13 , unLoc = VarSym "$" }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 4 13 4 14 , unLoc = LeftParen }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 4 14 4 17 , unLoc = VarId "foo" }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 4 17 4 18 , unLoc = RightParen }
  , Loc { loc = SrcSpan "<unknown>.hs" 5 3 5 8 , unLoc = KW_Where }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 6 3 6 6 , unLoc = VarId "foo" }
  , Loc { loc = SrcSpan "<unknown>.hs" 6 7 6 8 , unLoc = Equals }
  , Loc
      { loc = SrcSpan "<unknown>.hs" 6 9 6 14
      , unLoc = StringTok ( "bam" , "bam" )
      }
  ]