elm-tooling / elm-language-server-haskell

Elm language server written in haskell (archived). Use https://github.com/elm-tooling/elm-language-server instead.
BSD 3-Clause "New" or "Revised" License
52 stars 5 forks source link

Using the end users installed version of Elm for diagnostics? #14

Closed jaredramirez closed 5 years ago

jaredramirez commented 5 years ago

Hey, first off I'm psyched that this project is being worked on! I had a question (or comment) about how this project decided to report diagnostics. If my understanding is correct, it looks like this project uses a library version fork of the Elm compiler and uses that to type check the user's code. This is pretty cool, but I'm not sure that it's the best way to do this. My reasons for thinking this are

1) It seems like a lot of work to maintain a fork of of the Elm compiler. This is especially true if there ends up being something like an 0.19.1 to address a bug that is being discussed here. Additionally, it seems hard to eventually support both 0.19 and 0.20 down the line if this project uses a compiler fork.

2) Elm includes a flag to get these syntax and type errors. If one runs elm make src/Main.elm, and the error something like

-- UNBOUND TYPE VARIABLE ----------------------------------------- src/Field.elm

The `Field` type uses an unbound type variable `error` in its definition:

44| type Field value
45|     = Field value Metadata (Status error)
                                       ^^^^^
You probably need to change the declaration to something like this:

    type Field value error = ...

Why? Well, imagine one `Field` where `error` is an Int and another where it is a
Bool. When we explicitly list the type variables, the type checker can see that
they are actually different types.

then one can run elm make src/Main.elm --report=json and the output will be

{
    "type": "compile-errors",
    "errors": [{
        "path": "src/Field.elm",
        "name": "Field",
        "problems": [{
            "title": "UNBOUND TYPE VARIABLE",
            "region": {
                "start": {
                    "line": 44,
                    "column": 1
                },
                "end": {
                    "line": 45,
                    "column": 42
                }
            },
            "message": ["The `Field` type uses an unbound type variable ", {
                "bold": false,
                "underline": false,
                "color": "yellow",
                "string": "`error`"
            }, " in its definition:\n\n44| type Field value\n45|     = Field value Metadata (Status error)\n                                       ", {
                "bold": false,
                "underline": false,
                "color": "red",
                "string": "^^^^^"
            }, "\nYou probably need to change the declaration to something like this:\n\n    type Field value ", {
                "bold": false,
                "underline": false,
                "color": "GREEN",
                "string": "error"
            }, " = ...\n\nWhy? Well, imagine one `Field` where `error` is an Int and another where it is a\nBool. When we explicitly list the type variables, the type checker can see that\nthey are actually different types."]
        }]
    }]
}

Then, one only has to parse the json to get the error and report it back to the user.

Because of the challenges of maintaining a fork and the fact the you can get compile errors from elm make, it seems to me that attempting to find the user installed version of Elm and getting diagnostics from that would be more effective. Other benefits of this approach are

1) The language server could prefer local installations. If a user used 0.19 in one project and 0.20 installed in another (both installed in maybenode_modules), then the language server could report errors seamlessly in both. Additionally, it can always fallback to looking at a global installation.

2) This project only has to maintain a fork of the elm parser, which is simpler than maintaining a fork of the compiler. I think it's fair to say that generally speaking the elm syntax doesn't change drastically between releases, so maintain a fork of just the compiler may end up being more straightforward.

I would also like to point out that this seems to be the approach taken by [elm-format] (https://github.com/avh4/elm-format), though their goal/use case is somewhat different.

And lastly, I have been working on my own implementation of an elm language server, and the approach I've outlined in this issue has worked well so far. It seems that collaborating on one project is more beneficial to the community though, which is why I'm writing this issue. If interested, this project can use that as a basis for the implementation.

Please let me know your thoughts on this, it's super possible that there are reasons for keeping a fork of the compiler that I haven't thought of! And I'm super down to work on this refactor if it's decided that this is the way to go.

matheus23 commented 5 years ago

Hi! First, thank you for this great writeup! :)

The Elm Compiler Fork

It seems like a lot of work to maintain a fork of of the Elm compiler.

Yes, that might be true in the future, I don't know that yet. However, our changes to the compiler are minimal as of now:

  1. Most importantly, we changed the build configuration (the .cabal file) to make it export all modules of the compiler as a library, so we can depend on it in this project
  2. We exported some functions so we don't have to repeat them in the language server codebase
  3. We added a function that doesn't kill the process after running an action, but instead returns the error code (tryWithError).
  4. We abstracted one particular piece of a function (compileModule). But since this was more of a change compared to points 1 and 2, this change was just copied into the language server codebase.

So our fork basically doesn't have any actual code changes, except for tryWithError 1. We do have forked code (3 functions) from the compiler in the language server itself.2

I think that it might get more complicated to adapt to compiler code changes in the future, because the more features the language server has, the more we'll need to interface with the compiler. But I think you overestimate how much work this actually is.

Handling different Elm Versions

Additionally, it seems hard to eventually support both 0.19 and 0.20 down the line if this project uses a compiler fork

it seems to me that attempting to find the user installed version of Elm and getting diagnostics from that would be more effective

I think that ideally each elm compiler ships with a language server. It certainly won't be this implementation, but its goal was primarily to assess whether something like that would even be feasible and how beneficial sharing parts with the compiler is. I personally view this as more of an experiment than the official implementation. If you squint your eyes a little, you can see how the json-report interface is something similar to a language server already today.

Collaboration

It seems that collaborating on one project is more beneficial to the community though, which is why I'm writing this issue. If interested, this project can use that as a basis for the implementation.

As of now, to me it seems like we're in experimentation phase and just try out different approaches. Merging seems to me like premature abstraction :D

A little bit of a Sidenote: I think its important to understand the motivations for this project to understand my stance: There are many things that the elm compiler can already do, that would be extremely useful for an IDE:

  1. Name resolution (Canonicalization): For a valid project, the compiler can resolve any identifier to the place it was defined. That is: the package and module or even file on disk it is currently saved at. Imagine how easy jump-to-definition can be implemented here. Or imagine scraping all exposed identifiers of all dependencies and suggesting imports, like everyone writing java is used to! The compiler already has datastructures describing all elm files (this is what elmi-to-json uses, except that it reads the serialized interface files from disk instead of from memory).
  2. Type inference: The compiler already infers types for everything. Expressions, let definitions or top-level function definitions. It also has a (limited) pretty printer for types (used for error messages) that could be used to auto-generate type annotations!
  3. Typo suggestions: The compiler already writes suggestions into the output: Hint: Seems like a record field typo. Maybe pge should be page? There is a datatype in the elm compiler that looks like this:
    data Report =
      Report
        { _title :: String
        , _region :: R.Region
        , _sgstns :: [String]
        , _message :: D.Doc
        }

    The _sgstns (suggestions) field is exactly the information we need! We hacked on this feature at Munihac this year and got it working. It currently lives in the code-actions branch, waiting to get cleaned up a little.

  4. Syntax/Type errors (diagnostics): The compiler already checks for those. Why not use this functionality?

By now the question "Why don't we implement these features using the json-report interface?" should be screaming in the back of one's mind. Obviously some of those features don't have the necessary interface yet, but might have in the future!

I think that, if one were to implement all those features into the json-report interface, we would end up with something that is close to as complete and difficult to maintain for different elm versions, as implementing the language server protocol for the respective features, or even something that is basically a re-implementation of it.

So I went on quite a tangent with this answer, but all in all: I didn't use the json interface for diagnostics, because I wanted to test using the datastructures directly and wanted to use even more datastructures directly in the future.

Footnote 1: You can take a look at the diff between our fork and the actual compiler code here: https://github.com/elm/compiler/compare/elm:a936e95...elm-tooling:5354f1f

Footnote 2: These functions live in the Language.Elm Module.

jaredramirez commented 5 years ago

Hey, thanks for the super detailed response!

You pretty much addresses all of my questions/concerns, and then some! I appreciate you taking the time to outline your thought process and motivations. I completely understand why your taking this approach. I'm super looking forward to seeing this project develop!

Thanks again for your time!