Error formatting improvements

mxxo commented 4 years ago

I want to try improving Futhark's command-line errors. I've been going through the nice work done in #634 and would like to try adding colour output and caret diagnostics to typechecking errors.

For example, the duplicate parameters (https://github.com/diku-dk/futhark/blob/master/tests/errors/duplicate-params.fut) error is

Error at .\tests\errors\duplicate-params.fut:6:20-20:
Name x also bound at .\tests\errors\duplicate-params.fut:6:11-11
If you find this error message confusing, uninformative, or wrong, please open an issue at https://github.com/diku-dk/futhark/issues.

which is understandable, but somewhat rudimentary.

Compared to the same error in Rust:

error[E0415]: identifier `x` is bound more than once in this parameter list
 --> src\main.rs:5:22
  |
5 | fn dup_param(x: i32, x: i32) {
  |                      ^ used as parameter more than once

error: aborting due to previous error

For more information about this error, try `rustc --explain E0415`.
error: Could not compile `errs`.

which is a bit of a unfair comparison (general-purpose language vs. desert hedgehog), but perhaps there's ideas we can take advantage from to make debugging easier.

Thank you for your time and all your work on Futhark.

athas commented 4 years ago

I certainly wouldn't mind something like this. I don't know exactly how other languages implement this, but I think Futhark already carries around most of the necessary parts. In particular, we do maintain pretty complete source location information (filename, line, and column span). I think if we also carry around a full in-memory copy of all the input files during type checking (which isn't all that much data), we could just do a lookup in those based on the source location.

flip111 commented 4 years ago

Great idea @mxxo !

mxxo commented 4 years ago

What kind of error format would you like to see? clang, rustc and recent gccs come to mind for good error messages, and there's some nice error formatting documents here:

I think the things we could improve are:

Use the SrcLoc information (and a copy of the input file like you mentioned) to print out where the error happens (with some caret helpers or --- underlining etc). Some of the printing logic is a little hairy, I don't know how far we want to go here.
Add some help notes if there is probable fix that might not be obvious to the programmer, like this one from the size types section in the book:

For example, we could modify

[1]> zip (dup [1,2,3]) (dup [3,2,1])
Error at [1]> :1:19-31:
Cannot apply "zip" to "(dup [3, 2, 1])" (invalid type).
Expected: [ret₇]b₂
Actual:   *[ret₁₂]i32

Dimensions "ret₇" and "ret₁₂" do not match.

Note: "ret₇" is unknown size returned by "dup" at 1:6-16.

Note: "ret₁₂" is unknown size returned by "dup" at 1:20-30.

with a help section like:

help:  you can try to coerce each array to a given size using the :> operator.
       e.g. for an output array of length 6: 
         zip (dup [1,2,3] :> [6]i32) (dup [3,2,1] :> [6]i32)

Similar for parse errors, etc:

let dup [n] (xs: [n]i32): [2*n]i32 =
  map (\i -> xs[i/2]) (0..<n*2)

could print

Error: only variables or constants are allowed in size types.
Help: try removing the size type from the return array.
      dup [n] (xs: [n]i32): []i32 =
                            ^^

instead of today's

Error at .\tests\errors\size-type.fut:1:29-29: Parse error.

(here it's a parse error instead of type checking error, but perhaps common ideas could be used there?)

I know this is much easier said than done, especially since Futhark is a moving target and would appreciate any feedback.

flip111 commented 4 years ago

Bikeshedding over how exactly the error would look like is one thing. For the implementation the difficulty is mostly about what information the compiler gives you to format into a next error message. Do you know which information you would like for the error message and which the compiler has available?

IMO just take any error message you like for the moment, format can be changed easily later anyway.

mxxo commented 4 years ago

Yes, that makes sense :). From what I understand, we've got

an error span location of type SrcLoc with error start and stop file, line and column info
Notes, a list of information strings about the error (where we could pass help info etc.)
Doc, the actual error message string. Sometimes it has some extra SrcLoc info.

Here is the error definition: https://github.com/diku-dk/futhark/blob/master/src/Language/Futhark/TypeChecker/Monad.hs#L95

-- | Information about an error during type checking.
data TypeError = TypeError SrcLoc Notes Doc

instance Pretty TypeError where
  ppr (TypeError loc notes msg) =
    text (inRed "Error") <+> "at" <+> text (locStr loc) <> ":" </>
    msg <> ppr notes

What's missing is the actual source lines of code (like @athas mentioned)

I think if we also carry around a full in-memory copy of all the input files during type checking (which isn't all that much data), we could just do a lookup in those based on the source location

to actually print out the erroneous lines. I am not super well versed in Haskell, maybe we could carry that info around in the TypeChecker Monad?

athas commented 4 years ago

I like Rust's style of error messages. Generally, during the development of Futhark I have stolen from Rust whenever there was some surface language detail I wasn't sure about, since they seem to have put a lot of thought into everything (e.g. the primitive type system and notation in Futhark is taken from Rust and LLVM).

Adding "hints" like the size type stuff is fine, I guess. We already have that in a few cases. They are represented as "notes". I'm a bit biased against adding too much of that kind of stuff, as it easily becomes very verbose and very specialised, without necessarily being useful (whereas printing source context is always useful).

Printing better parse errors is difficult, because Futhark's parser is written in a strict LALR framework (it uses the Happy parser generator), and these are notoriously difficult to add sensible failure cases for. I use such a parser generator because I think it is important that Futhark remains an LR grammar, and the tool forced me to keep it that way, but I don't actually think it's necessaily good idea to use a parser generator forever. Writing an entirely new parser is too much work in the short term, though.

For printing source context, we would indeed have to carry around the original source in the TypeM monad. That's not the tricky part. The tricky part is modifying the pipeline all the way from the parser to the typechecker to carry that information along with it. However, if we assume that we only ever want to print source context for the current file, it's pretty easy. We would modify the handleFile function to also pass the file_contents variable to the checkProg function (and checkProg would then be changed to put the source in a new field in the type checker environment). That's a very local change.

Qata commented 3 years ago

In terms of human friendliness, Elm is best in class for compiler errors.

This is a pretty old article (the errors have gotten even better since), but it's still good. https://elm-lang.org/news/compiler-errors-for-humans

trampolinecall commented 1 year ago

Hi there! First-time contributor here!

I could spend some time on this issue. I have a (admittedly very unpolished) prototype of a formatting algorithm that produces output pretty similar to rustc's, and I could spend some time trying to plug it in here.

If I'm understanding correctly, the things that need to be done include:

Modify the architecture to pass the source code through all the phases and into the typechecker
Use the source in the printing of error
Annotate type errors with secondary messages

What would be the best place to start with this?

trampolinecall commented 1 year ago

Small update: after poking around the source for a bit, I think it might be helpful to also create a unified type Error type to make the format consistent. Maybe something like:

data Error = Error
  { mainLocation :: SrcLoc,
    mainMessage :: Doc (),
    secondaryMessages :: [(SrcLoc, Doc ())],
    notes :: [Doc ()]
  }

athas commented 1 year ago

That might be a good idea. I don't have any grand vision for how the error infrastructure should look like, so anything you think is a good idea probably is. It might be a good idea to start by just adding this representation to the type checker, rather than also worrying about making parse errors and such use this.

Regarding your concrete questions:

Modify the architecture to pass the source code through all the phases and into the typechecker

There is only really one phase before type checking, which is parsing. It is all orchestrated in this file. The program source is read and parsed here and later type checked here. I think adding the original source code to LoadedFile would be the easiest way to be able to pass it on to the type checker.

Use the source in the printing of error

If the source code is stored in the type checker environment (and passed in the original checkProg call), then it should be easy enough to find the relevant lines given a line number. It's slightly more tricky to access source code for other files, so I'd recommend ignoring that initially - it's probably also much less useful.

Annotate type errors with secondary messages

There is already a limited mechanism for this as a TypeError can be associated with notes. It's much less structured than what you propose, but perhaps it can be used for inspiration.

diku-dk / futhark

Error formatting improvements #912