Open mxxo opened 4 years ago
I certainly wouldn't mind something like this. I don't know exactly how other languages implement this, but I think Futhark already carries around most of the necessary parts. In particular, we do maintain pretty complete source location information (filename, line, and column span). I think if we also carry around a full in-memory copy of all the input files during type checking (which isn't all that much data), we could just do a lookup in those based on the source location.
Great idea @mxxo !
What kind of error format would you like to see? clang
, rustc
and recent gcc
s come to mind for good error messages, and there's some nice error formatting documents here:
I think the things we could improve are:
SrcLoc
information (and a copy of the input file like you mentioned) to print out where the error happens (with some caret helpers or --- underlining etc). Some of the printing logic is a little hairy, I don't know how far we want to go here. help
notes if there is probable fix that might not be obvious to the programmer, like this one from the size types section in the book: For example, we could modify
[1]> zip (dup [1,2,3]) (dup [3,2,1])
Error at [1]> :1:19-31:
Cannot apply "zip" to "(dup [3, 2, 1])" (invalid type).
Expected: [ret₇]b₂
Actual: *[ret₁₂]i32
Dimensions "ret₇" and "ret₁₂" do not match.
Note: "ret₇" is unknown size returned by "dup" at 1:6-16.
Note: "ret₁₂" is unknown size returned by "dup" at 1:20-30.
with a help section like:
help: you can try to coerce each array to a given size using the :> operator.
e.g. for an output array of length 6:
zip (dup [1,2,3] :> [6]i32) (dup [3,2,1] :> [6]i32)
Similar for parse errors, etc:
let dup [n] (xs: [n]i32): [2*n]i32 =
map (\i -> xs[i/2]) (0..<n*2)
could print
Error: only variables or constants are allowed in size types.
Help: try removing the size type from the return array.
dup [n] (xs: [n]i32): []i32 =
^^
instead of today's
Error at .\tests\errors\size-type.fut:1:29-29: Parse error.
(here it's a parse error instead of type checking error, but perhaps common ideas could be used there?)
I know this is much easier said than done, especially since Futhark is a moving target and would appreciate any feedback.
Bikeshedding over how exactly the error would look like is one thing. For the implementation the difficulty is mostly about what information the compiler gives you to format into a next error message. Do you know which information you would like for the error message and which the compiler has available?
IMO just take any error message you like for the moment, format can be changed easily later anyway.
Yes, that makes sense :). From what I understand, we've got
SrcLoc
with error start and stop file, line and column info Notes
, a list of information strings about the error (where we could pass help info etc.) Doc
, the actual error message string. Sometimes it has some extra SrcLoc info. Here is the error definition: https://github.com/diku-dk/futhark/blob/master/src/Language/Futhark/TypeChecker/Monad.hs#L95
-- | Information about an error during type checking.
data TypeError = TypeError SrcLoc Notes Doc
instance Pretty TypeError where
ppr (TypeError loc notes msg) =
text (inRed "Error") <+> "at" <+> text (locStr loc) <> ":" </>
msg <> ppr notes
What's missing is the actual source lines of code (like @athas mentioned)
I think if we also carry around a full in-memory copy of all the input files during type checking (which isn't all that much data), we could just do a lookup in those based on the source location
to actually print out the erroneous lines. I am not super well versed in Haskell, maybe we could carry that info around in the TypeChecker Monad?
I like Rust's style of error messages. Generally, during the development of Futhark I have stolen from Rust whenever there was some surface language detail I wasn't sure about, since they seem to have put a lot of thought into everything (e.g. the primitive type system and notation in Futhark is taken from Rust and LLVM).
Adding "hints" like the size type stuff is fine, I guess. We already have that in a few cases. They are represented as "notes". I'm a bit biased against adding too much of that kind of stuff, as it easily becomes very verbose and very specialised, without necessarily being useful (whereas printing source context is always useful).
Printing better parse errors is difficult, because Futhark's parser is written in a strict LALR framework (it uses the Happy parser generator), and these are notoriously difficult to add sensible failure cases for. I use such a parser generator because I think it is important that Futhark remains an LR grammar, and the tool forced me to keep it that way, but I don't actually think it's necessaily good idea to use a parser generator forever. Writing an entirely new parser is too much work in the short term, though.
For printing source context, we would indeed have to carry around the original source in the TypeM monad. That's not the tricky part. The tricky part is modifying the pipeline all the way from the parser to the typechecker to carry that information along with it. However, if we assume that we only ever want to print source context for the current file, it's pretty easy. We would modify the handleFile function to also pass the file_contents
variable to the checkProg
function (and checkProg
would then be changed to put the source in a new field in the type checker environment). That's a very local change.
In terms of human friendliness, Elm is best in class for compiler errors.
This is a pretty old article (the errors have gotten even better since), but it's still good. https://elm-lang.org/news/compiler-errors-for-humans
Hi there! First-time contributor here!
I could spend some time on this issue. I have a (admittedly very unpolished) prototype of a formatting algorithm that produces output pretty similar to rustc's, and I could spend some time trying to plug it in here.
If I'm understanding correctly, the things that need to be done include:
What would be the best place to start with this?
Small update: after poking around the source for a bit, I think it might be helpful to also create a unified type Error
type to make the format consistent. Maybe something like:
data Error = Error
{ mainLocation :: SrcLoc,
mainMessage :: Doc (),
secondaryMessages :: [(SrcLoc, Doc ())],
notes :: [Doc ()]
}
That might be a good idea. I don't have any grand vision for how the error infrastructure should look like, so anything you think is a good idea probably is. It might be a good idea to start by just adding this representation to the type checker, rather than also worrying about making parse errors and such use this.
Regarding your concrete questions:
Modify the architecture to pass the source code through all the phases and into the typechecker
There is only really one phase before type checking, which is parsing. It is all orchestrated in this file. The program source is read and parsed here and later type checked here. I think adding the original source code to LoadedFile would be the easiest way to be able to pass it on to the type checker.
Use the source in the printing of error
If the source code is stored in the type checker environment (and passed in the original checkProg
call), then it should be easy enough to find the relevant lines given a line number. It's slightly more tricky to access source code for other files, so I'd recommend ignoring that initially - it's probably also much less useful.
Annotate type errors with secondary messages
There is already a limited mechanism for this as a TypeError can be associated with notes. It's much less structured than what you propose, but perhaps it can be used for inspiration.
I want to try improving Futhark's command-line errors. I've been going through the nice work done in #634 and would like to try adding colour output and caret diagnostics to typechecking errors.
For example, the duplicate parameters (https://github.com/diku-dk/futhark/blob/master/tests/errors/duplicate-params.fut) error is
which is understandable, but somewhat rudimentary.
Compared to the same error in Rust:
which is a bit of a unfair comparison (general-purpose language vs. desert hedgehog), but perhaps there's ideas we can take advantage from to make debugging easier.
Thank you for your time and all your work on Futhark.