Alternative `codespan-reporting` style formatting?

Mesabloo / diagnose

A simple library for reporting compiler/interpreter errors

https://hackage.haskell.org/package/diagnose

BSD 3-Clause "New" or "Revised" License


Alternative `codespan-reporting` style formatting? #11

Open ruifengx opened 2 years ago

ruifengx commented 2 years ago

Thanks for your great library!

I use both Rust and Haskell, and have known about error-reporting libraries like ariadne and codespan-reporting for a while. Previously, when working on a Rust project, I chose codespan-reporting because I preferred its formatting (ariadne's formatting is also beautiful, but a bit too fancy for me), given that their interfaces are pretty similar. Is it possible to extend this library with an alternative codespan-reporting-style formatting for diagnostic messages? Or is there some design space to expose an API for custom report formatting?

FYI, here is a comparison of the two styles (BTW, it is amusing to see a more Haskell-like syntax used for illustration in codespan-reporting, the Rust library, but a more Rust-like style here):

Name	Illustration
(current) `ariadne`-style	[screenshot not preserved]
`codespan-reporting`-style	[screenshot not preserved]

If this is deemed non-trivial, I am willing to work on it; in that case, would you please provide some guidance?

ruifengx commented 2 years ago

> Because my instance is marked as {-# OVERLAPPABLE #-}, any {-# OVERLAPPING #-} instance should be privileged (or at least that's what I think it does). However, I am not quite sure and this needs some testing.

If my understanding of the GHC manual is correct, resolving the instance also requires the privileged instance to be strictly more specific. In this case, my understanding is that the two instances are equally specific. I believe we actually need the default instance in the library to be marked {-# INCOHERENT #-}. Anyway, I agree that testing would be the only way to confirm this.

And regarding the review for the PR, please take your time. Also, I will be happy to address any readability/efficiency issues or code duplication.

Mesabloo commented 2 years ago

It turns out that you are definitely right, and that my understanding of overlapping instances was just off. I also tested with {-# INCOHERENT #-} but GHC also reported duplicated instances (see the MWE underneath). I guess I'll just have to separate the annotation types from the instance (so that if you don't want the default instance, you don't import the module it is contained in).

EDIT: and indeed, from the GHC manual you linked, the two conditions must hold:

Eliminate any candidate IX for which there is another candidate IY such that both of the following hold:

IY is strictly more specific than IX. That is, IY is a substitution instance of IX but not vice versa.

Either IX is overlappable, or IY is overlapping. (This “either/or” design, rather than a “both/and” design, allow a client to deliberately override an instance from a library, without requiring a change to the library.)

MWE:

A.hs:

module A where

class Something a where
  x :: a

instance {-# INCOHERENT #-} Something Int where
  -- {-# OVERLAPPABLE #-} doesn't work either
  x = 2

Main.hs:

module Main where

import A

instance {-# OVERLAPPING #-} Something Int where
  x = 5

main :: IO ()
main = print (x :: Int)

And the error:

A.hs:6:29: error:
    Duplicate instance declarations:
      instance [incoherent] [safe] Something Int -- Defined at A.hs:6:29
      instance [overlapping] Something Int -- Defined at Main.hs:5:30
  |
6 | instance {-# INCOHERENT #-} Something Int where
  |                             ^^^^^^^^^^^^^
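For contrast with the MWE above, here is a minimal sketch of a case where overlap resolution does apply, because one instance is strictly more specific than the other. The `Describe` class and both instances are made up for illustration and are not part of diagnose.

```haskell
{-# LANGUAGE FlexibleInstances #-}
module Main where

-- Hypothetical class, used only to illustrate instance resolution.
class Describe a where
  describe :: a -> String

-- General fallback instance. OVERLAPPABLE allows a strictly more
-- specific instance to be chosen instead of this one.
instance {-# OVERLAPPABLE #-} Describe a where
  describe _ = "something"

-- `Describe Int` is a substitution instance of `Describe a`, but not
-- vice versa, so it is strictly more specific and wins for Int.
instance Describe Int where
  describe n = "the Int " ++ show n

main :: IO ()
main = do
  putStrLn (describe (3 :: Int)) -- uses the Int instance
  putStrLn (describe 'c')        -- falls back to the general instance
```

Two instances with the identical head `Something Int`, as in the MWE, never reach this resolution step: neither is strictly more specific, so GHC rejects them as duplicates regardless of the pragmas.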

spacekitteh commented 1 year ago

So I've noticed that you're hard-coding a layout style as something which can be converted to an AnsiStyle. Why is that? Would it not make more sense to have it as a generic DiagnosticStyle which has actual diagnostic annotation information, rather than colour information? Not only would this mean that layouts could be user customisable, it also means that layouts could share more of a common vocabulary between them.

Further, prettyprinter says that annotations should be semantic in nature for as long as possible; it's still entirely plausible to separate layouts from colours, with the mapping of diagnostic components to colours delayed until the user desires it.

spacekitteh commented 1 year ago

For example, rather than something analogous to FileColor, layout algorithms would annotate it with, say, File <filename>; then renderers could choose both the colouring, AND whether to annotate it with a hyperlink.
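As a rough sketch of that idea, assuming a made-up `Ann` type and two hand-written renderers (none of these names exist in diagnose), the same annotated chunks could be rendered either with a colour or as a terminal hyperlink. The hyperlink variant uses the OSC 8 escape sequence, which only some terminals support.

```haskell
module Main where

-- Hypothetical semantic annotation carrying the file name as payload.
data Ann = File FilePath | Plain

-- Renderer 1: colour only (SGR code 32 is green).
colourOnly :: (Ann, String) -> String
colourOnly (File _, txt) = "\ESC[32m" ++ txt ++ "\ESC[0m"
colourOnly (Plain, txt) = txt

-- Renderer 2: wrap the text in an OSC 8 hyperlink pointing at the file.
hyperlinked :: (Ann, String) -> String
hyperlinked (File path, txt) =
  "\ESC]8;;file://" ++ path ++ "\ESC\\" ++ txt ++ "\ESC]8;;\ESC\\"
hyperlinked (Plain, txt) = txt

main :: IO ()
main = do
  let chunks = [(File "/tmp/Main.hs", "Main.hs"), (Plain, ":3:7: error")]
  putStrLn (concatMap colourOnly chunks)
  putStrLn (concatMap hyperlinked chunks)
```

Because the annotation says what the text is rather than how it looks, the choice between the two renderers can be left to the user.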

Mesabloo commented 1 year ago

I'm not sure that I quite follow the suggestion here. Unfortunately, I don't think that it is possible to have a single annotation type for all renderers, given that each has its own quirks (e.g. Ariadne uses a rule color, whereas the GCC layout does not). We already discussed this with @ruifengx earlier in this issue, but new suggestions are obviously welcome! Do you mind expanding a bit on it?

spacekitteh commented 1 year ago

Sure, I'll elaborate in the morning!

spacekitteh commented 1 year ago

In the current approach (both 2.4.0 and the branch), there are two separate operations that are combined into one:

1. Layout. This refers to taking a Diagnostic and producing a Doc which has a two-dimensional structure, with faulty code, contextual code, hints, error codes, rules, etc. This document should contain only semantic information: annotations which say "this text corresponds to this particular filename", "this text corresponds to a line of code", etc. The user may want to use that semantic annotation before it's modified further, such as to add hyperlinks on error codes linking to a help page for that specific error.
2. Styling. This refers to setting text properties: bold, underline, different colours, etc. This should be applied at the latest time possible, and may not correspond to a terminal; it may be for rendering to HTML, for example.

These two things are both conceptually and semantically distinct; a given layout can have many styles, and a given style can apply to many layouts. As an example, say a user really likes the Ariadne layout, but hates the colours; then they could apply a different style to the Ariadne layout.

So, the idea is to have two passes: layout, then styling. The user should get to choose when to apply styling, and they should get to choose what style to apply to a given layout.


data DiagnosticAnnotation = File FilePath 
                          | SourceLocation Position 
                          | Severity SeverityLevel 
                          | DiagnosticCode 
                          | ...

-- | A prismatic class for diagnostic annotations, as they may be embedded
--  into a larger annotation type in a larger document.
class HasDiagnosticAnnotation diagAnn where
  injectDiagnosticAnnotation :: DiagnosticAnnotation -> diagAnn
  extractDiagnosticAnnotation :: diagAnn -> Maybe DiagnosticAnnotation 

-- | Layout the diagnostic in a given format (ariadne, gcc, codespan-reporting, etc)
layoutDiagnostic :: HasDiagnosticAnnotation diagAnn => Diagnostic -> Doc diagAnn

-- | User may wish to colourise the diagnostics in a larger document that 
-- still has other semantic annotations.
class HasAnsiStyling ann where
  injectStylingAnnotation :: AnsiStyle -> ann

-- | Apply bold/underlining/italic/colour/etc
renderAsANSI :: (HasDiagnosticAnnotation diagAnn, HasAnsiStyling stylingAnn) =>
                Doc diagAnn ->
                Doc stylingAnn
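A smaller, runnable toy version of the two passes might look like the following. It assumes the `prettyprinter` (1.7 or later, for the `Prettyprinter` module name) and `prettyprinter-ansi-terminal` packages; `DiagAnn`, `layoutToy` and `defaultStyle` are hypothetical names for this sketch, not part of diagnose or of the signatures above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Prettyprinter
import Prettyprinter.Render.Terminal
  (AnsiStyle, Color (..), bold, color, putDoc, underlined)

-- Hypothetical semantic annotations; illustrative names only.
data DiagAnn
  = FileAnn FilePath
  | SeverityAnn
  | CodeLine
  deriving (Eq, Show)

-- Pass 1: layout. The document carries only semantic annotations.
layoutToy :: FilePath -> String -> String -> Doc DiagAnn
layoutToy file msg src =
  vsep
    [ annotate SeverityAnn "error:" <+> pretty msg
    , "  in" <+> annotate (FileAnn file) (pretty file)
    , indent 4 (annotate CodeLine (pretty src))
    ]

-- Pass 2: styling. One of possibly many styles for the same layout.
defaultStyle :: DiagAnn -> AnsiStyle
defaultStyle (FileAnn _) = color Green <> underlined
defaultStyle SeverityAnn = color Red <> bold
defaultStyle CodeLine = mempty

main :: IO ()
main = do
  let doc = layoutToy "Main.hs" "unbound variable" "main = prnt 1"
  putDoc (reAnnotate defaultStyle doc) -- styled output
  putStrLn ""
  putDoc (unAnnotate doc)              -- same layout, no styling
  putStrLn ""
```

Styling is deferred until `reAnnotate` is called, so a caller could swap in any other `DiagAnn -> AnsiStyle` function, or consume the semantic annotations in a non-terminal renderer.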

Mesabloo commented 1 year ago

> These two things are both conceptually and semantically distinct; a given layout can have many styles, and a given style can apply to many layouts.

Reusing styles across multiple layouts is something that I would have initially liked to have. Unfortunately, your suggestion comes with a DiagnosticAnnotation which is the base for all layouts. As it is meant to semantically indicate which part of the layout is which, there are a few gotchas that I can think of:

- with a single annotation type, semantics are not 100% correct for each layout. Consider the Ariadne and GCC layouts: one uses rules and the other does not, yet we need to handle rules in both cases, which is not correct for the GCC layout (and most probably other user-created layouts too);
- separating layout and style annotations means that some parts of the library become more complicated, e.g. fetching source code (which we currently directly annotate with colors, if they are wanted for a specific layout). With this suggestion, we'd need to separate fetching the source code (and transforming TAB characters) and only later on -- when doing the rendering I guess -- try to restyle it (which by the way needs marker information);
- I don't quite follow the code you wrote there, but I guess it's because I'm not familiar at all with prisms and such. Perhaps this could be simplified to some extent?
- If we want everything to be user-customisable, I think it may be wise to drop some type-classes (e.g. the current one in charge of transforming style annotations to ANSI annotations) because the user will otherwise be unable to instantiate them. I don't really know to what extent this applies to this little piece of code though.

I'd be happy to have restyling and reusing styles for multiple different layouts, but sadly I don't know if this can be done while remaining semantically correct (and avoiding dropping cases with undefined). Let me know what you think about these points. :)
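To make the trade-off concrete, here is a minimal sketch of the other end of the design space: one annotation type per layout, with a style passed as a plain function instead of a type-class instance. All names (`AriadneAnn`, `GccAnn`, `Style`, `renderWith`) are hypothetical, and the "document" is simplified to a list of text chunks.

```haskell
module Main where

-- Hypothetical per-layout annotation types: each layout names only the
-- parts it actually draws, so the GCC-like one has no "rule" case.
data AriadneAnn = AriadneFile | AriadneRule | AriadneMarker
data GccAnn = GccFile | GccCaret

-- A toy document: text chunks, each optionally tagged with an annotation.
type Chunks ann = [(Maybe ann, String)]

-- A style is a plain function from annotation to an SGR code, rather
-- than a type-class instance, so users can supply their own.
type Style ann = ann -> String

renderWith :: Style ann -> Chunks ann -> String
renderWith style = concatMap chunk
  where
    chunk (Nothing, txt) = txt
    chunk (Just ann, txt) = "\ESC[" ++ style ann ++ "m" ++ txt ++ "\ESC[0m"

ariadneStyle :: Style AriadneAnn
ariadneStyle AriadneFile = "32"   -- green
ariadneStyle AriadneRule = "34"   -- blue
ariadneStyle AriadneMarker = "31" -- red

gccStyle :: Style GccAnn
gccStyle GccFile = "1"   -- bold
gccStyle GccCaret = "32" -- green

main :: IO ()
main = do
  putStrLn (renderWith ariadneStyle
    [(Just AriadneRule, "-->"), (Nothing, " "), (Just AriadneFile, "Main.hs")])
  putStrLn (renderWith gccStyle [(Just GccFile, "Main.hs"), (Nothing, ":1:1")])
```

Every style here is total over its own annotation type, so no case needs `undefined`, but a style written for one layout cannot be reused for another. That is exactly the tension described above.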