Alternative `codespan-reporting` style formatting?

Mesabloo / diagnose

A simple library for reporting compiler/interpreter errors

https://hackage.haskell.org/package/diagnose

BSD 3-Clause "New" or "Revised" License

255 stars 17 forks source link

Alternative `codespan-reporting` style formatting? #11

Open ruifengx opened 2 years ago

ruifengx commented 2 years ago

Thanks for your great library!

I use both Rust and Haskell, and have learned about error reporting libraries like ariadne and codespan-reporting for a while. Previously when working on a Rust project, I chose codespan-reporting because I preferred its formatting (formatting of ariadne is also beautiful, but a bit too fancy for me), given their interfaces are pretty similar. Is it possible to extend this library to have an alternative codespan-reporting-style formatting for the diagnostic messages? Or is there some design space to expose an API for custom report formatting?

FYI, here is a comparison of the said two styles (BTW it is amusing to see a more Haskell-like syntax for illustration in codespan-reporting the Rust library but a more Rust-like style here):

Name	Illustration
(current) `ariadne`-style
`codespan-reporting`-style

If this is deemed non-trivial, I am willing to work on this, in that case would you please provide some guidance?

Mesabloo commented 2 years ago

One of the major setbacks would be the need to expose the internal definitions of e.g. reports, diagnostics, etc. (mainly all that is present in X.Internal modules which isn't exported by the X modules) Either that or allowing to freely format markers (which seem like most of the differences there to me).

However, if we allow retrieving files from a given Diagnostic, then it would be quite “easy” to change the overall style of reports⁽¹⁾. But only if writing a function Diagnostic msg -> Doc Annotation is not that much of a burden (which, believe me, it is).

If this is a feature that is very much wanted, then there are a few changes that will need to be done:

If we want the layout to be 100% customisable:
- First of all, the printDiagnostic and prettyDiagnostic functions will need to take a “layout” function (of type Pretty msg => Diagnostic msg -> Doc Annotation) which will be in charge of creating the to-be-rendered Document
- Once this is done, we need to refactor the current layout (under, e.g., a Error.Diagnose.Layout module). Then copy all the layouting work within the module Error.Diagnose.Report.Internal to that module (or a submodule)
- Afterwards, you should be able to freely define your own layout (we may create a simply type alias type Layout msg = Diagnostic msg -> Doc Annotation for shortness' sake) but this is the hardest part, as it means that all the internal logic must be reimplemented for your new layout (which is about 80-90% of the whole codebase, for the current layout). To facilitate creating new layouts, we may also expose helper functions which would compute extra information (for example, marker column offset in spaces or unicode codepoints)
if we only want to only minor tweaking of the overall layout (such as marker style):
- We should be able to pass very simple functions (within records, for example) which take specific information (for a marker, its line, column in unicode codepoints, computed length in number of spaces, and its style ─ error, warning, etc. ─) and output a part of a Doc Annotation

I am much less confident in the 2nd alternative, even thought the 1st one is actually way harder to implement. So I am not quite sure what's the correct way to go. What do you think about this? Are there any other alternatives that I did not mention?

⁽¹⁾: we are already able to retrieve Reports from Diagnostics and Markers, Notes, etc. from Reports. The only missing piece of data is files to be included in the report itself.

ruifengx commented 2 years ago

Thanks for your reply! I am still learning my way in the code base, so I cannot comment about the proposed approaches yet. Actually, after submitting this issue, I pulled your repo down on my local Windows machine and encountered a link error, for which I believe wcwidth is to blame. There is already an unpublished fix in their GitHub repo (solidsnack/wcwidth#3), but it involves a package named rset which is not in stack LTS and also not updated for years. I believe the fix there could be improved by replacing the linear search logic (thus also get rid of the dependency on rset) with a binary search based on arrays, so I am currently preparing a PR for that package. I will come back in a few days after I finish that PR and get familiar with the code here.

Mesabloo commented 2 years ago

I can take a look at how to structure that (and the best way to do it, although I may have an idea already) and maybe start working on it in a separate branch if you want me to. :)

As for the bug with wcwidth, I thought that the aforementioned fix was already published on Hackage, but looking back at the last update, it is not. This is definitely worthwhile to also work on this in order to make Diagnose usable on Windows.

ruifengx commented 2 years ago

I have just uploaded my changes to wcwidth here. I removed the dependency on rset and replaced it with a binary search based on array. I also added a test suite to verify my implementation is correct. It seems that the owner of wcwidth now has little interest in Haskell, and that is probably the reason why the Hackage package is still not updated.

And regarding Windows support, I have a bad news to tell: GHC's support for Unicode output through standard IO on Windows in general is still messy. I see a MR on GitLab that claims to fix it, but I cannot find the RTS switch --io-manager=native in GHC 9.0.2 (which is the latest GHC in stack LTS). A plain run of the test suite here results in a crash, and with with-utf8 the program stops crashing, but shows some question marks in the output:

Mesabloo commented 2 years ago

Alright so there's basically nothing we can do regarding the Unicode + Windows combination, other than proposing alternative ASCII layouts (which is already the case for the default layout, although it looks less fancy that way). Not sure what to do about that issue. Haskell support on Windows is still rather poor, from all those years being mainly a Unix-favored language (this should be changing as we are speaking, but turns out this isn't quite fast-paced).

ruifengx commented 2 years ago

Sorry, I should have experimented more before posting the last comment. Stack did not recognize GHCRTS="--io-manager=native", so I assumed GHC 9.0.2 did not have this. But it turned out both GHCi (with ghci +RTS --io-manager=native) and the test-suite (I found the executable manually and fed it the same RTS argument) is working correctly (at least in my settings):

So GHC's Windows support is changing, and the current progress in fact looks quite promising.

(BTW, I should be using the latest release of stack, but it reported that the cabal file in this repo was generated by a newer version of hpack. Just out of curiosity, are you using a dev-build in your workflow?)

Mesabloo commented 2 years ago

I absolutely hate the fact that everything is misaligned, and it looks like Windows is trying its best to render the unicode box characters but absolutely fails to do so with the correct width. But it's better than nothing I guess.

(BTW, I should be using the latest release of stack, but it reported that the cabal file in this repo was generated by a newer version of hpack. Just out of curiosity, are you using a dev-build in your workflow?)

I am not, so this is a bit weird.

$ stack --version 2.7.5 x86_64 hpack-0.34.7

But I don't remember which version I used to generate the .cabal file (might have been stack 2.7.3 at the time).

ruifengx commented 2 years ago

This is what I have, yet another peculiarity I guess:

$ stack --version
Version 2.7.5, Git revision ba147e6f59b2da75b1beb98b1888cce97f7032b1 x86_64 hpack-0.34.4

So stack is actually shipping officially the same version 2.7.5 with different hpack on different platforms ...

Mesabloo commented 2 years ago

This shouldn't be a problem at all if your version of hpack managed to work correctly. :)

Mesabloo commented 2 years ago

I started refactoring and moving code as per the first part of my first suggestion, in the branch custom-layouts. For now, this is a simple move of the code which originally resided in the Error.Diagnose.Report.Internal package. Note that I have isolated the ariadne-like layout inside its own package, in order to allow the user to choose which layout to use (if any at all, this is unnecessary when only the JSON backend is interesting).

Documentation has not been updated at all (there will be A LOT of places to update it), but all the rendering tests pass and give the same results as before.

As of now, we should try to see which parts of the code for the ariadne layout would be better off inside the Error.Diagnose.Layout module as helper functions (for example, the function to retrieve and colorize a line of code named getLine_). I think this should be easier if multiple layouts are defined concurrently.

~~Also note that tests have been put in the diagnose-ariadne subpackage but they don't really belong there.~~ ~~I'll move them later to somewhere else.~~ They have been moved to a global package within the repository.

ruifengx commented 2 years ago

I have just submitted solidsnack/wcwidth#6, but I am not very positive that wcwidth will be updated soon on Hackage. Given that the code there is just a bunch of Unicode data plus searching logic (which I have just reimplemented), I believe you can also copy-paste the relevant code here. (Especially so if you are okay with always using the Haskell implementation of wcwidth function.)

As of now, we should try to see which parts of the code for the ariadne layout would be better off inside the Error.Diagnose.Layout module as helper functions (for example, the function to retrieve and colorize a line of code named getLine_).

I have just started reading your code in the custom-layout branch. I guess I will try to write a prototype of codespan-reporting style layout based on the current ariadne layout, and then we will see which parts are common.

This shouldn't be a problem at all if your version of hpack managed to work correctly. :)

Actually stack refused to regenerate the cabal file if the local hpack version is older than the one that was used to generated the current copy of cabal file. But I guess that is okay for now, I can just avoid editing package.yaml.

Mesabloo commented 2 years ago

I have just submitted https://github.com/solidsnack/wcwidth/pull/6, but I am not very positive that wcwidth will be updated soon on Hackage. Given that the code there is just a bunch of Unicode data plus searching logic (which I have just reimplemented), I believe you can also copy-paste the relevant code here. (Especially so if you are okay with always using the Haskell implementation of wcwidth function.)

If the pure Haskell implementation is fully compliant with the “original” C version, I am not that bothered by including it inside this library. It'd be much better though if this could be included within a new Hackage release of the wcwidth library, for reusability's sake. Maybe we could somehow "hijack" the package by pushing updates ourselves (as long as the original author is okay with it)? Not sure how all this works on Hackage.

Actually stack refused to regenerate the cabal file if the local hpack version is older than the one that was used to generated the current copy of cabal file. But I guess that is okay for now, I can just avoid editing package.yaml.

Ah! I did not know that. You should not have to change any package.yaml in the two libraries (maybe only the tests, but you could just delete the diagnose-tests.cabal if you need to regenerate it, I don't think this is a real problem for a package not meant to be pushed anywhere).

I have just started reading your code in the custom-layout branch. I guess I will try to write a prototype of codespan-reporting style layout based on the current ariadne layout, and then we will see which parts are common.

I also had the idea to create a few more layouts (e.g. GCC-like) just for the sake of it, sort of to demonstrate the capabilities of this library, and perhaps give more choice regarding the output format. I think this will also help seeing which parts will need to be common.

For now, if you need something from my code, feel free to copy-paste it, as it will make it easier to see what's in common later. :)

Mesabloo commented 2 years ago

In the meantime, I went ahead and implemented a GCC-style layout (and style) for the library. Currently, this looks like this (this uses the same error as the one from the table of the issue):

with Unicode characters:
with plain ASCII characters:

I'll wait a bit to push it onto the branch, as I have done some changes to how the code is organised in other modules too (such as Error.Diagnose.Style). Basically, I moved all the annotation generation and handling within specific Error.Diagnose.Style.XXX (where XXX is e.g. Ariadne or GCC) to where the layout are created. This allows a better handling of annotations: for example, the GCC layout does not use the original annotation RuleColor (as there is no rule), so it uses its own annotation data type (where other minor tweaks have been performed too). Thanks to that, the library is basically plug and play (given that the layout you want is already done) and can use different styles for different layouts (given that the style is bundled with the layout, but if the annotation type is exposed, one can also create its own style).

ruifengx commented 2 years ago

for example, the GCC layout does not use the original annotation RuleColor (as there is no rule), so it uses its own annotation data type

I guess similar things happen in codespan-reporting style, too. Though of course most annotations should be reusable, so after I finish debugging my implementation I should probably come up with a way of code reuse other than copy-paste-modify.

I have been fighting with stack and GHC for debugging for a while, because the stack I use was built with GHC 8.10.4 and therefore has no support for --io-manager=native. Currently I go with producing the test suite executable and manually invoke them with appropriate RTS parameters. In the mean time I am also compiling stack with GHC 9, so either way I should hopefully be able to finish my prototype and share with you on GitHub in one or two days.

Mesabloo commented 2 years ago

I guess similar things happen in codespan-reporting style, too. Though of course most annotations should be reusable, so after I finish debugging my implementation I should probably come up with a way of code reuse other than copy-paste-modify.

Unfortunately, I don't think that reusability is doable here. Annotations are pretty much dependent on the layout used: what's the point of e.g. RuleColor for a layout without rules? or any other annotation which does not make sense for whatever reason? and how to handle additional/modifications of annotations that do not make sense in other layouts (I have examples for these, such as the Angular 9 layout which changes the background color of line numbers)? If we want to add or modify annotations on a per-layout basis, we shouldn't have to change the core of the library (basically Error.Diagnose.Style), hence why I moved them directly in each layout package. This will introduce some sort of redundancy (well, not that much because everybody names their annotations however they want), I agree with that, but I don't think we can have only a common few annotations to describe every possible rendering layout.

EDIT: here are the changes I made related to the styling:

In Error.Diagnose.Style: (replaced Annotation with this)


type Style ann = IsAnnotation ann => Doc ann -> Doc AnsiStyle

-- | The class of annotation, allowing to map each to a given color style.

-- Every annotation used in a 'Style' must implemented this typeclass. class IsAnnotation ann where -- | To be used with 'reAnnotate'.

-- Transforms the custom annotations into color annotations as described by 'AnsiStyle'. mkColor :: ann -> AnsiStyle

- In `Error.Diagnose.Diagnostic.Internal`: (changed the types to use `IsAnnotation`)
```haskell
type Layout msg ann = (IsAnnotation ann, Pretty msg) => Bool -> Int -> Diagnostic msg -> Doc ann

-- | Prints a 'Diagnostic' onto a specific 'Handle'.
printDiagnostic ::
  (MonadIO m, Pretty msg, IsAnnotation ann) =>
  -- | The handle onto which to output the diagnostic.
  Handle ->
  -- | Should we print with unicode characters?
  Bool ->
  -- | 'False' to disable colors.
  Bool ->
  -- | The number of spaces each TAB character will span.
  Int ->
  -- | The style in which to output the diagnostic.
  Style ann ->
  -- | The layout to use to render the diagnostic.
  Layout msg ann ->
  -- | The diagnostic to output.
  Diagnostic msg ->
  m ()
printDiagnostic handle withUnicode withColors tabSize style layout diag =
  liftIO $ hPutDoc handle ((if withColors then style else unAnnotate) $ layout withUnicode tabSize diag)
{-# INLINE printDiagnostic #-}

I can push them whenever you need them.

so either way I should hopefully be able to finish my prototype and share with you on GitHub in one or two days.

Don't worry at all, take all the time you need! There's no deadline for this at all. :)

I have been fighting with stack and GHC for debugging for a while, because the stack I use was built with GHC 8.10.4 and therefore has no support for --io-manager=native.

By the way, do you think this is worth mentioning in the documentation? All the Unicode stuff on Windows, like the with-utf8 package, what to do with GHC 9+ (and possibly other stuff I overlooked, like the fact that you should call setLocale LC_ALL before using wcwidth), etc.

ruifengx commented 2 years ago

By the way, do you think this is worth mentioning in the documentation?

Yes, I think that would be very helpful for users of this library, especially Haskell newcomers or people who have not explicitly dealt with Unicode issues in the past. It could be in the README, the package documentation, or even a stand-alone tutorial if you wish.

All the Unicode stuff on Windows, like the with-utf8 package, what to do with GHC 9+ (and possibly other stuff I overlooked, like the fact that you should call setLocale LC_ALL before using wcwidth), etc.

In my experience, with-utf8 is especially helpful to prevent crashing; I believe by using the Main.Utf8.withUtf8 wrapper on main, the locale of standard I/O handles (stdin, stdout, stderr) are set to lenient versions, so they will not fail for invalid characters (that is, characters which cannot be encoded in the target encoding). However, it can still be overly defensive where some Unicode characters encodable in target encoding also get replaced by ? (like I showed in previous screenshots), possibly due to the fact that GHC is still using iconv (amd it somehow failed to detect the target encoding correctly) [^1].

[^1]: This is according to a GHC issue I read a while ago, which might be outdated now.

On the other hand, at least according to the tests I have done these days, the RTS option --io-manager=native (and yes, it is only available in GHC 9+) should fix the Unicode problem on its own, both preventing the crash and getting the correct output without replacements. Yet I would still use with-utf8 because I personally prefer to enforce UTF-8 encoding, and I believe you can't be too cautious on such things.

For wcwidth, I actually did not know about the LC_ALL thing. But the Haskell implementation (both the previous one and the one in my PR) should not depend on the locale, because it unconditionally uses the embedded Unicode database.

I can elaborate on these points later if you wish to include them in the doc.

Mesabloo commented 2 years ago

For wcwidth, I actually did not know about the LC_ALL thing. But the Haskell implementation (both the previous one and the one in my PR) should not depend on the locale, because it unconditionally uses the embedded Unicode database.

This is included in the documentation, but is needed only for the FFI wcwidth (I guess it would not work with some locales like LC_C, or at least not correctly). This is indeed unneeded if it uses an embedded Unicode database (which is what you PRed there, which was already present but unpublished and worsely coded).

I can elaborate on these points later if you wish to include them in the doc.

I pretty much would like to include them in the docs, yes. This would save some time for people who never got Unicode issues and also myself as I won't have to deal with issues about that. A tutorial is already present in the Error.Diagnose module, and there's also a FAQ section in there. I guess we could insert new questions (with the Haskell errors when trying to print unicode characters) and answer them there!

I'm curious because I didn't know one could do that, but how did you make the little linkable footer at the end of your comment?

ruifengx commented 2 years ago

I'm curious because I didn't know one could do that, but how did you make the little linkable footer at the end of your comment?

I believe it is part of the extended Markdown syntax: use [^1] or [^longlabel] for where the marker should be placed, and write later (in a separate paragraph) [^1]: the actual footnote text. It doesn't matter what label you use (as long as they don't clash), nor where you place the footnote text: the marker will always be numbers starting from 1, and the footnotes will always go to the end of the comment.

Here is a GitHub blog that announced this syntax is supported on GitHub.

ruifengx commented 2 years ago

A quick update on wcwidth. It turns out the Haskell implementation is incorrect from the very beginning: all control characters (and according to the implementation, "control characters" also include the space character) are reported to have width -1 (perhaps to indicate an error). So this was the source of the "everything is misaligned" screenshot, instead of Windows Terminal. Now I believe I will have to validate the logic of that implementation myself and give it a patch or rewrite.

Mesabloo commented 2 years ago

I see. That's a bit weird but I never really took a look at the Haskell implementation of wcwidth so I guess it makes sense. The negative return codes should probably be handled in the getLine_ function (as of now, we don't, which causes some trouble when trying to print control characters because they also have a width of -1 according to the C implementation of wcwidth, if I understand the specification correctly).

RETURN VALUE The wcwidth() function shall either return 0 (if wc is a null wide-character code), or return the number of column positions to be occupied by the wide-character code wc, or return -1 (if wc does not correspond to a printable wide-character code).

I hope you'll be able to figure all this out. Unfortunately, I don't see the original maintainer giving help on this as they don't seem to use Haskell anymore, but I hope you could take ownership of it? Nobody else seems to be quite interested in it though...

I have just discovered that the original pure Haskell implementation was based on this pure Python implementation. Maybe there's some stuff you could gather from it? (checking from the issue tab, there's only a few minor bugs, but it should work for most Unicode standard versions)

ruifengx commented 2 years ago

I was also searching on this topic, and found out there once was a char::width implementation in Rust's libstd, but it did not make it into stable Rust. This char::width and the Python implementation you mentioned both are based on Markus Kuhn's implementation (which dates back to 2007, Unicode 5.0). I actually quite like the Rust implementation, especially its signature: it returns an Option, which forces the user to explicitly handle the control characters.

Unfortunately, I don't see the original maintainer giving help on this as they don't seem to use Haskell anymore, but I hope you could take ownership of it? Nobody else seems to be quite interested in it though...

Yes, I also don't expect to receive help from the original maintainer. To be fair, the original package was essentially a thin wrapper around the C function wcwidth, so they probably did not expect the mess at all. Although I am not sure about taking ownership, partly because there is only one dependency on wcwidth on Stackage, and also because I am not familiar to maintaining packages on Hackage at all. Anyway, I will first prepare a patch. After that we can decide whether to include the implementation here or try to update the wcwidth package on Hackage.

And BTW, the bug for space character is due to these line: https://github.com/solidsnack/wcwidth/blob/900e463d9472aba3620198d2f32987b7402be5b9/Data/Char/WCWidthHaskell.hs#L413-L417 The ranges is inclusive on both ends, because that is how the data is stored in UCD. But here apparently the 32 was intended to be exclusive (due to the use of < in the C implementation).

I will take some time to also check for other bugs before resubmitting the PR.

ruifengx commented 2 years ago

And regarding the width of control characters, I believe it is perfectly okay to just special case for tab, backspace, etc. and assume the width is 1 for the rest. Most terminal emulators also don't handle the control characters, and display them as solid rectangles (or whatever the "missing" glyph is in the font), which usually have width 1.

Mesabloo commented 2 years ago

I actually quite like the Rust implementation, especially its signature: it returns an Option, which forces the user to explicitly handle the control characters.

I have to admit that I also quite like that! Definitely better.

Although I am not sure about taking ownership, partly because there is only one dependency on wcwidth on Stackage, and also because I am not familiar to maintaining packages on Hackage at all.

I think that the dependency count accounts only for package within stackage snapshots. For example, diagnose isn't, yet it depends on wcwidth (I think that chapelure also depends on it? Or they only planned on depending on it as mentioned by one of their maintainers in https://github.com/Mesabloo/diagnose/issues/1#issuecomment-1108754786). So I'm not quite sure how accurate this is.

And BTW, the bug for space character is due to these line: https://github.com/solidsnack/wcwidth/blob/900e463d9472aba3620198d2f32987b7402be5b9/Data/Char/WCWidthHaskell.hs#L413-L417 The ranges is inclusive on both ends, because that is how the data is stored in UCD. But here apparently the 32 was intended to be exclusive (due to the use of < in the C implementation).

I quickly took a look at your new implementation, and I was ready to leave a comment exactly at this exact line in your code, but I simply could not understand whether your binary search was inclusive or exclusive on the last number of each range. But I'm happy that you found the issue!

And regarding the width of control characters, I believe it is perfectly okay to just special case for tab, backspace, etc. and assume the width is 1 for the rest. Most terminal emulators also don't handle the control characters, and display them as solid rectangles (or whatever the "missing" glyph is in the font), which usually have width 1.

We already had a special case for the tab characters (so that you as the user choose how many spaces are used to render a tab character). I'm worried though that there will be many special cases for most of control characters. But if they are rendered anyway as single cell characters, then I guess it's fine to consider most of them as having a width of 1.

solidsnack commented 2 years ago

For the most part, WCWidth relies on the native wcwidth implementation, from wchar.h. On Linux, Mac OS, FreeBSD, &c, the WCWidthHaskell implementation is simply not used. Reading over @ruifengx's post, I see that he uses Windows; and this is the one platform where the Haskell implementation is used.

There is a mistake, on WCWidthHaskell.hs:415 -- 32 should be 31. This leads to ASCII whitespace being treated like a control character. This can be fixed easily.

@Mesabloo is correct that, per POSIX, wcwidth is supposed to return -1 for a wide variety of characters.

Please find attached, the ranges pulled from the native implementation on a recent Macintosh.

ranges.txt

Mesabloo commented 2 years ago

Oh hi there!

There is a mistake, on WCWidthHaskell.hs:415 -- 32 should be 31. This leads to ASCII whitespace being treated like a control character. This can be fixed easily.

This is pretty much what @ruifengx figured in https://github.com/Mesabloo/diagnose/issues/11#issuecomment-1236337638. I believe that a fix will be coming your way in the next few days (along with a rewrite of the pure Haskell implementation for a better/more efficient search strategy).

@Mesabloo is correct that, per POSIX, wcwidth is supposed to return -1 for a wide variety of characters.

However, would it be ok to provide a more “Haskell-ish” way of handling -1 (which in the C world mostly mean “errors”) via a Maybe Int? Or we could add another function (like wcwidthMay :: Char -> Maybe Int which would act like wcwidth but return Nothing instead of -1) for this purpose, while still exposing the POSIX wcwidth? What do you think about it @solidsnack?

ruifengx commented 1 year ago

It took me much longer than I expected, but anyway I have just finished a prototype of the codespan-reporting style. It definitely needs more fine-tuning, and I will need some time to merge your changes here. But the good news is it is working on all the test cases. Here are some screenshots:

Test Case	Screenshot
Single-line markers
Multi-line markers
Unicode & Tab
Source line omitting

Currently the placement of multi-line markers is not ideal:

Current	Improvement (not yet)

I have a plan of improvement and will rewrite this part soon.

Below are some thoughts during the coding these days:

In codespan-reporting the crate, there are five different levels of severity. I coded the Layout in such a way that all the five are allowed, but only Error and Warning are actually used:
```
data Severity
  = Bug
  | Error   -- from `Err`
  | Warning -- from `Warn`
  | Help
  | Note
  deriving (Show, Eq, Ord)
```
I don't know if it makes sense to add all these severities here, or to further parametrise the Report (and therefore Diagnostic) type by an additional sev variable.
Similar things (but in the opposite way) for marker types. codespan-reporting only has primary and secondary labels, and we have four (This, Where, Maybe, Blank). Blank is very convenient for including source lines in the report without making other visual differences, which I hope codespan-reporting had in the beginning. We will have to decide what to do with the styling for Where and Maybe. Currently I just assigned the colour for secondary labels to both of them, which is not ideal. The solution would be to pick one more colour to distinguish between them, or (again) to parametrise the Report (and therefore Diagnostic) type by an additional lbl variable.
The code it is around ~370 lines, and unfortunately I have not (yet) add enough comments to make the logic clear (the naming there could probably be improved, too). The logic follows codespan_reporting::term::renderer (the comments there are really helpful). You can find the current version of my implementation in my fork. I can submit a draft PR if you prefer.

I ended up rewriting most of the rendering logic because it was easier to follow the original logic. However, there is one function worth mentioning that I reused (and changed a bit):

replaceLinesWith :: Doc ann -> (Doc ann -> Doc ann) -> Doc ann -> Doc ann
replaceLinesWith repl t = go
  where
    go Line = repl
    go Fail = Fail
    go Empty = Empty
    go (Char '\n') = repl
    go (Char c) = Char c
    go (Text _ s)
      = mconcat
      $ intersperse repl
      $ map t
      $ uncurry Text . (T.length &&& id)
      <$> T.split (== '\n') s
    go (FlatAlt f d) = FlatAlt (go f) (go d)
    go (Cat c d) = Cat (go c) (go d)
    go (Nest n d) = Nest n (go d)
    go (Union c d) = Union (go c) (go d)
    go (Column f) = Column (go . f)
    go (Nesting f) = Nesting (go . f)
    go (Annotated ann doc) = Annotated ann (go doc)
    go (WithPageWidth f) = WithPageWidth (go . f)

This function is critical for supporting multi-line messages. I added an additional parameter t :: Doc ann -> Doc ann, this transformation is applied to every line of the input Doc separately, so that repl does not get affected by the annotation on the input Doc.

I am not 100% sure whether the source range in the report should be a closed range or right half-open. I assumed it to be a closed range, but the rendering looks weird for some test cases.

Mesabloo commented 1 year ago

This already looks nice! Although is it me or do the single-line This markers have an extra caret when going until EOL?

In codespan-reporting the crate, there are five different levels of severity. [...] I don't know if it makes sense to add all these severities here, or to further parametrise the Report (and therefore Diagnostic) type by an additional sev variable.

Similar things (but in the opposite way) for marker types. [...]

I don't want to give a definitive answer yet, but:

The Help and Note severities are already handled by the Note data type, aren't they? Unless they mean something else
Bug is just a more critical version or Error. Is it really that useful to have it in practice, or can one only rely on Error?
If we were to include all those new severities as parts of Report, we'd need colors etc. for every other layout (currently: the ariadne-like, a GCC-like and a typescript-like) + additional ways to format them (perhaps differently?) I have not quite found a usecase for both Help and Note though: I usually am satisfied by Where

The code it is around ~370 lines, and unfortunately I have not (yet) add enough comments to make the logic clear (the naming there could probably be improved, too). [...]

I don't think code length is really an issue, unless it leads to very poor performances (which is not that huge of an issue because rendering is usually done only once in the pipeline). My code for the ariadne-layout is also quite big. Don't forget that we'll refactor the common functions later on!

I ended up rewriting most of the rendering logic because it was easier to follow the original logic. [...]

No problem here. It's a shame that we need to have this replaceLinesWith function. Ideally, prettyprinter would allow us to specify “prefixes” to put when breaking lines (instead of just whitespaces). In the mean time, I quite like how you rewrote it. Definitely better than my original one!

I am not 100% sure whether the source range in the report should be a closed range or right half-open. I assumed it to be a closed range, but the rendering looks weird for some test cases.

I don't think I understood. In some of the tests cases (which were borrowed from a few of my projects, namely nstar and zilch), the positions are put in weird positions (my original lexer/parser combination used to give positions which go “too far” into the stream). However, those tests cases opened a few bugs in the implementation of the ariadne-layout. This is why I kept them in the test suite. If this not what you meant, then I definitely did not understand, please pardon me.

Regarding the current output: it looks like it uses half unicode half ASCII. Do you intend it to be that way? Also, will you provide unicode-free alternatives? If not, then I think that deciding whether to provide Unicode+ASCII or just Unicode or just ASCII should be left to the implementor of the layout, hence moved outside of the core library. What do you think about it?

ruifengx commented 1 year ago

Although is it me or do the single-line This markers have an extra caret when going until EOL?

This is why I asked whether the source ranges should be closed or half-open (that is, should the larger end-point be inclusive or exclusive). I am currently assuming them to be inclusive on both ends, and I coded the rendering in such a way to allow "off-by-one" markers (pointing at technically where '\n' would be).

The Help and Note severities are already handled by the Note data type, aren't they? Unless they mean something else.

AFAIK, in codespan-reporting the crate, it is possible to produce a diagnostic whose severity is "note" or "help". Such diagnostics are not attached to another "error" or "warning". I don't actually see a concrete use case for such stand-alone "note" or "help" diagnostics, but this is the status quo there, so I thought perhaps they have their own reasoning.

Regarding the current output: it looks like it uses half unicode half ASCII. Do you intend it to be that way?

I should have clarified this in the previous comment. All the screenshots are for the Unicode output (rendering style of codespan-reporting is somewhat simplistic). I picked the Unicode version because it looks prettier than the ASCII version. Anyway, here is a screenshot for the ASCII rendering for your information: ASCII

For switching between Unicode and ASCII, I followed the approach in the original crate. Instead of passing around a useUnicode boolean flag, I packed all marker characters/strings into a Chars record. In the Layout, I then switch between unicodeChars and asciiChars according to the useUnicode flag.

Mesabloo commented 1 year ago

This is why I asked whether the source ranges should be closed or half-open (that is, should the larger end-point be inclusive or exclusive). I am currently assuming them to be inclusive on both ends, and I coded the rendering in such a way to allow "off-by-one" markers (pointing at technically where '\n' would be).

Initially, the ranges were meant to be inclusive on both ends. However, every layout is free to handle ranges howeevr they want to, as long as this behavior is coherent within the layout, I guess.

AFAIK, in codespan-reporting the crate, it is possible to produce a diagnostic whose severity is "note" or "help". Such diagnostics are not attached to another "error" or "warning". I don't actually see a concrete use case for such stand-alone "note" or "help" diagnostics, but this is the status quo there, so I thought perhaps they have their own reasoning.

I see. I'm quite unsure how useful that is (usually, "notes"/"helps" are used to provide context/potential fixes for the current error/warning. Having standalone "notes"/"helps" seems weird in that setting: what are they adding context to then? Not sure if that's really useful...

I should have clarified this in the previous comment. All the screenshots are for the Unicode output (rendering style of codespan-reporting is somewhat simplistic). I picked the Unicode version because it looks prettier than the ASCII version. Anyway, here is a screenshot for the ASCII rendering for your information:

Looks great!

For switching between Unicode and ASCII, I followed the approach in the original crate. Instead of passing around a useUnicode boolean flag, I packed all marker characters/strings into a Chars record. In the Layout, I then switch between unicodeChars and asciiChars according to the useUnicode flag.

That's actually a very good idea. This makes the code more readable (instead of my usual long ugly ifs). That might be something worth pushing in the core of the library.

ruifengx commented 1 year ago

I think now I kind of know why Help etc. is made to be a stand-alone severity. Consider the following message produced by rustc:

error[E0106]: missing lifetime specifier
 --> <source>:1:18
  |
1 | pub fn test() -> &str {
  |                  ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
help: consider using the `'static` lifetime
  |
1 | pub fn test() -> &'static str {
  |                   +++++++

Note how the "help" is allowed to have its own markers. I agree that helps and notes should always be attached to some major error/warning, but my current implementation cannot produce the same output.

I want the new codespan-reporting style to be able to render the above diagnostic. One easy way out is to extend the severities. Or we could allow nested Report in note/help section. However, given the current API here, I can think of another possible way: to repurpose the Where/Maybe marker, so that e.g. This/Where markers correspond to primary/secondary markers, and Maybe markers are gathered separately for use in the help section (it would still be unclear which help they should attach to, if multiple helps are provided; I think a good-enough heuristic is to attach all those markers to the last help). The naming of these markers would become a bit unfortunate, but I guess it is not really a big deal.

After implementing this feature, the overall visual effects should look satisfactory to me. I will comment here again when I finish the implementation, and perhaps we can start working on eliminating the code duplication and aim at a merge.

Mesabloo commented 1 year ago

There are a few problems here.

A Diagnostic may not contain reports which are meaningful to each other. For example, if you have 3 warnings, they will be in the same Diagnostic put will not necessarily be related to each other. If we were to have standalone Help/Note reports, this would need to be done differently. From the error above, I can see that the help: report is meant to be with the error[E0106] report, but is not necessarily related to the help: note inside the first report (in fact, it doesn't quite seem like it, hence the standalone part).

The Maybe marker is kind of a mistake. It turns out it is quite redundant with Hints, but with the added bonus that the Maybe marker is actually a marker, not a note. We can definitely merge both of these (same for the Where and Note). So let's have just Primary and Secondary markers too (in place of This and Where) ─ although are secondary markers really needed with the changes underneath? I believe so but I'd like your opinion ─.

The Report datatype does not need to change. Only the Note one, which we could repurpose to also include markers if needed (EDIT: this is what you suggested, basically, but I don't think we should allow full reports, only markers). For example,

data Note msg 
  = Note msg [Marker msg]
  | Hint msg [Marker msg]

That way, your layout can be done using a first Hint with no markers, and a second Hint with a single text-less marker (and it isn't lacking sense to do it like this).

Because all the work we are doing is a complete breaking change, I don't really mind if there are more than just a few, as long the library is as generic as needed (which should be about 98%/99% when this feature will be fully implemented).

Mesabloo commented 1 year ago

I'll have to rework some of the layouts I've done up until now if we are doing this, but this isn't such an issue. Maybe it'll be a bit harder for the ariadne-like layout as I'll have to check how to render the new "note markers" beautifully (not sure if they do it in ariadne?). As for the other layouts, it shouldn't be hard at all.

ruifengx commented 1 year ago

The Report datatype does not need to change. Only the Note one, which we could repurpose to also include markers if needed (EDIT: this is what you suggested, basically, but I don't think we should allow full reports, only markers). For example,
data Note msg 
  = Note msg [Marker msg]
  | Hint msg [Marker msg]
That way, your layout can be done using a first Hint with no markers, and a second Hint with a single text-less marker (and it isn't lacking sense to do it like this).

Yeah, it looks to me the proper solution. Although there was an oversight in my last comment: note how the diagnostic produced by rustc suggested adding a 'static in the file, and printed the modified line in the help section. To add to the complexity, the added text is 'static␣ (note the space), but the marker only covers 'static (i.e., without the space). An easy way to support this[^4] is to use different variants of markers in the help message (the exact naming to be determined)[^1][^2][^3]:

data HelpMarker msg
  = Add { _insertionPoint :: SrcPos, _text :: String, message :: msg }
  | Remove { _removalRange :: SrcRange, message :: msg }
  | Annotate { _annotationRange :: SrcRange, message :: msg }

(field names given only for clarity)

Or rather (factor out the common message field):

data HelpMarkerType
  = Add SrcPos String
  | Remove SrcRange
  | Annotate SrcRange
type HelpMarker msg = (HelpMarkerType, msg)

[^1]: Previously there is only Position which contained a range. Now we might need both SrcPos and SrcRange. [^2]: Not sure whether the Remove variant would be useful or not. Add will serve for the purpose illustrated above, and Annotate is just the normal marker. Technically Remove is similar to the regular Annotate in that they both render a marker on existing source. It is only useful to allow different rendering (e.g., green for insertion, red for removal, blue/yellow for annotation). [^3]: Also not sure whether we should allow Add/Remove outside the help section. I can think of one use case for it: for layout-sensitive languages like Haskell, when reporting layout-related errors, the diagnostic could include the inserted {, ;, and } to better illustrate the cause of the error. Though I guess such things is relatively rare in practice. [^4]: For the "space problem" (the insertion of text should include the space but the marker should not cover it), I can think of a hack: we take special care when rendering the Add marker to always avoid covering spaces on the boundary (and perhaps avoid doing so if the added text contained only spaces). Anyway, this could be left for the layout style to decide.

Anyway, I will first try to craft a working implementation, and the design will probably require some more tweaks to achieve the desired flexibility.

Mesabloo commented 1 year ago

This proposal will be very hard to handle, in that we have to modify the source code shown but keep the correct positions for markers. This will most probably induce redundancy, as we should need two different ways (or even 3) of fetching source code, when we want to simply annotate, to add code or to remove code.

EDIT: :warning: we may also want to add full lines instead of just chunks of code.

[...]

I can see how a Remove variant can be useful. Consider this example from my programming language:

public assume {@0 A : type}

In the current state, the error given for this line is:

[error]: Public assumptions in module.
     ╭──▶ /home/mesabloo/projects/zilch/gzc/basics-dev/test/a.zc@1:1-1:27
     │
   1 │ public assume {0 A : type}
     • ┬─────────────────────────
     • ╰╸ Cannot bind assumptions publicly
     •
     │ Hint: Remove the 'public' modifier from this declaration.
─────╯

Perhaps the error message could also include a Remove marker? I don't quite see how to render it though (it's easier to picture adding than removing).

[...]

This is not such a problem is it? We could require a SrcRange for Add (instead of a SrcPos) and insert the text at the beginning of the range, while the marker will cover the whole range. Computations will have to rely on the max between the length of the marker, and the length of the added text (which may rather be a Text than a String).

[...]

I don't think they really have a use outside of help/note sections. They could even mess up error reporting if used outside those sections, as the source code for the main markers will not be reliable (the purpose of putting the source code at the top is for it to be reliable, otherwise it is useless/confusing[^1]). This unfortunately duplicates some logic (see the beginning of my comment) but it should be okay for now. We'll eliminate code duplication later anyways.

[^1]: If the source code in the main error message is not the one I wrote, I'd question whether I correctly saved the file or not, or if the tool is just bugged. This is time wasted on a confusing behavior, which we shouldn't allow.

ruifengx commented 1 year ago

This proposal will be very hard to handle, in that we have to modify the source code shown but keep the correct positions for markers. This will most probably induce redundancy, as we should need two different ways (or even 3) of fetching source code, when we want to simply annotate, to add code or to remove code.

Agreed. I will spend several days to experiment on this and see if it is viable at all.

Perhaps the error message could also include a Remove marker? I don't quite see how to render it though (it's easier to picture adding than removing).

I'm not entirely sure about the ariadne style, but for rustc-like style, I imagine it should be rendered like this (I cannot include colours here, but the line under "public" should ideally use a "removal marker style"):

error: Public assumptions in module.
 --> /home/mesabloo/projects/zilch/gzc/basics-dev/test/a.zc:1:1
  |
1 | public assume {0 A : type}
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^ Cannot bind assumptions publicly
  |
help: Remove this 'public' modifier from the declaration.
  |
1 | public assume {0 A : type}
  | ------
  |

Since + is used by rustc to indicate insertion, perhaps it is also reasonable to use - to mean removal, but visually it looks more like dashed lines instead of line of minus signs. Anyway, the idea is to use different styles for "removal" and "annotation", so it becomes easily distinguishable.

If the source code in the main error message is not the one I wrote, I'd question whether I correctly saved the file or not, or if the tool is just bugged. This is time wasted on a confusing behavior, which we shouldn't allow.

Now that you mention it, yes it will be confusing. I was originally visualizing the rendering in such a way that e.g. the inserted text use fainted colours, but of course we do want the diagnostic not to cause any misunderstanding even if colours are turned off.

We could require a SrcRange for Add (instead of a SrcPos) and insert the text at the beginning of the range, while the marker will cover the whole range.

Yes, in this way the user gets full control over how many characters the marker should span. Though I don't expect "insertion markers" to affect the positions of other following markers, that is to say, I would naturally expect all positions used in the markers should correspond to the positions in the original (unmodified) text.

I know the idea definitely needs polishing and it has to be verified in the implementation. I will come back with the result several days later, and we may have a deeper understanding then to decide on the details.

Mesabloo commented 1 year ago

Hello! Do you have any update? It's been quite a while since last time, so I am just curious. :)

Mesabloo commented 1 year ago

From https://github.com/Mesabloo/diagnose/issues/11#issuecomment-1245481939

I don't want to give a definitive answer yet, but:

The Help and Note severities are already handled by the Note data type, aren't they? Unless they mean something else

Bug is just a more critical version or Error. Is it really that useful to have it in practice, or can one only rely on Error?

If we were to include all those new severities as parts of Report, we'd need colors etc. for every other layout (currently: the ariadne-like, a GCC-like and a typescript-like) + additional ways to format them (perhaps differently?) I have not quite found a usecase for both Help and Note though: I usually am satisfied by Where

I have thought about it a bit and I think that a Critical version may also be of use. Critical errors must be emphasized compared to simple user errors, because the user as no control over these and must report them upstream (examples of these include the IMPOSSIBLE errors from GHC). As such, maybe we can in fact also consider supporting a Critical report kind? I have a few ideas how to style them (for example a brighter, bold red) in the current layout.

For the other points, we already discussed them before so there's nothing more to be added here.

From https://github.com/Mesabloo/diagnose/issues/11#issuecomment-1256956447

Agreed. I will spend several days to experiment on this and see if it is viable at all.

I can also try on my end, see what I can come up with (hopefully something that can be reused/that is generic). I have taken a look at what you have already done for the codespan-reporting layout and I think your code looks better than mine, so I'll take inspiration from yours. I may also start working on the documentation if I come up with something good, and also merge a few things in the core library if I really feel the need to. We'll see at the end what needs to be kept/removed from the core library anyway.

Mesabloo commented 1 year ago

From https://github.com/Mesabloo/diagnose/issues/11#issuecomment-1256930847

This proposal will be very hard to handle, in that we have to modify the source code shown but keep the correct positions for markers. This will most probably induce redundancy, as we should need two different ways (or even 3) of fetching source code, when we want to simply annotate, to add code or to remove code.

EDIT: :warning: we may also want to add full lines instead of just chunks of code.

What I thought would be pretty hard to handle turned out to be way simpler! Using a very simple GHCi script, I managed to print some results:

A green italicized piece of code is added using the special AddCode marker
A red bold piece of code is just either a primary, secondary or Annotate marker
A dull black piece of code is for a RemoveCode marker

This was generated using the following markers:

markers2 =
  [ (Position (2, 38) (2, 47) "test1", Left $ Primary ""),
    (Position (2, 5) (2, 11) "test1", Right $ AddCode "unsafe " "")
  ]

The markers' which are returned are the original markers, where some positions may have been shifted because of AddCode markers. However, note that the original order is preserved. Positions are shifted so that the logic of figuring out where each marker goes is not duplicated (and is kept within the fetchLine function). As an example, see the new markers which are generated by the example above (pretty-printed for convenience):

λ> print (fst <$> markers')
[  Position {begin = (2,45), end = (2,54), file = "test1"},
   Position {begin = (2,5), end = (2,11), file = "test1"}
   ]

Only the position of the first marker is affected, because it appears after a AddCode marker.

It also works when multiple AddCode markers are on the same line:

This also works when the added piece of text contains newline, though I think this will be hard to handle when showing markers. We may want to forbid it, but I guess we'll see about that.

This code is more generic than it is required to be. Instead of either a list of SimpleMarker msg (markers for the main part of the report) or a list of NoteMarker msg (markers which may appear in notes), a list of Either (SimpleMarker msg) (NoteMarker msg) is taken. This leads to being able to mix both types of markers, but it should never happen if the function is called correctly (we may also not expose it directly, and only expose variants which take one of the two lists).

Here is the source code (which is rather long, so put into a dropdown section)

Code for `fetchLine`

```hs fetchLine :: forall msg ann. Pretty msg => -- | The mapping between file names and file contents. FileMap -> -- | The path to the file to get the line from. FilePath -> -- | The line number corresponding to the line to get. Int -> -- | The number of spaces used to show a TAB character. Int -> -- | The annotation to add to the line when no line is found, in order to colorize the placeholder text (such as @""@). (ann, msg) -> -- | The annotation to add to the line when there are no markers underneath. ann -> -- | The annotation to add to the line when a marker is found underneath. (Either (SimpleMarker msg) (NoteMarker msg) -> ann) -> -- | All the markers present on that line. -- -- We want to apply different treatments to simple markers and marker which can appear in notes. [(Position, Either (SimpleMarker msg) (NoteMarker msg))] -> -- | Returns the width table and the generated annotated 'Doc'ument. -- -- Markers are also returned with the position adjusted to fit the code perfectly (in case -- of code modification with 'AddCode' markers). (WidthTable, [(Position, Either (SimpleMarker msg) (NoteMarker msg))], Doc ann) fetchLine files path line tabSize (noLineAnn, noLineText) codeAnn markerAnn markers = case safeArrayIndex (line - 1) =<< files HashMap.!? path of Nothing -> (mkWidthTable "", [], annotate noLineAnn (pretty noLineText)) Just code -> mkTuple3 (mkWidthTable code, colorizeCode 1 markers True code) where colorizeCode :: Integer -> [(Position, Either (SimpleMarker msg) (NoteMarker msg))] -> Bool -> String -> ([(Position, Either (SimpleMarker msg) (NoteMarker msg))], Doc ann) colorizeCode _ markers _ "" = (markers, mempty) colorizeCode n markers handleAdd (c : code) -- if we found that a note marker starts at this position, -- then we have to start potentially overriding the code (by adding chunks, for example). -- -- in such case, we stop processing the code at this point, add the chunk of code -- and then continue. -- we also want to adjust the position | (True, Just (AddCode code' _)) <- (handleAdd, specialNoteMarkerAt n markers) = let addLength = length code' -- register the length of the text, to add to all marker positions -- starting after @n@. -- WARN: don't partition here! we want to keep the original order of the markers. markers' = markers <&> \tup@(Position (bl, bc) (el, ec) f, m) -> if bc < fromIntegral n then tup else case m of -- if this is the marker we are currently handling, do NOT shift it! Right (AddCode _ _) | bl == line && bc == fromIntegral n -> tup _ -> let newStart = if bl == line then bc + addLength else bc newEnd = if el == line then ec + addLength else ec in (Position (bl, newStart) (el, newEnd) f, m) in colorizeCode n markers' False (code' <> (c : code)) | otherwise = let doc = ifTab (fold $ replicate tabSize space) pretty c allMarkersAtPos = snd <$> flip filter markers \case (Position (bl, bc) (el, ec) _, _) | bl == el -> -- for inline markers (those which the ending line is the same as the starting line), -- we can just check if n is included insie @[start, end[@. fromIntegral n >= bc && fromIntegral n < ec | otherwise -> -- for multiline markers, it is a little bit more complicated. (bl == line && fromIntegral n >= bc) || (el == line && fromIntegral n < ec) || (bl < line && el > line) colorizedDoc = maybe (annotate codeAnn) (annotate . markerAnn) (head' allMarkersAtPos) doc in second (colorizedDoc <>) $ colorizeCode (n + 1) markers True code specialNoteMarkerAt :: Integer -> [(Position, Either (SimpleMarker msg) (NoteMarker msg))] -> Maybe (NoteMarker msg) specialNoteMarkerAt n = fmap (fromRight undefined . snd) . find \(pos, mark) -> isRight mark && snd (begin pos) == fromIntegral n ifTab :: a -> (Char -> a) -> Char -> a ifTab a _ '\t' = a ifTab _ f c = f c mkWidthTable :: String -> WidthTable mkWidthTable s = listArray (1, length s) (ifTab tabSize wcwidth <$> s) head' :: [a] -> Maybe a head' [] = Nothing head' (x : _) = Just x {-# INLINE head' #-} mkTuple3 :: (a, (b, c)) -> (a, b, c) mkTuple3 (x, (y, z)) = (x, y, z) {-# INLINE mkTuple3 #-} ``` If there's something which isn't quite clear, feel free to comment on it!

ruifengx commented 1 year ago

Thanks for the update!

I was mainly working on code clean up, because the rendering logic was previously written as two long functions, and I now managed to break one of them (the entry, for rendering Report) into shorten functions. I also added some documentations to provide an outline for the rendering process. Hopefully this later makes the merge easier, and also makes maintenance less a pain.

Alongside code restructuring and documentations, I also implemented a basic "grouping" logic, so the following problem is now solved:

Currently the placement of multi-line markers is not ideal: Current Improvement (not yet)

I have a plan of improvement and will rewrite this part soon.

I am also currently investigating on the new marker types for attached notes and helps. This is another reason why I was refactoring the code. I believe the rendering could reuse a substantial amount of code (I mean the part currently responsible for rendering the sub-reports for each file, namely sourceLine in my fork). (I kinda think that your fetchLine serves basically the same purpose as my sourceLine, so I will very likely benefit from the code logic you posted here.)

Mesabloo commented 1 year ago

About attached notes and helps, I have changed the Report and Note types so that we can have line markers in notes. I have not added the cases for standalone notes and helps (yet?), although if it is clearly needed I'll add them.

For now, the code looks like this (in Error.Diagnose.Report.Internal):

```hs -- | A 'HashMap' associating a 'FilePath' to an array of lines indexed by line numbers. type FileMap = HashMap FilePath (Array Int String) -- | The type of diagnostic reports with abstract message type. data Report msg = Report Severity -- ^ The severity of the report. (Maybe msg) -- ^ An optional error code to print with the report. msg -- ^ The message associated with the report. [(Position, SimpleMarker msg)] -- ^ A map associating positions to markers to show under the source code. [Note msg] -- ^ A list of notes and hints to show various help information after the error. -- | The severity of a report describes how much it is important to take into account. data Severity = -- | A warning report, which is important but not necessary to fix. Warning | -- | An error report, which stops the whole pipeline. Error | -- | A critical report, indicating that something internally failed. Critical -- | The type of markers which are meant to be shown under a line of code. data SimpleMarker msg = -- | A primary marker, which conveys the most important information (such as the location of an error). Primary msg | -- | A secondary marker, which is meant to add extra information to the report. Secondary msg | -- | A blank marker which is invisible but allows to include specific lines in the final report. Blank #ifdef USE_AESON instance ToJSON Severity where toJSON Warning = toJSON ("warning" :: String) toJSON Error = toJSON ("error" :: String) toJSON Critical = toJSON ("critical" :: String) instance ToJSON msg => ToJSON (Report msg) where toJSON (Report sev code msg markers hints) = object [ "kind" .= sev , "code" .= code , "message" .= msg , "markers" .= fmap showMarker markers , "notes" .= hints ] where showMarker (pos, marker) = object $ [ "position" .= pos ] <> case marker of Primary m -> [ "message" .= m, "kind" .= ("primary" :: String) ] Secondary m -> [ "message" .= m, "kind" .= ("secondary" :: String) ] Blank -> [ "kind" .= ("blank" :: String) ] #endif -- | A note is a piece of information that is found at the end of a report. data Note msg = -- | A note, which is meant to give valuable information related to the encountered error. Note msg [(Position, NoteMarker msg)] | -- | A hint, to propose potential fixes or help towards fixing the issue. Hint msg [(Position, NoteMarker msg)] data NoteMarker msg = -- | We need a piece of code to be added inside the source code. AddCode String -- ^ The piece of code to insert. This must NOT contain newlines. -- This may also be longer or shorter than the span of the marker. msg -- ^ The possibly empty message attached to the marker. | -- | Code is requested to be deleted because it causes an error. RemoveCode msg -- ^ The message attached to the marker. | -- | A simple annotation under source code to add extra information (e.g. in notes) Annotate msg -- ^ The message attached to this marker. #ifdef USE_AESON instance ToJSON msg => ToJSON (Note msg) where toJSON = \case Note m marks -> object [ "kind" .= ("note" :: String), "message" .= m, "markers" .= fmap showMarker marks ] Hint m marks -> object [ "kind" .= ("hint" :: String), "message" .= m, "markers" .= fmap showMarker marks ] where showMarker (pos, mark) = object $ [ "position" .= pos ] <> case mark of AddCode code m -> [ "kind" .= ("insert" :: String), "message" .= m, "code" .= code ] RemoveCode m -> [ "kind" .= ("delete" :: String), "message" .= m ] Annotate m -> [ "kind" .= ("annotate" :: String), "message" .= m ] #endif -- | Transforms a warning report into an error report. warningToError :: Report msg -> Report msg warningToError (Report Warning code msg markers notes) = Report Error code msg markers notes warningToError r@(Report _ _ _ _ _) = r -- | Transforms an error report into a warning report. errorToWarning :: Report msg -> Report msg errorToWarning (Report Error code msg markers notes) = Report Warning code msg markers notes errorToWarning r@(Report _ _ _ _ _) = r ```

Let me know what you think about it! :)

All the changes are summarized below:

I added a Severity data type (so that we are not limited to the current implementation whenever we want to add new ones);
The Report constructor now takes a Severity;
Marker has been renamed to SimpleMarker (name subject to change, because I don't quite like it);
The Note data type now also takes special markers which embody whether we want to add, remove or simply annotate code;
Translation to JSON for all those new data types.

I chose not to include full reports in notes following what we discussed earlier.

Alongside code restructuring and documentations, I also implemented a basic "grouping" logic, so the following problem is now solved:

Great! This looks way better.

Mesabloo commented 1 year ago

Alright so after a bit of fiddling around, I ended up moving closer to the exact markers you suggested for notes. Basically, I initially thought that adding messages to note markers would be cool, but I came to the conclusion that this does not fit well. Moreover, I'm starting to wonder whether we should allow multiple markers within notes (generally, one would only put a single marker related to the note itself).

In the meantime, I fixed my GCC-style layout to account for the new changes I made. Here is an example output:	With Unicode characters	With ASCII characters

I believe it looks pretty cool already! I will just get rid of the extra indentation before the code in the note.

EDIT:

Moreover, I'm starting to wonder whether we should allow multiple markers within notes (generally, one would only put a single marker related to the note itself).

I have actually made an example which makes multiple markers in a single note look awkward. So I think I'll just allow a single one for now.

EDIT 2: I also updated the TS-like layout:

With Unicode characters	With ASCII characters

ruifengx commented 1 year ago

I copied the code you posted here in diagnose-core, and wrote the logic for rich notes/helps. Here is a screenshot of what it currently looks like (I am pretty satisfied, and I think it is comparable to the output produced by rustc):

Screenshot

The code and the updated test case is available in my fork. I think there are still edge cases that I does not currently handle properly, so there will probably still be some updates, most likely alongside a refactor of sourceLine.

P.S. I find the current interface for Add unintuitive, though. (Maybe I have some misunderstandings?) It has a Position, but (1) the two endpoints must be on the same line, and (2) only the first endpoint is a real location in the source, but the second one is not (only the difference between the two is meaningful and used to determine how long the marker should extend). I still think Add should be something like Add { position :: (Line, Column), text :: String, markerLength :: Int }.

Mesabloo commented 1 year ago

P.S. I find the current interface for Add unintuitive, though. (Maybe I have some misunderstandings?) It has a Position, but (1) the two endpoints must be on the same line, and (2) only the first endpoint is a real location in the source, but the second one is not (only the difference between the two is meaningful and used to determine how long the marker should extend).

The current interface for Add only takes a position and the piece of code to add. However, there are a few things to note:

The position is the position of the marker that will be rendered, which starts on the same codepoint as the code added. It is not required to span only a single line (as mentioned earlier) although I don't really know if it should. From what I could gather updating the layouts I have made, it should (and the added text should not contain newlines).
The piece of code is unfortunately an arbitrary string, meaning it can contain vertical whitespaces. This would break any formatting for markers, unless we explicitely remove newlines in fetchLine. EDIT: we could also replace them with the symbol ␊ (which is the Line Feed symbol), or really any of these: https://unicode-table.com/en/blocks/control-pictures/

I still think Add should be something like Add { position :: (Line, Column), text :: String, markerLength :: Int }.

This would also work. When the Position given is on the same line, it's basically equivalent to this. So the question is only: do we want to support/allow multiline code in such case?

In my opinion, it would be better to disallow these, and perform character substitution when adding code (as described in 2. above), but I would like to have your opinion too. (which I believe I can infer as the same thing, given how you defined the Add marker)

(I am pretty satisfied, and I think it is comparable to the output produced by rustc)

I like it! It's almost as if I can't recognize that this is actually Diagnose rendering these. :D

The code and the updated test case is available in my fork. I think there are still edge cases that I does not currently handle properly, so there will probably still be some updates, most likely alongside a refactor of sourceLine.

I'll most probably make a very big commit here in some time, after finishing updating the original ariadne layout to support markers in notes. We'll soon be able to merge everything in one place I hope!

Mesabloo commented 1 year ago

Just a random thought: we now have markers which can signify adding or removing code (where the former actually adds code). Do we also want to add markers to edit the source code (i.e. change a range to some code)? It's probably a silly idea.

ruifengx commented 1 year ago

Do we also want to add markers to edit the source code (i.e. change a range to some code)?

I am not against this idea, but given the current types of markers, I would expresses this intent using the Annotate marker (disclaimer: this is not an actual message from rustc):

warning: use of deprecated associated function `std::sync::atomic::AtomicUsize::compare_and_swap`
  --> <source>:17:7
   |
17 |     x.compare_and_swap(
   |       ^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(deprecated)]` on by default
help: use `compare_exchange` or `compare_exchange_weak` instead
  --> <source>:17:7
   |
17 |     x.compare_and_swap(
   |       ----------------
   |       compare_exchange
   |       compare_exchange_weak
   |

By the way, would you mind if I introduce transformers as a new dependency? My code could be improved if I could make use of the Writer monad.

Mesabloo commented 1 year ago

By the way, would you mind if I introduce transformers as a new dependency? My code could be improved if I could make use of the Writer monad.

As long as it is not part of the core library, then go ahead! The idea will be to release alternative layouts as separate packages, so more dependencies for a layout package isn't a problem.

I am not against this idea, but given the current types of markers, I would expresses this intent using the Annotate marker (disclaimer: this is not an actual message from rustc): [...]

I see. I wasn't quite sure whether messages were useful on note markers, so I actually removed them (temporarily, I guess) in my own code. But seeing this, I'll make sure to add them back (at least for Annotate, not sure what should be the state for other markers). Also, as far as I know, GCC does the same as you suggest, so I guess an EditCode would not be quite useful (maybe it could even be weird?). Anyways, there are still a few things I need to change then!

Mesabloo commented 1 year ago

Alright so I have pushed all I had on the custom-layouts branch. The code lacks a bit of documentation here and there but for the most part it should be fine (I'll add the missing documentation later probably). I have also added a few test cases just for the sake of it (and also to generate the screenshots I put in the READMEs for each subpackage). And I made a few changes here and there, mostly related to how markers are represented. I also took your advice on the transformers and also used it in the layouts I wrote.

Feel free to give any feedback! It will be greatly appreciated.

Mesabloo commented 1 year ago

Also, I randomly thought about it the other day, but given that all layouts will be released under different packages, perhaps there is no need to include yours inside this repository? You could have it as a separate repository of yours and release the package etc. I mean, it entirely depends on what you want. :)

ruifengx commented 1 year ago

You could have it as a separate repository of yours and release the package etc.

Personally, I prefer merging my code into your repo so that it is easier to find and less likely to go out of sync. But certainly I am happy to maintain this part of code in case future changes make maintenance a non-trivial task.

I have just merged your branch into my fork (well technically it was a rebase), and submitted #14. The current version does not support multiline AddCode markers (those markers are handled by replacing control characters with spaces). This is not ideal, but multiline code insertion is really hard to integrate into my current rendering logic. I will checkout your implementation to get some inspiration.

Comments for your implementation: in my understanding, the following code probably does not work as intended, because this is not an orphan instance, so even if the users write their own implementation, there is no way that instance resolution could prefer the user-defined ones (it won't be more specific than the version here). The only possibility I know is IncoherentInstances, which is very dangerous and does not deterministically resolve the instance to the desired one. https://github.com/Mesabloo/diagnose/blob/2f90444addbbc3992ba04227b3d5089eb37e498d/diagnose-ariadne/src/Error/Diagnose/Layout/Ariadne/Style.hs#L55

Mesabloo commented 1 year ago

The current version does not support multiline AddCode markers (those markers are handled by replacing control characters with spaces). This is not ideal, but multiline code insertion is really hard to integrate into my current rendering logic. I will checkout your implementation to get some inspiration.

I ended up removing that part of the logic. Instead, I am replacing any vertical whitespace with a Unicode symbol (see in fetchLine, this is where this is done). This would probably be better to just replace by an invisible whitespace, but this can still be changed easily if needed.

Comments for your implementation: in my understanding, the following code probably does not work as intended, because this is not an orphan instance, so even if the users write their own implementation, there is no way that instance resolution could prefer the user-defined ones (it won't be more specific than the version here).

I believe that if the user defines a new instance as {-# OVERLAPPING #-}, this will get resolved instead of the already defined one (this is included in the documentation of the module itself). Because my instance is marked as {-# OVERLAPPABLE #-}, any {-# OVERLAPPING #-} instance should be privileged (or at least that's what I think it does). However, I am not quite sure and this needs some testing.

PS: I'll review your PR a bit later. :)