bgamari / ghc-pretty-errors

Design planning for prettier error messages from GHC
8 stars 1 forks source link

first shot at a proposal for the dynamic approach #3

Closed alpmestan closed 4 years ago

alpmestan commented 5 years ago

Rendered

goldfirere commented 5 years ago

This is a nicely written proposal, but I'm afraid I don't see how it enables any of the effects in the "Effects and Interactions" section. The examples in the proposal describe how whole error messages get encapsulated into a type and than given instances. But the first two effects talk about interactions with sub-components of an error message. This would be possible if the IDE/tool rendered its own error messages, but my guess is that many downstream tools will want GHC to provide the text. (This also vastly reduces duplication of error-message text.) Once GHC renders the text, all structure is lost, no?

And how would this design enable deferring expensive analyses? It's not clear to me. Maybe the structure contains a lazy value that is forced later? But the lazy value will likely require some monadic operations, so the design is subtler. I suppose any approach we take here will struggle in this regard, but it's so useful that it might be worth thinking about as we pin down a design.

For the fourth effect, I still think we'd need more structure. For example, perhaps GHC can provide suggestions on many of its error messages (but not all of them). Does that mean we'll soon have a Suggestible class with a getSuggestions method to describe all those messages? That's workable. But now we'll need some kind of runtime instance-lookup the IDE will have to use to retrieve the instance.

On the other hand, I think all of these use-cases (except perhaps the third) are handled nicely by the SDoc a approach: you can embed Types and other objects in messages directly, and you can use (static) types to tell what level of abstraction you're at. For example, we could have SDoc TypeError which means that the SDoc might contain just elements of some datatype. This might be renderable into SDoc TypeElement, where there are now Types and Contexts (or some such) interspersed with text. This might further be renderable to SDoc Void for just plain text. An IDE would use the SDoc TypeElement level to pretty-print syntactic categories or enable queries about, say, types. And it would use the top level (SDoc TypeError) to gather suggestions.

I feel like I must be missing something here, connecting this proposal with its desired effects. What is it?

Thanks!

alpmestan commented 5 years ago

Hello @goldfirere.

This is a nicely written proposal, but I'm afraid I don't see how it enables any of the effects in the "Effects and Interactions" section. The examples in the proposal describe how whole error messages get encapsulated into a type and than given instances. But the first two effects talk about interactions with sub-components of an error message. This would be possible if the IDE/tool rendered its own error messages, but my guess is that many downstream tools will want GHC to provide the text. (This also vastly reduces duplication of error-message text.) Once GHC renders the text, all structure is lost, no?

Well, I think this design is a bit more flexible than it might first appear. Any GhcError value can always be rendered, since it packs an Outputable instance for whatever error type we're using. So GHC itself or GHC API users would always be able to render a given error. Now, since we can also dynamically dispatch using the error type's TypeRep, we are as far as I can tell in fact free to do what we would do with a big sum type and functions that pattern match on specific error constructors, only in a more ugly way, unfortunately. So, to come back to the first two effects and how they have to do with grabbing sub-components on an error message, it should definitely possible to inspect the error and grab the various types, bindings, classes, etc it involves to attach different colors to the respective entities, or to look up the definition of the types in order to make them interactively expandable.

And how would this design enable deferring expensive analyses? It's not clear to me. Maybe the structure contains a lazy value that is forced later? But the lazy value will likely require some monadic operations, so the design is subtler. I suppose any approach we take here will struggle in this regard, but it's so useful that it might be worth thinking about as we pin down a design.

Indeed, thinking about this now is likely worth it. Embedding a monadic action to run to get the analysis result is exactly how I would approach this, yes. What exact monad to involve here is not entirely clear to me yet though. But embedding some action in a fixed monad that the user has means to run from e.g IO should do the trick, no? What potential obstacle do you see there?

For the fourth effect, I still think we'd need more structure. For example, perhaps GHC can provide suggestions on many of its error messages (but not all of them). Does that mean we'll soon have a Suggestible class with a getSuggestions method to describe all those messages? That's workable. But now we'll need some kind of runtime instance-lookup the IDE will have to use to retrieve the instance.

Well, I was more thinking about something like:

data Suggestion = ImportMod Module | AddImport OccName Module | AddLangExt ...

instance Outputable Suggestion where
  ppr = ...

data WithSuggestion e = WithSuggestion e Suggestion

instance Outputable e => Outputable (WithSuggestion e) where
  ppr = ...

instance IsError e => IsError (WithSuggestion e)

This might make it easier for tooling authors to easily capture any sort of error message that comes with an actionable suggestion? Just "catch" anyting that looks like IsError e => WithSuggestion e. Or should it be WithSuggestion GhcError ? It's tricky get the former to work, isn't it?

On the other hand, I think all of these use-cases (except perhaps the third) are handled nicely by the SDoc a approach: you can embed Types and other objects in messages directly, and you can use (static) types to tell what level of abstraction you're at. For example, we could have SDoc TypeError which means that the SDoc might contain just elements of some datatype. This might be renderable into SDoc TypeElement, where there are now Types and Contexts (or some such) interspersed with text. This might further be renderable to SDoc Void for just plain text. An IDE would use the SDoc TypeElement level to pretty-print syntactic categories or enable queries about, say, types. And it would use the top level (SDoc TypeError) to gather suggestions.

I definitely reckon that there is something appealing about those annotated SDocs and how they don't require any Typeable business. But we can sort of mirror this with the approach proposed here, with an obvious but necessary twist: we can't just build errors piece by piece, we have to build an entire error value (a value of a type that implements IsError) and then throw it. What we can still leverage though is the flexibility in how the error value can be put together, and we also get to decide when it gets wrapped with the GhcError constructor. I suspect that we're not currently doing a lot of "piece by piece" error construction, and that when comes the time to emit a given error message, we have all the necessary context around that piece of code, or that we can at least compute it from the things we have in scope. This is the kind of information that we ought to either store in the error values or for which we should offer functions that can compute it from the error values. While this approach puts more burden on IDE/tooling authors (precisely because of the dynamic aspect of it), I think there's room for mitigating the impact by offering well thought out helpers and utilities for achieving those features that we're trying to enable more easily here.

Finally, the overview.mkd document in this very repository lists:

as the drawbacks of the pointed annotations approach.

(Sorry that this comment grew so long, but I feel that this discussion is going to lead to some game-changing updates to the proposal and that taking a bit of time to write down my thoughths was worth it.)

goldfirere commented 5 years ago

I still don't get it. :(

it should definitely possible to inspect the error and grab the various types, bindings, classes, etc it involves to attach different colors to the respective entities, or to look up the definition of the types in order to make them interactively expandable.

I agree that it's possible, by the (somewhat ugly) dynamic matching that a tool could implement. The problem is that you'll get, say, that a expected/actual error involves Int and Bool, but how will you then know which characters in the rendered output to color? The rendered output (assuming the same uninformative SDoc type of today) won't have any information left. So it seems this design forces clients to choose between rich information and GHC-supplied text -- I don't see how to have both (though I do see how to have either).

But embedding some action in a fixed monad that the user has means to run from e.g IO should do the trick, no?

Yes, I suppose so.

WithSuggestion

Yes, I guess that can work. Note that this design does seem to now require a monolithic Suggestion type. Though I suppose you could double-down on the dynamic design and do a class-based approach here, too.

offering well thought out helpers and utilities

Yes, this will be important. That's why I think a "minimum working example" would be nice to look at, just to see where the pain will be.

Sorry that this comment grew so long

Never apologize about this. :)

Agreed that the pointed documents are imperfect.

alpmestan commented 5 years ago

So it seems this design forces clients to choose between rich information and GHC-supplied text -- I don't see how to have both.

That is an excellent point. It does look like whatever road we take, we need annotated documents (whether the annotations "pointed" or "scoped"), to leave some "suspended renderings" all over the document. Unlike in the previous "pointed annotations" proposal, in this case we would emit tiny teeny annotations, like expressions, types, source spans. All those little things that would get a different treatment from the consumer depending on their nature. We would not embed whole errors in documents.

This doesn't preclude dynamic errors, it only tells us we need to be able to emit annotations when we render them. The same thing goes for "static errors" too.

Let me try to sum everything up correctly... There's one thing we need to implement whatever proposal we settle on: annotated documents. But we've then got two choices to make:

While the first choice doesn't seem like big deal (scoped annotations can emulate pointed ones and vice versa, so while I personally find pointed annotations a bit more natural, we could recover them with the other design too), the second is pretty important.

The "static errors" approach has a huge drawback: we would need to introduce .hs-boot files and fight import cycles. But it would be incredibly easy to consume, as easy as it gets really. The dynamic approach on the other hand would be pretty annoying for the consumer side of things (Typeable-based dispatch), but would not require any kind of import cycle fighting: ErrUtils would define IsError/GhcError and other support code, and each subsystem would define its error types and throw them wrapped in GhcError.

In both cases, we'd throw an error value, then later render it to a document with tiny error message items as annotations (types, expressions, ...), probably in an OutputableError class, and then the final consumer (GHC executable, GHC API users) would render those annotations as completely textual, annotation-free documents.

goldfirere commented 5 years ago

Thanks for that post -- it clarifies something I hadn't quite realized, that the choice between dynamic/static is separate from the annotation question. We are now in agreement that we need annotations. I, too, am on the fence about scoped/pointed, and I don't see a good way to choose between these without a working example.

One image I have around all of this is that rendering will be recursive. In other words, given an annotation of type Ann, I would expect a function Ann -> SDoc Ann somewhere. (Maybe the type of the annotation changes. Not sure.) That way, a client can choose to render as much as they like (but no more). This might be useful, for example, when including contextual information in errors. The first render could include the error text but leave the contexts as annotations. Then the contexts could be rendered, exposing expressions and types. Those could be rendered exposing atoms (tycons, tyvars, vars, operators, etc.). And those could be rendered giving a vanilla doc. (This suggests that each successive rendering does advance the type of the annotation. But doing so means that the number of steps must be chosen once, statically, and I don't know whether I like that.)

I think it's also helpful to think about the fact that the static/dynamic distinction can also be deferred: we can embed the dynamic idea in the static one (SDoc GhcError) and the static one in the dynamic one (only one instance of IsError). Of course, this means that we can take a hybrid approach. If the downside of the static approach is .hs-boot files, maybe we can mitigate that by defining separate static types for each major subcomponent of GHC, and then union these types dynamically. For example, I imagine by the time we get to TcRnTypes, we won't need any more import cycles for the type-checker. In any case, I've never been very bothered by .hs-boot files: yes they're annoying and yes we should avoid where possible, but I don't think they should determine what design we foist on clients. (Others may reasonably disagree here.)

alpmestan commented 5 years ago

Regarding the two options for annotations, here's a self-contained module that summarizes the two approaches and shows how close they are, for a simplified version of the document types, with strings, an above construct, an empty document one and something to leave annotations in the document. Note: the scoped approach doesn't seem capable of implementing a proper Monad instance...? Unless I'm missing something.

import Data.Monoid

-- | Documents with pointed annotations
data PointedDoc a
  = PText String
  | PAbove (PointedDoc a) (PointedDoc a)
  | Pure a -- <- annotation constructor
  | PEmpty

instance Functor PointedDoc where
  fmap f (PAbove a b) = PAbove (fmap f a) (fmap f b)
  fmap f (Pure a) = Pure (f a)
  fmap _ (PText s) = PText s
  fmap _ PEmpty = PEmpty

instance Applicative PointedDoc where
  pure = ann
  (<*>) = ap

instance Monad PointedDoc where
  return = pure
  d >>= f = case d of
    Pure a -> f a
    PText s -> PText s
    PAbove a b -> PAbove (a >>= f) (b >>= f)
    PEmpty -> PEmpty

instance Foldable PointedDoc where
  foldMap f d = case d of
    PEmpty -> mempty
    Pure a -> f a
    PText s -> mempty
    PAbove a b -> foldMap f a <> foldMap f b

instance Traversable PointedDoc where
  traverse f d = case d of
    PEmpty -> pure PEmpty
    Pure a -> f a
    PText s -> pure (PText s)
    PAbove a b -> PAbove <$> traverse f a <*> traverse f b

---

-- | Documents with scoped annotations
data ScopedDoc a
  = SText String
  | SAbove (ScopedDoc a) (ScopedDoc a)
  | Scoped a (ScopedDoc a) -- <- annotation constructor
  | SEmpty

instance Functor ScopedDoc where
  fmap _ (SText s) = SText s
  fmap _ SEmpty = SEmpty
  fmap f (Scoped a d) = Scoped (f a) (fmap f d)
  fmap f (SAbove a b) = SAbove (fmap f a) (fmap f b)

instance Applicative ScopedDoc where
  pure = ann
  (<*>) = ap -- might be bogus because of the bad >>= impl below

-- !!! this instance has a problem, see below
instance Monad ScopedDoc where
  return = pure
  d >>= f = case d of
    SText s -> SText s
    SEmpty -> SEmpty
    SAbove a b -> SAbove (a >>= f) (b >>= f)

    -- PROBLEM here: this seems to be the only reasonable thing
    -- we can do, but we therefore completely drop the old
    -- subdocument, which is not okay.
    Scoped a d -> f a

instance Foldable ScopedDoc where
  foldMap f d = case d of
    SText _ -> mempty
    SEmpty -> mempty
    SAbove a b -> foldMap f a <> foldMap f b
    Scoped a d -> f a <> foldMap f d

instance Traversable ScopedDoc where
  traverse f d = case d of
    SText s -> pure (SText s)
    SEmpty -> pure SEmpty
    SAbove a b -> SAbove <$> traverse f a <*> traverse f b
    Scoped a d -> Scoped <$> f a <*> traverse f d

---

-- primitive document operations
class Doc d where
  empty :: d a
  text :: String -> d a
  above :: d a -> d a -> d a

instance Doc PointedDoc where
  empty = PEmpty
  text = PText
  above = PAbove

instance Doc ScopedDoc where
  empty = SEmpty
  text = SText
  above = SAbove

---

-- documents with "pointed" annotations (annotations at the leaves of the doc)
class Doc d => PAnnDoc d where
  ann :: a -> d a

instance PAnnDoc PointedDoc where
  ann = Pure

instance PAnnDoc ScopedDoc where
  ann x = Scoped x empty

---

-- documents with "scoped" annotations (around a sub-document)
class Doc d => SAnnDoc d where
  scopedAnn :: a -> d a -> d a

instance SAnnDoc ScopedDoc where
  scopedAnn = Scoped

newtype PointedScopedDoc a = PointedScopedDoc (PointedDoc (a, PointedScopedDoc a))

psdoc :: PointedScopedDoc a -> PointedDoc (a, PointedScopedDoc a)
psdoc (PointedScopedDoc d) = d

instance Doc PointedScopedDoc where
  empty = PointedScopedDoc PEmpty
  text = PointedScopedDoc . text
  above a b = PointedScopedDoc $ above (psdoc a) (psdoc b)

instance SAnnDoc PointedScopedDoc where
  scopedAnn ann subdoc = PointedScopedDoc $ Pure (ann, subdoc)

The Monad instance in particular let's you apply your Ann -> SDoc Ann functions, or Ann1 -> SDoc Ann2 in the general case.


Regarding your hybrid approach suggestion, that might be nice! Consumers would only have to dynamically dispatch on a few "categories" of errors, and for the rest of their code work with the sum types of the various subsystems. It definitely seems better than the full-on dynamic approach.

goldfirere commented 5 years ago

That's very helpful -- thanks!

Your observation about the lack of a Monad instance for ScopedDoc is apt. Here's another way to look at it: join makes sense for PointedDoc, by inlining. (That is, a PointedDoc (PointedDoc a) can easily inline one level of annotation to get a PointedDoc a.) But join makes no sense for a ScopedDoc. If we can't choose between them otherwise, this might push us in the direction of pointedness (which I think we were leaning towards anyway).

We seem to be converging a bit on the two big issues being discussed:

Open question: how to render an annotated doc? Maybe we need something like

class DocAnnotation a where
  type RenderedAnnotation a
  render :: a -> SDoc (RenderedAnnotation a)

Would that work at all? I'm keen to have a way to take, say, a SDoc TypeError and turn it into a SDoc TypeElement where the former has structured error information while the latter just has structured regions in more of a sea of GHC-supplied text.

alpmestan commented 5 years ago

Your join based reasoning is interesting too. And if only for convenience and simplicity (we all know and love Functor, Applicative, Monad and friends), I too would lean towards the pointed approach.

More generally, even after taking a bit of time to let my thoughts settle down, I think we discussed our way towards what seems to me like the best way forward (pointed annotations, hybrid static/dynamic errors where the dynamic part lets us dispatch on the "subsystem" from which the error comes and where we can then just pattern match on the particular errors of a given subsystem to handle them).

Open question: how to render an annotated doc? [...]

As discussed previously, the general mechanism for rendering annotations (either to finer-grained ones or to a fully textual document) would be >>= :: SDoc a -> (a -> SDoc b) -> SDoc b. Your class would basically be about supplying a fixed b type and a fixed a -> SDoc b function for a given type a. While this might work for the b types (to map a unique b to an a), it probably won't do for the functions, at least at the very end of the pipeline, when we go from SDoc TypeElement (to reuse the names from your last comment) to fully textual. GHC would do something straightforward there while tooling folks will want to do something fancy with the annotations.

So I'm not sure how desirable this class is. I don't think it'd cause any particular problem either though. Something that seems more important to figure out, to me, is what to change LogAction to. It currently takes an MsgDoc, which is a synonym for our current SDoc, which would be SDoc Void with our annotated documents. Do we want to leave it this way, or switch it to become SDoc TypeElement ? SDoc GhcError (the dynamic wrapper) ? Probably one of the last two solutions.

david-christiansen commented 4 years ago

It seems to me that what you're calling scoped vs pointed annotations are not really design alternatives to each other, because they don't really serve the same purposes as far as I can see. The Scoped variant is useful for tracking the semantics of a syntactic representation that has resulted from pretty printing, while the Pointed is useful for embedding semantic objects directly into the pretty-printed document, for later rendering. Given this understanding, it makes perfect sense not to have a Monad instance for ScopedDoc, because there's not a notion of substitution occurring. On the other hand, it might make sense to have both options, or to use the scoped version during rendering from the pointed version.

What would the API look like for getting the semantics of pretty-printed text with the Pointed version? One of the most useful things we have in the Idris version is a renderer that has roughly the type render : Doc ann -> (String, [(Int, Int, ann)]), where second part of the output are offset-length-meaning triples. This has proven to be extremely useful when making tooling that doesn't link to the compiler, because we don't have to guess ahead of time what information might be usable, so long as ann is a type that can be serialized. And I think it's important to be able to have tool support that doesn't link to the compiler, because it's much, much easier to distribute and maintain and often easier to use in new contexts.

Some other things that we have used this mechanism for in Idris include text formatting information about documentation (e.g. bold markup in a docstring gives bold text when using :doc), out-of-band information in documentation (like type error messages in examples), and annotated non-error output from the compiler (like hyperlinked output from :t). Are these use cases plausibly supported here?