dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.37k stars 9.99k forks source link

Blazor "Error boundaries" design proposal #30940

Closed SteveSandersonMS closed 3 years ago

SteveSandersonMS commented 3 years ago

Summary

Provide a mechanism to catch and handle exceptions thrown within a particular UI subtree.

Motivation and goals

Currently, we say that unhandled exceptions from components are fatal, because otherwise the circuit is left in an undefined state which could lead to stability or security problems in Blazor Server. There is a mechanism for displaying a terminal error UI to the end user, on by default in the template.

However many developers regard this as inadequate, leading to many requests for "global exception handling". The main point is that it's unrealistic to guarantee that no component throws from a lifecycle method or rendering, especially given large developer teams and use of 3rd-party component libraries. When inevitable unhandled exceptions occur, people are unsatisfied with:

Other SPA frameworks support more granular mechanisms for global and semi-global error handling. We would like to do something inspired by the error boundaries feature in React.

Goals:

Non-goals:

In scope

Defining one or more "error boundary" regions within .razor code that handle exceptions from their child content. Each error boundary:

Error boundaries are not expected to catch all possible exceptions. We're trying to cover ~90% of the most common scenarios, such as rendering and lifecycle methods. We are only trying to catch errors that we know for sure are recoverable (i.e., can't corrupt framework state). See the detailed design below for more info. In any other case, the existing mechanisms for terminating the circuit and showing the terminal error UI will continue to apply.

Out of scope

Risks / unknowns

Risk: Developers might think that all possible exceptions will be caught and become recoverable. This doesn't put server stability at risk, but just means the developer will be surprised when the terminal error UI still can appear.

Risk: Developers might disagree with the design about which specific error types should be recoverable.

Risk: Developers might think they don't need to customize terminal error UI any more. It may be confusing that there are multiple different error UIs to customize (one per error boundary, and a final terminal one).

Risk: If errors are recoverable, unwise use of this feature could lead to an infinite error loop. For example, an error boundary might immediately try to re-render the same child content after an error, which immediately (or asynchronously, after some async I/O) throws again.

Unknown: What happens if the error boundary itself has an unhandled error? See detailed design for a proposal. The risk here is that it's nonobvious.

Risk: If a large number of errors occur all at once within different error boundaries (e.g., if there's a separate boundary around each row in a grid, and every row throws), the log may be swamped with errors.

Risk: If we have to put in new try/catch blocks within the low-level rendering internals (and we will), this might adversely affect perf when running on WebAssembly, perhaps especially in AoT mode, because (IIRC) exceptions are implemented by calling out to JavaScript which then calls back into .NET. The cost would likely be paid by everyone, whether or not they are using error boundaries. We'll have to measure this.

Unknown: What are we putting in the default template? Should there be any error boundaries?

Risk: If there is an error boundary in the default template, or as soon as a developer adds one to their site, they are implicitly promising that any code they call from their lifecycle methods can tolerate unhandled exceptions. For example, any circuit-scoped model state no longer gets discarded at the first unhandled exception. What if it gets into a broken/illegal state and the circuit is still trying to use it? Developers need to understand what they are opting into. This is the same as with any state that spans multiple requests in MVC.

Risk: Naive developers might routinely mark all their components as being an error boundary and simply ignore exceptions. Again, this doesn't put framework internal stability at risk, but means the application will just not do anything sensible to notify about or log exceptions. Maybe we should put in some kind of roadblock to make error boundary components a special thing, not just a behavior you can trivially mix into any other component - not sure how though.

Risk: Whatever default error UI we bake in, developers might take a dependency on its exact appearance and behaviors (e.g., resizing to fit content or not) forever. Will we ever be able to change the default error UI? Is it enough to document that the error UI should be customized and if you don't, you accept that it might just change in new framework versions? See detailed design for proposal.

Examples

Page-level error boundary in MainLayout.razor:

<div class="main">
    <div class="content px-4">
        <ErrorBoundary @ref="errorBoundary">
            @Body
        </ErrorBoundary>
    </div>
</div>

@code {
    ErrorBoundary errorBoundary;

    protected override void OnParametersSet()
    {
        // On each page navigation, reset any error state
        errorBoundary?.Recover();
    }
}

Error boundary around a component that sometimes fails due to bad data:

@foreach (var item in SomeItems)
{
    <div class="item">
        <ErrorBoundary @key="item">
            <ChildContent>
                <MyItemDisplay Item="@item" />
            </ChildContent>
            <ErrorContent>
                <p class="my-error">Sorry, we can't display @item.ItemId due to a technical error</p>
            </ErrorContext>
        </ErrorBoundary>
    </div>
}
SteveSandersonMS commented 3 years ago

Detailed design

How do we decide what kinds of errors should be recoverable? How do we avoid the risk that this feature creates new ways to corrupt framework state and introduce stability/security issues?

I propose we think about it as follows: we should only do things that are equivalent to what you could already have done in user code. There's an error-handling pattern you could already have done; it's just really inconvenient. I propose we take that pattern and automate it. Then it doesn't create any entirely new security/stability risks.

An existing error handling pattern

Imagine if a developer did the following:

  1. Define a component, <ErrorBoundary>, which takes child content. It uses a <CascadingValue> to advertise itself to descendants.
  2. In every descendant, you receive the ErrorBoundary as a cascaded parameter. And, in every lifecycle method and event handler, you wrap all your code in a try/catch. If there's an exception, you simply call some method HandleError(exception) on the ErrorBoundary and return.
  3. The ErrorBoundary ancestor receives the HandleError call and responds by re-rendering itself. This time instead of rendering ChildContent, it renders some other ErrorContent.

What about exceptions during rendering? This is debatable, but technically nothing stops you today from doing a try/catch around the rendering logic. If there's an exception, as well as calling HandleError, you could call __builder.Clear() so the builder is left in a valid state. So that's technically handleable today already.

That's pretty much it. These exceptions are no longer fatal, because the framework never sees them. The lifecycle methods and event handlers never throw. Framework internal state is not at risk in any way.

Automating this error handling pattern

Of course the primary drawback to this pattern is that it's ridiculously inconvenient to put try/catch around every single method, and there would be perf drawbacks to passing around cascaded parameters to every single component. But if we bake this pattern into Renderer, then neither the try/catch blocks nor the cascading parameters would be needed.

The key principle, then is:

What's in and what's out

We will support catching and recovery from exceptions from:

Although the above is phrased in terms of ComponentBase APIs, the renderer really does it in terms of IComponent/IHandleEvent, and IHandleAfterRender, so it works with any IComponent.

We will not support catching and recovery of exceptions from:

Error boundary components

There will need to be different error boundary components for different hosting models. This is because the default error UI has to vary (you can only use HTML elements in web rendering; you'd do something else in MBB). So there has to be some public API you can implement/subclass to define an error boundary component.

However, we don't want just any component to be able to mark itself as an error boundary, as this would likely lead to antipatterns like marking all components as error boundaries (in an attempt to create On-Error-Resume-Next-style semantics). Also at least at initially, we should impose some clear behavioral rules on what error boundaries do when there is an error.

So, I propose the following:

More explicitly, these types look like:

// ---------------------------------------------------------
// In Microsoft.AspNetCore.Components

namespace Microsoft.AspNetCore.Components
{
    internal interface IErrorBoundary
    {
        void HandleException(Exception exception);
    }

    public abstract class ErrorBoundaryBase : ComponentBase, IErrorBoundary
    {
        // ... implement stuff ...
        public void Recover() {  }
        protected abstract Task OnErrorAsync(Exception exception);
    }
}

// ---------------------------------------------------------
// In Microsoft.AspNetCore.Components.Web

namespace Microsoft.AspNetCore.Components.Web
{
    // This is what you actually use if you're doing web-based rendering.
    // Mobile Blazor Bindings would have its own equivalent to this that uses native UI elements for the default error UI
    public class ErrorBoundary : ErrorBoundaryBase { ... }
}

Developers can create a custom subclass of ErrorBoundary to define an alternate default error UI. Then will need to recreate the basic logic for showing either the ChildContent or the error UI, e.g.:

@inherits ErrorBoundary

@if (CurrentException is null)
{
    @ChildContent
}
else if (ErrorContent is not null)
{
    @ErrorContent(CurrentException)
}
else
{
    <div class="my-custom-error-ui">Oh no, there was a problem!</div>
}

Custom error boundary classes could include logic like a "retry" button (which calls the base class Recover method) or custom logging logic.

Alternative considered: I have considered having IErrorBoundary be public. Maybe we'll do that eventually. But I'm not keen on taking a gamble that there's no need to impose any standard expectations on what error boundaries should do in response to an error, as it could easily lead to all errors being suppressed and inefficient things like every single component also being an error boundary. The ErrorBoundaryBase is where the important infinite-error-loop detection logic lives, so it's helpful that this will always be in effect.

Semantics

Many aspects of the following are quite subtle and still open for debate.

How do we clean up the failed subtree?

We could rely on the IErrorBoundary component tearing down its child content after an error, thereby triggering all the usual disposal and cleanup logic (invoking Dispose on all components in the subtree, unregistering all their event handlers, notifying the JS client to unregister event listeners, etc.). This is something that ErrorBoundaryBase will do.

However if an IErrorBoundary fails to do that, we'd create a strange situation where "failed" components continue to live and receive events. This shouldn't put the framework state at any risk (as per the principles above) but is quite unexpected. The worst problem with it, I think, is that it would prevent us from ever making this stricter, as people may depend on failed components continuing to operate.

To avoid the risk that one day, we try to make IErrorBoundary more open but don't realise we need to guarantee the subtree must be disposed on error, I propose that when the Renderer will notify an error boundary about an error, it first forcibly discards that entire subtree (whether the IErrorBoundary wants to or not). This will ensure all the cleanup logic occurs in that subtree, independently of whatever content the error boundary will render next.

If the IErrorBoundary tries to re-render its existing ChildContent, that will be fine - it just means you now have an entirely new copy of those components.

Recovery from errors

The default ErrorBoundaryBase component will let the developer call a Recover method which makes it switch back to render its ChildContent instead of any error content. This is what makes possible the following simple and intuitive per-page error handling system:

<ErrorBoundary @ref="errorBoundary">
    @Body
</ErrorBoundary>

@code {
    ErrorBoundary errorBoundary;

    protected override void OnParametersSetAsync()
    {
        errorBoundary?.Recover();
    }
}

The end user can then click on a different page in a nav menu and carry on with their circuit. They can even come back to the same page that previously failed, and will get a fresh copy of it.

What happens if the error boundary itself fails? Can we get infinite error loops?

An IErrorBoundary needs to be able to handle errors that occur directly within its own ChildContent fragment, not just in further-down descendants. For example, you might have:

<ErrorBoundary>Some markup that might throw a nullref</ErrorBoundary>

Do you expect the error boundary to catch the nullref? Of course you do. So we can't have the rule that error boundaries only catch things from their descendants. Error boundaries themselves must continue to operate after an error in their own direct content. That's yet another reason why error boundaries should be special things and not just any old component, since we don't want just any old component to continue operating after it has an exception in its own rendering.

Does this create a risk of infinite error loops? Why yes it does. If ErrorContent throws during rendering, you'll keep re-rendering it. We can't know whether an exception comes from something in ErrorContent (versus ChildContent), because exceptions can occur after asynchronous lifecycle methods or event handlers, and hence might arrive after we've already switched into or out of an error state.

The best way to mitigate this, I think, is a simple "error counting" mechanism for detecting infinite error loops. Hopefully most developers will never encounter this, but if more than N (default = 100, but configurable) exceptions are received in between recoveries, the error boundary will treat the new error as fatal to the circuit. This is helpful because:

  1. It makes it easy to spot basic cases of infinite error loops
  2. It exposes the case where some child component could continue to trigger errors forever, e.g., in response to a timer. This would be super inefficient, so we really do want the developer to notice this and do something about it.

What gets logged, to where, and when?

For web-based rendering, I think this needs to vary across hosting models:

I propose a new DI service to supply these per-hosting-model behaviors:

namespace Microsoft.AspNetCore.Components.Web
{
    public interface IErrorBoundaryLogger
    {
        ValueTask LogErrorAsync(Exception exception);
    }
}

Notice that this is specific to Microsoft.AspNetCore.Components.Web. It's not a concept in the core at all.

Default project template

After some consideration, I suggest we do not include any ErrorBoundary in the default template. This is because:

More generally, error boundaries don't make applications more robust. They just make some errors recoverable. The existing default behavior of forcing a circuit to restart / page to reload on exception is already a good behavior that's easy to reason about.

I'd be in favour of documenting how you can do per-page error boundaries as a trivial tweak in MainLayout.razor, but we should include cautions about what this entails about the user being able to continue after any lifecycle exception anywhere.

Errors after disposal

This is subtle! Since exceptions can occur asynchronously in lifecycle methods or event handlers, they might not happen until after the component (or even its enclosing error boundary) has been removed from the UI and disposed. How should we handle the errors then?

  1. We could say that there's no longer an error boundary to route them to, so they are fatal.
    • Bad choice! This means that a normally-handleable error suddenly becomes fatal to the circuit just because the user happened to navigate to a different page while some operation was in flight. This could be really common for anything like HttpClient requests that time out.
  2. We could say that since the component is gone, we don't care about its errors any more, so just ignore them.
    • Also, bad choice! Async errors are currently fatal (i.e., before this feature). If we just started ignoring them, we'd be significantly loosening the whole system by default, even for people who are not using error boundaries.
  3. We could try to behave as if the component/errorboundary had not been disposed.

I think (3) is the only good choice. This is tricky to implement because we have to keep track of more info inside the Renderer than we were doing before. Using only the IComponent Receiver on an event, we have to be able to walk its original ancestry even after disposal. It is possible to do this, fortunately. The result is:

Default error content

We don't want to bake in any human-readable default error UI because:

So instead, the default error UI is simply: <div class="blazor-error-boundary"></div>. The default project template uses CSS to add sensible colors, text, and even a warning icon:

image

Since this is purely in CSS in user code, the framework isn't responsible for the language or any back-compatibility across versions.

HaoK commented 3 years ago

Looks great overall, I wonder if there could be a more descriptive name other than AutoReset="true". Its not clear what is automatically getting reset or to what state, not suggesting this as a name, but its going to logically be doing something like ShowChildContentOnRecovery? And what happens if AutoReset is false (and why would someone want to do this, would they have to manually chose what to rerender?)