Closed SteveSandersonMS closed 3 years ago
How do we decide what kinds of errors should be recoverable? How do we avoid the risk that this feature creates new ways to corrupt framework state and introduce stability/security issues?
I propose we think about it as follows: we should only do things that are equivalent to what you could already have done in user code. There's an error-handling pattern you could already have done; it's just really inconvenient. I propose we take that pattern and automate it. Then it doesn't create any entirely new security/stability risks.
Imagine if a developer did the following:
<ErrorBoundary>
, which takes child content. It uses a <CascadingValue>
to advertise itself to descendants.ErrorBoundary
as a cascaded parameter. And, in every lifecycle method and event handler, you wrap all your code in a try
/catch
. If there's an exception, you simply call some method HandleError(exception)
on the ErrorBoundary
and return.ErrorBoundary
ancestor receives the HandleError
call and responds by re-rendering itself. This time instead of rendering ChildContent
, it renders some other ErrorContent
.What about exceptions during rendering? This is debatable, but technically nothing stops you today from doing a try
/catch
around the rendering logic. If there's an exception, as well as calling HandleError
, you could call __builder.Clear()
so the builder is left in a valid state. So that's technically handleable today already.
That's pretty much it. These exceptions are no longer fatal, because the framework never sees them. The lifecycle methods and event handlers never throw. Framework internal state is not at risk in any way.
Of course the primary drawback to this pattern is that it's ridiculously inconvenient to put try
/catch
around every single method, and there would be perf drawbacks to passing around cascaded parameters to every single component. But if we bake this pattern into Renderer
, then neither the try
/catch
blocks nor the cascading parameters would be needed.
The key principle, then is:
try
/catch
around the call into user code. And only around the single line of that call - not around any other framework code. This ensures it's equivalent to the developer putting a try
/catch
around their own code, and doesn't risk leaving any framework state in an incomplete place.IServiceProvider.GetService
to return something, and it does require IComponentActivator
to return a valid component, so exceptions from those places aren't applicable to route to an error boundary.We will support catching and recovery from exceptions from:
OnInitialized[Async]
, OnParametersSet[Async]
, and OnAfterRenderAsync
)BuildRenderTree
)Although the above is phrased in terms of ComponentBase
APIs, the renderer really does it in terms of IComponent
/IHandleEvent
, and IHandleAfterRender
, so it works with any IComponent
.
We will not support catching and recovery of exceptions from:
IComponentActivator
, or IComponent.Attach
.
Dispose
/DisposeAsync
)
try
/catch
inside disposal logic if it's actually recoverable.IHandleEvents
and the target is an IComponent
. Otherwise, we have no idea which component to blame.InvokeAsync
InvokeVoidAsync
that does its own error handling, then we would catch those exceptions in the enclosing error boundary.System.Timer
.There will need to be different error boundary components for different hosting models. This is because the default error UI has to vary (you can only use HTML elements in web rendering; you'd do something else in MBB). So there has to be some public API you can implement/subclass to define an error boundary component.
However, we don't want just any component to be able to mark itself as an error boundary, as this would likely lead to antipatterns like marking all components as error boundaries (in an attempt to create On-Error-Resume-Next-style semantics). Also at least at initially, we should impose some clear behavioral rules on what error boundaries do when there is an error.
So, I propose the following:
Renderer
knows only about some new internal interface IErrorBoundary
Microsoft.AspNetCore.Components
, there's a standard base class public abstract class ErrorBoundaryBase : IErrorBoundary, IComponent
. This contains the common error handling semantics that we'd expect to be the same across all rendering platforms. For example, on error it logs errors, and shows some ErrorContent
or default error UI.Microsoft.AspNetcore.Components.Web
, there's the actual component developers consume, public class ErrorBoundary : ErrorBoundaryBase
which overrides abstract base class methods to:
More explicitly, these types look like:
// ---------------------------------------------------------
// In Microsoft.AspNetCore.Components
namespace Microsoft.AspNetCore.Components
{
internal interface IErrorBoundary
{
void HandleException(Exception exception);
}
public abstract class ErrorBoundaryBase : ComponentBase, IErrorBoundary
{
// ... implement stuff ...
public void Recover() { }
protected abstract Task OnErrorAsync(Exception exception);
}
}
// ---------------------------------------------------------
// In Microsoft.AspNetCore.Components.Web
namespace Microsoft.AspNetCore.Components.Web
{
// This is what you actually use if you're doing web-based rendering.
// Mobile Blazor Bindings would have its own equivalent to this that uses native UI elements for the default error UI
public class ErrorBoundary : ErrorBoundaryBase { ... }
}
Developers can create a custom subclass of ErrorBoundary
to define an alternate default error UI. Then will need to recreate the basic logic for showing either the ChildContent
or the error UI, e.g.:
@inherits ErrorBoundary
@if (CurrentException is null)
{
@ChildContent
}
else if (ErrorContent is not null)
{
@ErrorContent(CurrentException)
}
else
{
<div class="my-custom-error-ui">Oh no, there was a problem!</div>
}
Custom error boundary classes could include logic like a "retry" button (which calls the base class Recover
method) or custom logging logic.
Alternative considered: I have considered having IErrorBoundary
be public. Maybe we'll do that eventually. But I'm not keen on taking a gamble that there's no need to impose any standard expectations on what error boundaries should do in response to an error, as it could easily lead to all errors being suppressed and inefficient things like every single component also being an error boundary. The ErrorBoundaryBase
is where the important infinite-error-loop detection logic lives, so it's helpful that this will always be in effect.
Many aspects of the following are quite subtle and still open for debate.
We could rely on the IErrorBoundary
component tearing down its child content after an error, thereby triggering all the usual disposal and cleanup logic (invoking Dispose
on all components in the subtree, unregistering all their event handlers, notifying the JS client to unregister event listeners, etc.). This is something that ErrorBoundaryBase
will do.
However if an IErrorBoundary
fails to do that, we'd create a strange situation where "failed" components continue to live and receive events. This shouldn't put the framework state at any risk (as per the principles above) but is quite unexpected. The worst problem with it, I think, is that it would prevent us from ever making this stricter, as people may depend on failed components continuing to operate.
To avoid the risk that one day, we try to make IErrorBoundary
more open but don't realise we need to guarantee the subtree must be disposed on error, I propose that when the Renderer
will notify an error boundary about an error, it first forcibly discards that entire subtree (whether the IErrorBoundary
wants to or not). This will ensure all the cleanup logic occurs in that subtree, independently of whatever content the error boundary will render next.
If the IErrorBoundary
tries to re-render its existing ChildContent
, that will be fine - it just means you now have an entirely new copy of those components.
The default ErrorBoundaryBase
component will let the developer call a Recover
method which makes it switch back to render its ChildContent
instead of any error content. This is what makes possible the following simple and intuitive per-page error handling system:
<ErrorBoundary @ref="errorBoundary">
@Body
</ErrorBoundary>
@code {
ErrorBoundary errorBoundary;
protected override void OnParametersSetAsync()
{
errorBoundary?.Recover();
}
}
The end user can then click on a different page in a nav menu and carry on with their circuit. They can even come back to the same page that previously failed, and will get a fresh copy of it.
An IErrorBoundary
needs to be able to handle errors that occur directly within its own ChildContent
fragment, not just in further-down descendants. For example, you might have:
<ErrorBoundary>Some markup that might throw a nullref</ErrorBoundary>
Do you expect the error boundary to catch the nullref? Of course you do. So we can't have the rule that error boundaries only catch things from their descendants. Error boundaries themselves must continue to operate after an error in their own direct content. That's yet another reason why error boundaries should be special things and not just any old component, since we don't want just any old component to continue operating after it has an exception in its own rendering.
Does this create a risk of infinite error loops? Why yes it does. If ErrorContent
throws during rendering, you'll keep re-rendering it. We can't know whether an exception comes from something in ErrorContent
(versus ChildContent
), because exceptions can occur after asynchronous lifecycle methods or event handlers, and hence might arrive after we've already switched into or out of an error state.
The best way to mitigate this, I think, is a simple "error counting" mechanism for detecting infinite error loops. Hopefully most developers will never encounter this, but if more than N (default = 100, but configurable) exceptions are received in between recoveries, the error boundary will treat the new error as fatal to the circuit. This is helpful because:
For web-based rendering, I think this needs to vary across hosting models:
ErrorContent
or ErrorComponentBase
.ILogger
, and DetailedErrors
-respecting info to the client-side consoleILogger
. There is no client we can log to.I propose a new DI service to supply these per-hosting-model behaviors:
namespace Microsoft.AspNetCore.Components.Web
{
public interface IErrorBoundaryLogger
{
ValueTask LogErrorAsync(Exception exception);
}
}
Notice that this is specific to Microsoft.AspNetCore.Components.Web
. It's not a concept in the core at all.
After some consideration, I suggest we do not include any ErrorBoundary
in the default template. This is because:
More generally, error boundaries don't make applications more robust. They just make some errors recoverable. The existing default behavior of forcing a circuit to restart / page to reload on exception is already a good behavior that's easy to reason about.
I'd be in favour of documenting how you can do per-page error boundaries as a trivial tweak in MainLayout.razor
, but we should include cautions about what this entails about the user being able to continue after any lifecycle exception anywhere.
This is subtle! Since exceptions can occur asynchronously in lifecycle methods or event handlers, they might not happen until after the component (or even its enclosing error boundary) has been removed from the UI and disposed. How should we handle the errors then?
HttpClient
requests that time out.I think (3) is the only good choice. This is tricky to implement because we have to keep track of more info inside the Renderer
than we were doing before. Using only the IComponent Receiver
on an event, we have to be able to walk its original ancestry even after disposal. It is possible to do this, fortunately. The result is:
try
/catch
, or an extra layer of error boundary).OnErrorAsync
on it so it can log as usual, but the fact that it tries to switch into an error UI state is irrelevant and has no visible effect, just like any other component trying to render after its was removed from the component hierarchy. This very likely matches what the developer expects and wants.We don't want to bake in any human-readable default error UI because:
DetailedErrors
is true, you still don't want to show technical details in the UI in production (e.g., on WebAssembly). Developers who really want to see stack traces in development should probably use some logic like #if DEBUG
inside their ErrorContent
or a custom error boundary subclass - we can't do that for them.So instead, the default error UI is simply: <div class="blazor-error-boundary"></div>
. The default project template uses CSS to add sensible colors, text, and even a warning icon:
Since this is purely in CSS in user code, the framework isn't responsible for the language or any back-compatibility across versions.
Looks great overall, I wonder if there could be a more descriptive name other than AutoReset="true". Its not clear what is automatically getting reset or to what state, not suggesting this as a name, but its going to logically be doing something like ShowChildContentOnRecovery? And what happens if AutoReset is false (and why would someone want to do this, would they have to manually chose what to rerender?)
Summary
Provide a mechanism to catch and handle exceptions thrown within a particular UI subtree.
Motivation and goals
Currently, we say that unhandled exceptions from components are fatal, because otherwise the circuit is left in an undefined state which could lead to stability or security problems in Blazor Server. There is a mechanism for displaying a terminal error UI to the end user, on by default in the template.
However many developers regard this as inadequate, leading to many requests for "global exception handling". The main point is that it's unrealistic to guarantee that no component throws from a lifecycle method or rendering, especially given large developer teams and use of 3rd-party component libraries. When inevitable unhandled exceptions occur, people are unsatisfied with:
Other SPA frameworks support more granular mechanisms for global and semi-global error handling. We would like to do something inspired by the error boundaries feature in React.
Goals:
Non-goals:
In scope
Defining one or more "error boundary" regions within
.razor
code that handle exceptions from their child content. Each error boundary:ILogger
, plus sendDetailedErrors
-respecting info to the client console.Error boundaries are not expected to catch all possible exceptions. We're trying to cover ~90% of the most common scenarios, such as rendering and lifecycle methods. We are only trying to catch errors that we know for sure are recoverable (i.e., can't corrupt framework state). See the detailed design below for more info. In any other case, the existing mechanisms for terminating the circuit and showing the terminal error UI will continue to apply.
Out of scope
Risks / unknowns
Risk: Developers might think that all possible exceptions will be caught and become recoverable. This doesn't put server stability at risk, but just means the developer will be surprised when the terminal error UI still can appear.
Risk: Developers might disagree with the design about which specific error types should be recoverable.
Risk: Developers might think they don't need to customize terminal error UI any more. It may be confusing that there are multiple different error UIs to customize (one per error boundary, and a final terminal one).
Risk: If errors are recoverable, unwise use of this feature could lead to an infinite error loop. For example, an error boundary might immediately try to re-render the same child content after an error, which immediately (or asynchronously, after some async I/O) throws again.
Unknown: What happens if the error boundary itself has an unhandled error? See detailed design for a proposal. The risk here is that it's nonobvious.
Risk: If a large number of errors occur all at once within different error boundaries (e.g., if there's a separate boundary around each row in a grid, and every row throws), the log may be swamped with errors.
Risk: If we have to put in new
try
/catch
blocks within the low-level rendering internals (and we will), this might adversely affect perf when running on WebAssembly, perhaps especially in AoT mode, because (IIRC) exceptions are implemented by calling out to JavaScript which then calls back into .NET. The cost would likely be paid by everyone, whether or not they are using error boundaries. We'll have to measure this.Unknown: What are we putting in the default template? Should there be any error boundaries?
Risk: If there is an error boundary in the default template, or as soon as a developer adds one to their site, they are implicitly promising that any code they call from their lifecycle methods can tolerate unhandled exceptions. For example, any circuit-scoped model state no longer gets discarded at the first unhandled exception. What if it gets into a broken/illegal state and the circuit is still trying to use it? Developers need to understand what they are opting into. This is the same as with any state that spans multiple requests in MVC.
Risk: Naive developers might routinely mark all their components as being an error boundary and simply ignore exceptions. Again, this doesn't put framework internal stability at risk, but means the application will just not do anything sensible to notify about or log exceptions. Maybe we should put in some kind of roadblock to make error boundary components a special thing, not just a behavior you can trivially mix into any other component - not sure how though.
Risk: Whatever default error UI we bake in, developers might take a dependency on its exact appearance and behaviors (e.g., resizing to fit content or not) forever. Will we ever be able to change the default error UI? Is it enough to document that the error UI should be customized and if you don't, you accept that it might just change in new framework versions? See detailed design for proposal.
<div>
, with style/contents provided purely via CSS in the template, so there's no back-compatibility concern.Examples
Page-level error boundary in
MainLayout.razor
:Error boundary around a component that sometimes fails due to bad data: