MithrilJS / mithril.js

A JavaScript Framework for Building Brilliant Applications
https://mithril.js.org
MIT License
14.02k stars 924 forks source link

Error handling and recovery #2273

Closed dead-claudia closed 2 months ago

dead-claudia commented 6 years ago

Mithril currently doesn't have any error handling story. This is of course really bad.

Here's the concerns I believe need addressed when it comes to fault tolerance, in decreasing priority:

The first two are adequately covered in #1937, but the other 4 are the focus of this bug: helping users write error-tolerant apps.


Edit: Update this for clarity, remove some special semantics.

My proposal is this: add a new error handling sequence algorithm to address all errors, with an associated optional hook onerror: function (vnode, error) { ... } usable on both components and vnodes.

The error handling sequence is as follows:

  1. Find the first parent with an onerror hook. If no such parent exists, use the root.
  2. Create a reference to the initial error. This is the current error.
  3. Clear the parent's children, calling onremove as appropriate, but not onbeforeremove. If any hook throws, the hook sets the current error to it and otherwise ignores the error.
  4. If the parent is the root, rethrow the current error.
  5. Invoke its onerror hook with its vnode and the current error.
  6. If the hook throws:
    1. Set the current error to the thrown error.
    2. Set the parent to its parent. If no such parent exists, set it to the root.
    3. Go to step 2.
In code, it would look something like this: ```js // Where `parentPath` is a linked list of error handlers for this vnode. function handleError(path, error) { while (path != null) { var lastError = {value: error} onremoveChildren(path.children, lastError) if (path.vnode == null) throw lastError.value if (typeof path.vnode.tag !== "string" && typeof path.vnode.state.onerror === "function") { try { callHook.call(path.vnode.state.onerror, vnode) return } catch (e) { lastError.value = e } } if (path.vnode.attrs && typeof path.vnode.attrs.onerror === "function") { try { callHook.call(path.vnode.attrs.onerror, vnode) return } catch (e) { lastError.value = e } } path = path.parent } } // `onremove` gets fixed to this function onremoveChildren(children, lastError) { for (var i = 0; i < children.length; i++) { if (children[i] != null) onremove(children[i]) } } function onremove(vnode, lastError) { if (vnode.attrs && typeof vnode.attrs.onremove === "function") { try { callHook.call(vnode.attrs.onremove, vnode) } catch (e) { lastError.value = e } } if (typeof vnode.tag !== "string") { if (typeof vnode.state.onremove === "function") { try { callHook.call(vnode.state.onremove, vnode) } catch (e) { lastError.value = e } } if (vnode.instance != null) onremove(vnode.instance) } else if (Array.isArray(vnode.children)) { onremoveChildren(vnode.children, lastError) } } ```

Here's when the error handling sequence is executed:

When it fires, it receives the current vnode (not the origin) and the thrown error. It may choose to do one of two things:


Here's a couple concrete example of this in action:

// Here's a component that logs errors and reinitializes its tree on error.
var RestartOnError = {
    onerror: function (v, e) { console.error(e) },
    view: function (v) { return v.attrs.view() },
}

// Here's how you'd use it:
m(RestartOnError, {view: function () {
    return m(ComponentThatLikesToError)
}})

// Here's a component that swaps a view on error:
function SwapViewOnError() {
    var error = false
    return {
        onerror: function (v, e) { error = true; console.error(e) },
        view: function (v) {
            if (!error) {
                try {
                    return v.attrs.view()
                } catch (e) {
                    console.error(e)
                    error = true
                }
            }
            return v.attrs.errorView()
        },
    }
}

// Here's how you'd use it:
m(SwapViewOnError, {
    view: function () { return m(ComponentThatLikesToError) },
    errorView: function () { return "Something bad happened..." },
})
barneycarroll commented 6 years ago

To clarify, you're proposing the following outcomes for onerror resolution:

  1. return true as a convenient shorthand equivalent to calling m.redraw()
  2. otherwise engage the removal process for the component vnode, substituting a null vnode in place

Correct?

I personally think this is a can of worms best kept shut. As far as the outcomes above are concerned:

  1. It's easy enough to call m.redraw() or explicitly handle the error scenario in whatever way necessary anyway. Besides, for the sake of avoiding corrupted vtrees (such that the next draw, if it comes, produces the expected results) - wouldn't the current vtree computation (assuming the error was thrown in oninit) have to introduce a null vnode in place anyway? (if the view threw, I would expect the vnode.instance to resolve as undefined)
  2. I don't think it necessarily follows that the removal process should engage: in the case of a component which failed to initialise, I'd assume it's more likely that triggering onremove would lead to further exceptions seeing as the initialisation requirements were never met. In the case of a component which has successfully initialised but whose view or onbeforeupdate iteration threw, wouldn't the logical assumption be that the last vnode.instance remain in place?

There's also a dodgy notion that while class and POJO components can conceivably handle their own internal exceptions with the onerror hook, closures can't always, since an exception in the closure execution means the onerror hook would never be received by Mithril. This invites such practices as wrapping dodgy trees in higher order exception handling components, or avoiding closures and preferring abject orientation in the expectation that your code is going to be badly written (ie not have its own internally logical error handling).

The solution to the above is the same as what you would do in any other situation: don't transform your Mithril application composition to address fundamental logical errors: write better code that addresses your potential for exceptions explicitly in vanilla Javascript.

Take a look at this component:

() => {
  let data
  let error
  let loading = true

  return {
    oninit: async () => {
      await delay(1000)

      try {
        data = await m.request('https://rem-rest-api.herokuapp.com/api/users', {
          withCredentials: true,
        })
      }
      catch(e){
        error = e
      }
      finally{
        loading = false
      }
    },

    view: () => (
        loading
      ?
        'Loading...'
      :
        data
      ?
        JSON.stringify(data)
      :
        error
      ?
        error
      :
        '🤷‍'
    ),

    oncreate: async ({dom}) => {
      try {
        dom.style.cssText = `
          opacity: 0;
          transition: 1s;
        `

        await frame()

        dom.style.opacity = 1
      }
      catch(e){
        console.log('No big deal, life carries on as normal: ', e)
      }
    },
  }
}

This component has 2 identified potential exception sites: one in initialising its data and one in DOM mutation. For the first, we need to communicate this back to the user - this is caught, informs the internal model, and is reflected in the view; for the second, we determine that since this isn't really a critical issue, it's fine to fail silently without any compensatory logic at all.

This is by far superior to the opinionated vtree error handling proposal:

  1. We're using native Javascript features which can be learned, finessed and tweaked in all manner of ways without Mithril-specific expertise.
  2. The errors happen to take place in async code blocks: would Mithril even be able to catch them? Would it be expected? If the specifics of the oncreate function were redesigned to take place synchronously, would that change, and if so would that be desirable to the author?
  3. Handling errors specifically near the site of their origin allows us to trace the semantics legibly and deal with them in local scope - essential in the case of the request, but also generally preferable since we no longer need to (just think on this for a couple of seconds) sniff the error received in onerror and try to infer the nature of the problem and respond adequately.
  4. We don't invite the leading generic responses of redraw or remove, neither of which are appropriate - and we get to recover internally. Moreover, there's nothing in the current proposal that can't be handled this way.

1 & 3 are the real kickers: Mithril has long prided itself on being a Javascript-first library that embraces the language and empowers its users with subtlety & flexibility. The whole idea of opinionated error heuristics and automated recovery procedures introduces more complication to the code base and patronises the user into clumsiness and confusion, where ignorance and bad design are mitigated with further compensatory bad design. We shouldn't seek to compete in the domain of "programming is hard, let's go shopping" - the market is already saturated by people who have since made shopping hard. As the old saying goes, "now you've got two problems": what would normally be identified as a code typo, service failure or application design problem is now obscured into a special class of 'Mithril problems', for which the library claims to take some responsibility but can never truly give as satisfactory an outcome as could be devised without. Instead we should encourage users to think through the nature of their problems, ask for help, and come to superior solutions which deal more concretely with the specifics of their concerns.

dead-claudia commented 6 years ago

@barneycarroll

To clarify, you're proposing the following outcomes for onerror resolution:

  1. return true as a convenient shorthand equivalent to calling m.redraw()
  2. otherwise engage the removal process for the component vnode, substituting a null vnode in place

I decided to drop 1, since it appears unnecessary anyways. The return true is first and foremost a directive to Mithril to consider it handled and not propagate it further. It's still opinionated in the direction of propagation and further removal, since returning undefined doesn't prevent it, .

I added 2 not because of opinions, but because if a component is truly in an invalid state, it's unsafe and potentially destructive to leave it up and running. It's the same reason people use Forever and just let Node processes crash and restart in backend servers, rather than trying to recover using "uncaughtException". And I'm in agreement with the React team's rationale they included in their docs:

We debated this decision, but in our experience it is worse to leave corrupted UI in place than to completely remove it. For example, in a product like Messenger leaving the broken UI visible could lead to somebody sending a message to the wrong person. Similarly, it is worse for a payments app to display a wrong amount than to render nothing.

JAForbes commented 6 years ago

One of my favourite things about mithril is when my code crashes, the stack trace is very short and mostly my code. I can step through it, and there's no magical system trying to automatically recover.

I think when talking about error handling it's beneficial to divide errors into two classes: Expected | Unexpected.

Expected errors, are errors that are part of the contract of a system. An expected error may be JSON.parse throwing when you give it a string. String's aren't necessarily valid JSON, it's expected that JSON.parse can throw. And as Barney said, those kind of expected errors are best handled at the source. Whether it's at parse time, request time, call time etc. It's not mithril's concern. It's the application developer's concern. And if mithril crashes as a result of that developer's error, it's actually helpful because they will know where the problem is and how to resolve it.

Unexpected errors, are errors that occur due to programmer error, whether in a library or an application. It's where behavior is counter to what is expected and/or documented. When such errors occur, they are bugs and should be fixed in the appropriate place.

There's no reason a mithril view should ever throw, if it does, it's an unexpected error. As opposed to exposing an error handling hook on the component, we should let it crash so we can identify the bug. And then fix the bug in the form of a release.

Building it into the mithril programming model, separate to native JS, doesn't make much sense to me.

I think this is another situation where other frameworks (in the pursuit of DX) are actually making things worse by abstracting too much and blurring abstraction boundaries and responsibilities. Mithril has a job, it does it well. Error handling is best handled by other parts of the stack (e.g. sum types).

dead-claudia commented 6 years ago

@JAForbes

One of my favourite things about mithril is when my code crashes, the stack trace is very short and mostly my code. I can step through it, and there's no magical system trying to automatically recover.

This would have no effect on stack traces, even on propagated errors.

I think when talking about error handling it's beneficial to divide errors into two classes: Expected | Unexpected.

Expected errors, are errors that are part of the contract of a system. An expected error may be JSON.parse throwing when you give it a string. String's aren't necessarily valid JSON, it's expected that JSON.parse can throw. And as Barney said, those kind of expected errors are best handled at the source. Whether it's at parse time, request time, call time etc. It's not mithril's concern. It's the application developer's concern. And if mithril crashes as a result of that developer's error, it's actually helpful because they will know where the problem is and how to resolve it.

I do agree that it's not a good idea to use this for expected errors. This proposal is about handling unexpected errors, and I would specifically note that it's not a replacement for try/catch. (In fact, I'd list that as an anti-pattern.) It shares more in common conceptually with the Zone proposal, just with less initialization boilerplate.

Unexpected errors, are errors that occur due to programmer error, whether in a library or an application. It's where behavior is counter to what is expected and/or documented. When such errors occur, they are bugs and should be fixed in the appropriate place.

There's no reason a mithril view should ever throw, if it does, it's an unexpected error. As opposed to exposing an error handling hook on the component, we should let it crash so we can identify the bug. And then fix the bug in the form of a release.

Keep in mind that if left uncaught, the browser itself eventually swallows it and logs it so the page doesn't crash - this is required by spec. The difference is who swallows it: is it the browser, or is it a particular component that tolerates and potentially expects unexpected errors in children? Allowing a component to handle it allows things like:

Building it into the mithril programming model, separate to native JS, doesn't make much sense to me.

Native JS doesn't have anything here for logically linking and collecting async error handling at all, not even for native promises. Node has domains, but those require complicated native embedder hooks. There was a proposal to add the language hooks we'd need, but that's basically stalled on Node still trying to nail down the correct VM primitives they need (which they themselves need formalized almost spec-level). If native JS had anything useful at all, I'd be proposing a means for Mithril to hook into that instead. 😉

I think this is another situation where other frameworks (in the pursuit of DX) are actually making things worse by abstracting too much and blurring abstraction boundaries and responsibilities. Mithril has a job, it does it well. Error handling is best handled by other parts of the stack (e.g. sum types).

The main drive for this isn't DX, it's two-fold and mostly about end user experience:

  1. Allowing people to hook in to not always let production errors crash apps. (It's actually harder to write fault-tolerant apps in general.)
  2. Amplifying the presence of existing bugs, so they are easier to spot and are more likely to get fixed.

Also, this is almost entirely opt-in. The only breaking change here is that errors would clear the tree (calling onremove on siblings and their descendants in the process), but some virtual DOM-based frameworks already do this (like React and friends), and React found it broke very few people and almost all breakage just made existing bugs far more obvious. Vue instead only does error propagation, which is slightly simpler and less magical, but I'm not a fan of leaving invalid state hanging around, and Vue already has undesirable behavior (swallowing errors and just logging them unconditionally).

pdfernhout commented 5 years ago

@isiahmeadows To echo points made by @barneycarroll and @JAForbes, one of Mithril's major design features is relying on JavaScript/HTML/CSS as much as possible -- a design choice which makes it possible to preserve a lot of underlying simplicity in the Mithril codebase and developer experience (e.g. no need for a template language). And ideally we all want to keep it that way, which is why there is some pushback on this proposal out of fears (perhaps unfounded) that Mithril will drift into excessive complexity via adding fancy error handling.

Many other UI tools (Angular, React, etc.) go down undesirable paths of complexity (Angular's reliance on Zones, Observables, and an ad-hoc templating language) and premature optimization (e.g. React's reliance on setState) for questionable benefits and ultimately poor developer ergonomics -- which is why I prefer Mithril. As a benefit, Mithril 1.0's core code is roughly only 1000 lines (or now approaching 2000 lines for 2.0) which can (in theory) be understood by a motivated individual programmer in a reasonable amount of time -- another reason I prefer Mithril. Using Mithril has made me a better JavaScript/HTML/CSS developer precisely because it leverages all three so much. By contrast, when I used Dojo/Dijit UI components they tried to hide all that complexity -- but those abstractions were still leaky and so I had to eventually learn all the JavaScript/HTML/CSS basics plus debug complicated leaky abstractions in Dojo/Dijit when something went wrong. Mithril is a huge breath of fresh air compared to any of those three (Angular, React, or Dojo/Dijit).

I've used Angular with Zones. In practice, Angular Zones made debugging difficult because of long stack traces of unrelated code when anything goes wrong. I'd be concerned about adding Zone-ish complexity to Mithril. Mithril's simplifying assumption that the data model is dirty on any UI event or on a request (and you can handle timeouts and some other things yourself) is a great design simplification. While the use of an error handler is "opt in" what might not be opt in is dealing with a lot of extra code complexity in Mithril when debugging other errors if it goes down the Zones route. It's not just the amount of extra code as much as the way the extra Zones code adds complexity to every single user event interaction, timeout, or request.

I've also user React's Error Boundaries. They are not that bad as a concept as a component you put above children that may fail -- but they still ultimately do not solve major issues such as knowing what to do with many errors you may want to display to the user. They also do nothing for unhandled promise rejections (which requires adding an onunhandledrejection callback anyway). And errors in the Error Boundary component itself are still not handled. And of course none of this protects against poorly-written error handlers in underlying code that silently swallow errors or promise failures.

So, there is no magic bullet for dealing with errors. People will still need to test a lot and debug things. Ideally buggy code should fail as quickly as possible ("fail fast") and as near as possible to the originating code to make testing and debugging easier. When that is not possible there should ideally be a way to quickly trace back to the offending bit of code that configured the code that failed (which usually means preserving an extra reference somewhere between "compiled" object and "source" definition).

I don't disagree with the React design quote you referenced on the value of not showing UIs in an error state -- but I am wondering about what tradeoffs are worth it in practice for Mithril at the cost of increased library code complexity for what specific developer or tester or user benefit? As a developer (and in practice tester of code I write) I am OK with error reporting being done in the console and not done via some limping-along UI parent component of the child with the issue -- if that means the core library code I use is easier to understand and more reliable and has had more attention paid instead to helping with the most common error cases.

For example, yesterday, with Mithril 2.0, I had to play guessing games with errors caused by my passing in null or undefined for various things (including for a key). The failures were reported via long Mithril stack traces that left me inspecting lots of code looking for possible causes of a null or undefined somewhere. In practice, I feel reporting more of those sorts of simple errors earlier (like when the "m" call is made) would provide more bang-for-the-buck than more complex error handlers for downstream issues. Such extra checks might involve a cost to performance -- and if so, perhaps they might only be done when setting a global flag that could be turned off in production?

All that said, I don't understand all of what is being proposed here well enough to know how these general design comments really apply to this specific proposal. I don't know the scale of the changes proposed to the codebase -- even if any reference to Zones worries me from Angular debugging experience. Also, without some sort of metrics on types of Mithril error issues and their frequency and severity in practice, we only have anecdotal comments as mine above on what areas of error handling should be a priority. So, still a bunch of unknowns. Maybe what you propose is indeed a great way forward overall despite some complexity concerns?

What I do know is that Mithril 2.0 already works well right now overall, even with some issues and edge cases as above, and thanks for that! And I feel it is OK for Mithril to have edge cases it does not handle well if they are documented (e.g. stay away from this cliff) -- and it is clear they are the result of design tradeoffs consciously made to preserve simplicity. Mithril 1.0 had such limitations and we got a lot of great stuff done with it despite -- or perhaps because of -- those limitations. Every library will have limitations. Whether those limitations are "bad" depends in part on managing expectations.

It is fantastic to be sharp and able to juggle a lot of complexity as a programmer (and I am less sharp than I was so many years ago, enjoy it while it lasts -- good nutrition, good sleep, exercise, stress management, and so on can make a difference in extending that time). But it is sometimes much harder to find an appropriate simplicity to get something done well that does not require as much deep insight to use or understand as it did to invent. Such solutions may transcend the "three tribes of programmers" by being all of elegant, performant, and practical -- as Mithril 1.0 was compared to AngularJS reflecting Leo's insights and experiences (and maybe a bit of Elm's reactive/functional ideals as ideas bounce around in the Noosphere?).

Before proceeding on this proposal, I'd suggest (re?)watching this insightful Rich Hickey presentation on "Simple Made Easy" and meditating on the challenge of how the insightful design advice there might apply to improving Mithril error handling or other features: https://www.infoq.com/presentations/Simple-Made-Easy "Rich Hickey emphasizes simplicity’s virtues over easiness’, showing that while many choose easiness they may end up with complexity, and the better way is to choose easiness along the simplicity path. ... We should aim for simplicity because simplicity is a prerequisite for reliability. Simple is often erroneously mistaken for easy. "Easy" means "to be at hand", "to be approachable". "Simple" is the opposite of "complex" which means "being intertwined", "being tied together". Simple != easy. ... The benefits of simplicity are: ease of understanding, ease of change, ease of debugging, flexibility. ..."

pdfernhout commented 5 years ago

@isiahmeadows On a tangent from my previous comment, if you are going to do a deep dive into making Mithril more fault-tolerant -- whether via worthwhile tradeoffs or otherwise by insightful innovation that avoids tradeoffs -- here is a related document: https://resources.sei.cmu.edu/library/asset-view.cfm?assetID=11747 "A Conceptual Framework for System Fault Tolerance... February 1992 • Technical Report Walter Heimerdinger, Charles B. Weinstock ... This document provides vocabulary, discusses system failure, describes mechanisms for making systems fault tolerant, and provides rules for developing fault-tolerant systems."

A key point there is the distinction between "faults" and "failures" -- where essentially a failure occurs when an unhandled fault propagates across a subsystem boundary causing the subsystem to not comply with its specifications (e.g. for JavaScript, typically when an exception is thrown that is not handled in the originating component -- although there are many other sorts of possible failures like displaying incorrect results). Since systems usually have nested components, sometimes a failure of an internal component might be handled as a fault by the enclosing component and not propagated further. The paper also discusses aspects of fault tolerance like: Fault Avoidance, Fault Removal, Fault Detection, Fault Diagnosis, Fault Containment, Fault Masking, Fault Compensation, and Fault Repair and how they can happen at different points in the system lifecycle from design to development to use.

It might be interesting to think about the combination of an application and the existing Mithril library and consider how different parts of the code or development process relate to those different aspects of fault tolerance. Then we could consider how this new proposal fits into that framework.

ghost commented 4 years ago

Hi team, sorry if this is semi-private discussion, but I have a couple of thoughts on it.

First, in "iterative" projects, as I call them, we usually simply deploy hand-tested prototypes to our company's business departments. It's hard to test them thoroughly (without data / real services which may be sensitive or otherwise unavailable), but it's also hard to deliver them in a timely manner if you're after ticking all the "industry-grade" boxes. Our production is a battlefield by its nature, not an usual build-test-test-test-deploy-happy cycle. I believe that many fast-growing or fast-niched companies work like that.

Since projects may (and will) throw unexpected errors randomly in such semi-production, I have to catch and report all errors from a presentation layer, and the most plausible way to do that seems to be to wrap at least a component.view() methods into something that will catch and return m(ErrorComp, {error}) instead. This way, errors in oncreate will be catched above, and that's okay. (Autoreporting to backend with devteam messaging would also be nice, but that is out of a m() scope.)

We also wrote a m wrapper which parses {class:['foo', {bar:true}]} and allows functions in children, so wrapping onxyz events should not be a hassle too. But.

I think that while "don't be too cool and rely on JS" is a good idea in general, "developers should fix them bugs" actually may turn into an impractical restriction (policy), not a mechanism. If the proposed onerror handler existed, I would no doubt use it right now. If unexpected errors just broke everything, it would be okay, but in our tests everything happens -- from just nothing to doubled renders (two-three items spawning on a faulty click). This will make a next-door user think that they didn't understand something and allow to break their data further instead of emergency reload or letting the devteam know. << This concerns me most, as there is no clear indicator if something went wrong -- users cannot see nor interpret the console.

I could just wrap every method in a try-catch, but that would explode LoCs and make source code feel as if you went to the store in a power armor just in case raiders visit the city. It is not your daily JS and I doubt that that aligns with an original simplicity intent. Mithril doesn't burden us with abstractions, and that's good, but its view hierarchy rendering is its responsibility, its run-loop and an its intrinsic lifetime property. If e.g. Promise didn't allow to catch(), it would be the same exact thing, imho -- everyone had to write:

promise.then(result => {
  if (!result.ok) return result
  try {
    return {ok:true, value:foo(result.value)}
  }
  catch (error) {
    report(error)
    return {ok:false, error}
  }
}).then(...)

instead of clean:

promise
  .then(foo)
  .then(...)
  .catch(report)

... or hope for the best.

pygy commented 1 year ago

@dead-claudia why would we put the try/catch at the level of the component that defined the error handler rather than the one where the error occurred? Doing the latter requires little more code (using a stack-managed global to track the current error handler) and would also work with islands (a reentrant m.render() call could use the current handler and tie it to the nested root along with the last vdom).

Likewise, m.render could accept an error handler as a default for all components (and tie it to the root for redraws).

dead-claudia commented 1 year ago

@pygy I'll answer that a little out of order.

[...] (a reentrant m.render() call could use the current handler and tie it to the nested root along with the last vdom).

Nested m.render calls might not want that forwarding, so they should be allowed to catch errors that propagate out. The easiest way to do this is by just propagating the uncaught exception.

why would we put the try/catch at the level of the component that defined the error handler rather than the one where the error occurred?

There's a few reasons for this:

  1. It's much easier to capture all the user errors in one location than everywhere external errors could occur (ranging from lifecycle hooks to attribute serialization to DOM property assignment exceptions to even custom element insertion). Sure, state would be roughly identical, but legitimate errors external to Mithril can happen almost anywhere.
  2. Leaving nodes up when exceptions propagate outside their scope is a bad idea. React did actually do the right thing here, removing the subtree when an error propagates past them.
  3. By centralizing error capture, it'd be easy to make sure it catches internal renderer errors as well. In an ideal world, we aren't throwing exceptions of our own, but it does help us (and our users) tolerate that class of Mithril bugs better.