Axosoft / nsfw

A super fast and scaleable file watcher that provides a consistent interface on Linux, OSX and Windows
MIT License
901 stars 113 forks source link

“libc++abi: terminating due to uncaught exception of type Napi::Error” in terminating Electron environment #184

Closed savetheclocktower closed 1 week ago

savetheclocktower commented 1 week ago

In Pulsar we've got a situation where we're trying to update to a modern version of Electron that enforces context-awareness and mandates reuse of renderer processes. So for the first time we're assessing our dependencies that use native modules for context safety and upgrading them where necessary.

Updating nsfw from 2.2.2 to 2.2.5 fixed almost all of our crashes that were attributable to nsfw, but we ran into a situation where we encountered a cryptic error about half the time when performing a page reload in the renderer:

libc++abi: terminating due to uncaught exception of type Napi::Error

I did my best to trace the root cause, but could only get so far: it appeared to get all the way to this line before throwing an error. So there appears to be an issue with calling start on a worker while a page is unloading.

You've probably got a few questions by now — for instance, “why would you start a new watcher while the page is unloading?” In our case, it was some overzealous code that tried to consolidate multiple watchers.

For instance, when there are requests to watch both path A and path B, we might react by watching those two paths’ nearest common ancestor and ignoring any changes that don't descend from either A or B. If you were to dispose of the watcher at path A, it'd stop the broader watcher and automatically start a new, more specific watcher for path B. And when you dispose of the watcher manager (as you'd do if you were unloading the page), it disposes all of the individual watch paths… and, well, that's the bug.

Anyway, that's all sorted out; fixing that bug prevented that specific renderer process crash. I'm mentioning this because I came across some information on the NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS directive and am wondering if it's a good fit for nsfw in this exact scenario:

By default, throwing an exception on a terminating environment (eg. worker threads) will cause a fatal exception, terminating the Node process. This is to provide feedback to the user of the runtime error, as it is impossible to pass the error to JavaScript when the environment is terminating. The NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS preprocessor directive can be defined to bypass this behavior, such that the Node process will not terminate.

Setting this constant and rebuilding nsfw also prevented the renderer process crash, even without the other bugfix.

This would be helpful for us, since we wrap an API around nsfw and expose it to third-party packages. They can probably figure out ways to do silly things, and though I think that should be discouraged somehow, a full crash of the user's renderer process seems a drastic enforcement mechanism. I don't know if there's any downside for nsfw, though, because I don't have any insight as to exactly what goes wrong when you try to start a watcher inside a terminating environment.

I'm mentioning it in case you think it's a good idea; if not, no worries, and we'll decide on our own whether to maintain the lightest of all possible forks for our own purposes.

Thanks!

savetheclocktower commented 1 week ago

I dumped a lot of text into that description, but somehow failed to mention this:

Another way around this exception would be if nsfw were able to detect that it was within a terminating environment and silently skip whatever action caused the exception. The quoted passage suggests that the exception isn't just a side effect of the terminating environment; it's something real that could be avoided.

Even when I was able to pinpoint exactly where the exception came from, I wasn't able to glean any information about the exception itself. I've been able to replicate this failure very, very infrequently outside of Electron using worker threads, so I'll see if I can throw together a pure-Node test case.

julianmesa-gitkraken commented 1 week ago

@savetheclocktower can you try to add:

if (mFinalizing) {
  return;
}

at the beginning of:

And test if the problem is fixed?

savetheclocktower commented 1 week ago

@julianmesa-gitkraken Afraid not — no difference. Strangely, it doesn't even get to the finalizer before things go wrong. The finalizer only seems to execute if the worker is allowed to exit gracefully rather than being terminated.

The crucial detection point would be in NSFW::StartWorker::OnOK() and its siblings. The StartWorker was created before the environment was terminated, but doesn't resolve until afterward.

Here's roughly what I've done to be able to reproduce the issue reliably in a non-Electron setting:

This is an exaggeration, but it was the only way I could get it to happen every time in an ordinary script. In an Electron environment it happens maybe half the time.

julianmesa-gitkraken commented 1 week ago

Then NSFW cannot do anything in code. Then you will have to test using NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS, add

"defines": [
    "NODE_API_SWALLOW_UNTHROWABLE_EXCEPTIONS"
],

in binding.gyp after "target_name": "nsfw", If works then you can make a PR

savetheclocktower commented 1 week ago

Yeah, that definitely worked when I tested it in Electron. Happy to make that PR if nsfw thinks it's the right default behavior.

ianhattendorf commented 1 week ago

Thank you for all of the research, I agree it makes sense to make unthrowable exceptions non-fatal.