Hi there! I encountered a crashing bug while using xpc-connect, and while trying to fix it I discovered a few other crashing bugs and memory leaks. They're mostly all related, so I went ahead and fixed all the bugs I found in this one PR. I tried my best to not add any breaking changes, although there are two things that might loosly be considered breaking, which I describe below.
The original problem I encountered was a race condition where sometimes xpc-connect would crash when calling shutdown at the same time a message was being sent from the remote. In my particular case, the client would send a message to the service which would then immediately close the connection and result in a "connection interrupted" message sent to the client, and then the client would shutdown at the same time. Sometimes the shutdown call would happen in the middle of the "connection interrupted" event being asynchronously queued, while eventQueueMutex or asyncHandle were being used, which would result in segmentation faults or aborts when they were used after destroy.
Overall, here are the bugs I fixed / changes I made. Sorry it's so many in one PR, but most of them all had to happen together / were fixed with the same changes.
Crashing bug: Fix the original bug above by only destroying everything after the connection is fully closed. So all objects are now destroyed after the XPC_ERROR_CONNECTION_INVALID is received and handled (this is important, to prevent this or eventQueueMutex from becoming destroyed too early). "connection invalid" is always the last message sent on an XPC connection when it is closed (cancelled), so it is guaranteed to be handled.
Crashing bug: Calling setup twice, or shutdown before/without setup, would crash the program. Fixed with checks and throwing a JS error if setup is called twice. I made shutdown silently fail, since there's not really any harm in letting it be called multiple times, and I imagine it could be convenient in some situations.
Resource leak: Use xpc_connection_cancel instead of xpc_connection_suspend. _suspend pauses a connection, but _cancel actually closes it and results in the aforementioned "connection invalid" error.
Memory leak: Call xpc_release on the XPC connection after it closes to free it.
Memory leak: Call dispatch_release on the dispatch queue after the connection is freed.
Memory leak: The XpcConnect class instance was never destroyed because it held a persistent reference to itself through the This variable. As far as I could tell, there is no point to that. So I removed This and replaced its usage Nan::New<Object>(this->This) with this->handle() which, as far as I can tell, is identical.
Crashing bug: This bug was actually introduced by allowing XpcConnect instances to be deallocated in [6] above -- XpcConnect could be deallocated while the async XPC event handler is running, resulting in a segmentation fault with using this. So I used this->Ref() on setup and this->Unref() on connection invalid to prevent the class instance from being deallocated while in use.
Crashing bug: Also introduced by allowing XpcConnect instances to be deallocated -- it seems there is a bug with Nan::AsyncResource where it cannot be freed in an ObjectWrap's destructor. I couldn't figure out why, so I just made it construct in setup and destruct when destroying everything else after "connection invalid". This is technically a breaking change, because it means the async_hooksinit and destroy events will happen at slightly different times than before. There's not really anything that can be done about that, though. And I doubt anyone is actually using async_hooks with this anyway.
Optional: Throw error if sendMessage is called before setup. This is kind of a breaking change, you can remove it if you want and just make it be silent like before. I didn't also throw an error for using it after shutdown because xpc_connection_send_message intentionally never acknowledges errors, and with the asynchronous nature of connections possibly being closed at any time, it made sense to me to just leave it failing silently.
Typo: xpcConnnection with three ns.
I also added tests to index.spec.js to test some of the fixes. I don't know how to test the memory leaks though, and the original crash I encountered was pretty specific to my custom XPC service, so I couldn't really write tests for that. But I did test a bunch with my own service and haven't been able to cause any crashes now.
Thanks for reading this essay of a PR description haha. Let me know if there's anything you want me to change.
Hi there! I encountered a crashing bug while using xpc-connect, and while trying to fix it I discovered a few other crashing bugs and memory leaks. They're mostly all related, so I went ahead and fixed all the bugs I found in this one PR. I tried my best to not add any breaking changes, although there are two things that might loosly be considered breaking, which I describe below.
The original problem I encountered was a race condition where sometimes xpc-connect would crash when calling
shutdown
at the same time a message was being sent from the remote. In my particular case, the client would send a message to the service which would then immediately close the connection and result in a "connection interrupted" message sent to the client, and then the client would shutdown at the same time. Sometimes the shutdown call would happen in the middle of the "connection interrupted" event being asynchronously queued, whileeventQueueMutex
orasyncHandle
were being used, which would result in segmentation faults or aborts when they were used after destroy.Overall, here are the bugs I fixed / changes I made. Sorry it's so many in one PR, but most of them all had to happen together / were fixed with the same changes.
XPC_ERROR_CONNECTION_INVALID
is received and handled (this is important, to preventthis
oreventQueueMutex
from becoming destroyed too early). "connection invalid" is always the last message sent on an XPC connection when it is closed (cancelled), so it is guaranteed to be handled.setup
twice, orshutdown
before/withoutsetup
, would crash the program. Fixed with checks and throwing a JS error ifsetup
is called twice. I madeshutdown
silently fail, since there's not really any harm in letting it be called multiple times, and I imagine it could be convenient in some situations.xpc_connection_cancel
instead ofxpc_connection_suspend
._suspend
pauses a connection, but_cancel
actually closes it and results in the aforementioned "connection invalid" error.xpc_release
on the XPC connection after it closes to free it.dispatch_release
on the dispatch queue after the connection is freed.XpcConnect
class instance was never destroyed because it held a persistent reference to itself through theThis
variable. As far as I could tell, there is no point to that. So I removedThis
and replaced its usageNan::New<Object>(this->This)
withthis->handle()
which, as far as I can tell, is identical.XpcConnect
instances to be deallocated in [6] above --XpcConnect
could be deallocated while the async XPC event handler is running, resulting in a segmentation fault with usingthis
. So I usedthis->Ref()
on setup andthis->Unref()
on connection invalid to prevent the class instance from being deallocated while in use.XpcConnect
instances to be deallocated -- it seems there is a bug withNan::AsyncResource
where it cannot be freed in anObjectWrap
's destructor. I couldn't figure out why, so I just made it construct insetup
and destruct when destroying everything else after "connection invalid". This is technically a breaking change, because it means theasync_hooks
init
anddestroy
events will happen at slightly different times than before. There's not really anything that can be done about that, though. And I doubt anyone is actually usingasync_hooks
with this anyway.sendMessage
is called beforesetup
. This is kind of a breaking change, you can remove it if you want and just make it be silent like before. I didn't also throw an error for using it after shutdown becausexpc_connection_send_message
intentionally never acknowledges errors, and with the asynchronous nature of connections possibly being closed at any time, it made sense to me to just leave it failing silently.xpcConnnection
with threen
s.I also added tests to
index.spec.js
to test some of the fixes. I don't know how to test the memory leaks though, and the original crash I encountered was pretty specific to my custom XPC service, so I couldn't really write tests for that. But I did test a bunch with my own service and haven't been able to cause any crashes now.Thanks for reading this essay of a PR description haha. Let me know if there's anything you want me to change.