brettwooldridge / NuProcess

Low-overhead, non-blocking I/O, external Process implementation for Java
Apache License 2.0
710 stars 84 forks source link

macOS: Should use pthread_fchdir to restore the current working directory not pthread_chdir #144

Open skissane opened 1 year ago

skissane commented 1 year ago

In src/main/java/com/zaxxer/nuprocess/osx/OsxProcess.java I see this code:

      finally {
         rc = LibC.SYSCALL.syscall(SyscallLibrary.SYS___pthread_chdir, oldCwd);
         Native.free(Pointer.nativeValue(oldCwd));
         checkReturnCode(rc, "syscall(SYS__pthread_chdir) failed to restore current directory");
      }

That's not the correct way to use SYS___pthread_chdir. The problem is this: initially, a thread's per-thread working directory is NULL, which means it inherits the process-wide working directory, and it will instantly reflect any changes in the process-wide working directory. Once you call SYS___pthread_chdir, it now has a non-NULL per-thread working directory, which means it has been unlinked from the process-wide working directory. Calling SYS___pthread_chdir to restore it to the original CWD is not actually setting things back to how they were at the beginning, because you are leaving the thread's per-thread working directory non-NULL, and hence leaving the thread's working directory unlinked from that of the process.

The proper way to do this use another syscall SYS__pthread_fchdir, which changes the per-thread directory based on a file descriptor pointing to the directory not a string. It accepts the magic file descriptor value -1, which sets the per-thread working directory back to NULL, so it will inherit the process-wide working directory once more. Also, it is much more reliable than trying to SYS__pthread_chdir back to the original working directory – that can fail in various obscure circumstances, pthread_fchdir(-1) is vastly less likely to fail, and should still work in all those obscure circumstances.

If you want an example of code which does this the right way, see this file in the Google Chromium source code: https://chromium.googlesource.com/chromium/src/+/76ee79eb33a381f4fbaa023c1fc8ee8b4ddc46c8/base/process/launch_mac.cc#107

That's a bit of an old version. Newer versions of Chromium, Google has removed support for old macOS versions which lacked the pthread_chdir_np/pthread_fchdir_np wrappers, so they've removed the code which used the syscalls directly – actually, that raises another point – you really ought to use the pthread library functions not the syscalls. Direct use of the syscalls is more likely to be broken by Apple at some point. It is only required if you need to support macOS versions 10.5 thru 10.11 – versions prior to 10.5 lacked the syscalls, and 10.12 onward have the wrappers. Given 10.11 (El Capitan) was released over 7 years ago, and has been officially unsupported for over 5 years, maybe it would make sense to just switch to the wrappers.

Apple has never officially documented this stuff, the closest they've come is code included in some of their open source releases, for example https://opensource.apple.com/source/libpthread/libpthread-218.1.3/private/private.h.auto.html – the comment there does document how to properly clean up:

 * @discussion
 * This will set the per-thread current working directory to the provided path.
 * If this is used on a workqueue (dispatch) thread, it MUST be unset with
 * pthread_fchdir_np(-1) before returning.

Now, despite the current code being technically wrong, the odds of it causing problems in practice are rather low. Still, it wouldn't be hard to fix it to do it the right way. And the risk with having code doing it the wrong way in a popular open source project, is other people may learn that incorrect approach from it.

bturner commented 1 year ago

Thanks for the detailed report, @skissane. I'll see if I can work up a fix for this.

I'm not sure what @brettwooldridge's aspirations are in terms of the supported version range for macOS, so I'm likely to stick with a direct syscall approach for the moment. Switching to a pthread function seems like the more robust approach, certainly, but we'd probably want to pair that with a major version change to NuProcess because it would require dropping support for versions of macOS that are currently supported (even if a supported version range isn't currently documented; I don't like changing that sort of behavior on library consumers without a matching semver change to draw attention to it unless I have some sort of compelling reason to do so).