Closed bturner closed 2 years ago
2021-05-25 11:11:15,409 WARN [threadpool:thread-19] jdoe *D51P0Nx671x3323004x13 1ygxqot 1.1.1.1,2.2.2.2 "GET /rest/api/latest/projects/KEY/repos/slug/branches HTTP/1.1" c.z.n.internal.BasePosixProcess Failed to start process
java.lang.RuntimeException: Interrupted while registering 7224
at com.zaxxer.nuprocess.linux.ProcessEpoll.registerProcess(ProcessEpoll.java:139)
at com.zaxxer.nuprocess.linux.ProcessEpoll.<init>(ProcessEpoll.java:74)
at com.zaxxer.nuprocess.linux.LinuxProcess.run(LinuxProcess.java:102)
at com.zaxxer.nuprocess.linux.LinProcessFactory.runProcess(LinProcessFactory.java:50)
at com.zaxxer.nuprocess.NuProcessBuilder.run(NuProcessBuilder.java:273)
at com.atlassian.bitbucket.internal.process.nu.NuNioProcessHelper.run(NuNioProcessHelper.java:75)
at com.atlassian.bitbucket.internal.process.NioCommand.call(NioCommand.java:46)
at com.atlassian.stash.internal.scm.git.command.TransformedGitCommand.call(TransformedGitCommand.java:45)
at com.atlassian.bitbucket.internal.branch.LatestCommitMetadataProvider.getMetadata(LatestCommitMetadataProvider.java:45)
at com.atlassian.stash.internal.repository.PluginRefMetadataMapProvider.lambda$null$3(PluginRefMetadataMapProvider.java:99)
at com.atlassian.stash.internal.concurrent.StateTransferringCallable.call(StateTransferringCallable.java:33)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:400)
at com.zaxxer.nuprocess.linux.ProcessEpoll.registerProcess(ProcessEpoll.java:116)
... 18 common frames omitted
Note that the line numbers for ProcessEpoll
above won't line up with any NuProcess release. We had to switch to running a forked build of NuProcess while waiting for pull request #118 and #120 to be merged. What is ProcessEpoll.java:116
in our fork is line 112 In the canonical NuProcess codebase.
$ ps -ef | grep defunct
atlbitb+ 7224 26859 0 May25 ? 00:00:00 [git] <defunct>
I'm happy to put together a pull request for this, but before I do I'm curious how you'd like to see it addressed, @brettwooldridge. I can think of a couple options:
ProcessEpoll.eventPool
completely and, instead, create an EpollEvent
for each LinuxProcess
. That way whenever ProcessEpoll
needs to register a process or queue for stdin
, it can use that process instance's EpollEvent
to do it without any need to block
EpollEvent
instances overall, but I resuscitated the JMH microbenchmark and ran it on the current 2.0.2 codebase and then on 2.0.2 + this change and the increase in the allocation rate was negligible; 63400 bytes/op and ~0.37 MB/sec before vs. 63500 bytes/op and 0.38 MB/sec after, with no measurable impact on ops/secProcessEpoll
code to use poll()
instead of take()
and offer()
instead of put()
(neither of which can throw InterruptedException
s). If poll
returns null
, ProcessEpoll
would instantiate an ad hoc EpollEvent
which it wouldn't put back on the queue
EpollEvent
s than if there was a fixed, reusable instance per process. (That said, that level of heavy load is uncommon, and likely to not be continuous.)Or maybe there's something else you prefer. I can put together whatever you think is the best option.
ProcessEpoll
maintains aBlockingQueue
ofEpollEvent
instances. When processes are registered, an instance is pulled off the queue, used to register events for that process, and then put back onto the queue. The queue has 64EpollEvent
instances in it, so In a system that's lightly loaded there's always an instance available. In a heavily loaded Bitbucket Server instance, though, spikes can easily trigger the creation of 100+ processes concurrently.If the thread is interrupted (e.g.
Thread.interrupt()
) while it's blocked intake()
, aRuntimeException
is thrown and caught byLinuxProcess
'sstart
orrun
code, which logs the exception with a message that the process couldn't be started and the process is abandoned. But that's not actually accurate. The process was started, but NuProcess couldn't register it. That means NuProcess never reaps the process and it either hangs (if it needsstdin
or blocks after writing too much tostdout
orstderr
) or terminates and becomes a zombie.