benjaminp opened 3 months ago
cc @wilwell @zhengwei143
It looks like one thread is waiting for an idle worker in borrowObject while another thread can't return its worker in the synchronized release method.
According to the code logic this should not happen, because we check that there are enough idle workers before borrowing a new one.
I saw that @zhengwei143 recently rewrote this code with the new method hasAvailableQuota in cl/627316722 (commit 5742e69c6f8d5705c645b94ca8fe745370ef6c92). At first glance it doesn't look prone to race conditions, but the code in ResourceManager and Workers is tricky. Could this change have caused the deadlock?
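To make the race question concrete, here is a minimal sketch of the invariant a quota check is meant to protect. It is not the Bazel code: `QuotaPool`, `tryBorrow`, `borrow`, and `release` are hypothetical names. If the availability check and the borrow are two separate calls, another thread can take the last worker in between, even when each call is synchronized on its own. Here the check and the take happen in one synchronized method, and a borrower that has to block uses `wait()`, which gives up the monitor so that `release` can still run.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified worker pool (not Bazel's WorkerPool or
 * ResourceManager). All state changes happen under one monitor, and a
 * blocked borrower waits with wait(), which releases that monitor.
 */
public class QuotaPool {
    private final Deque<String> idle = new ArrayDeque<>();

    public QuotaPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.add("worker-" + i);
        }
    }

    /** Atomic check-and-take: returns null instead of blocking when no worker is free. */
    public synchronized String tryBorrow() {
        return idle.poll(); // ArrayDeque.poll() returns null when empty
    }

    /** Blocks until a worker is free; wait() gives up the monitor while blocked. */
    public synchronized String borrow() throws InterruptedException {
        while (idle.isEmpty()) {
            wait();
        }
        return idle.poll();
    }

    /** Can always acquire the monitor, because blocked borrowers park in wait(). */
    public synchronized void release(String worker) {
        idle.add(worker);
        notifyAll(); // wake borrowers waiting in borrow()
    }

    public static void main(String[] args) throws InterruptedException {
        QuotaPool pool = new QuotaPool(1);
        String w = pool.borrow(); // takes the only worker
        Thread borrower = new Thread(() -> {
            try {
                System.out.println("borrower got " + pool.borrow());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        borrower.start();
        Thread.sleep(100); // give the borrower time to block in borrow()
        pool.release(w);   // enters the monitor even though a borrower is waiting
        borrower.join();
    }
}
```

In `main`, the second `borrow` blocks inside a synchronized method, yet `release` still gets in, because `wait()` released the monitor. A borrower that instead blocked on a separate queue while still holding the monitor would produce exactly the hang described in this issue.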
@benjaminp how often does it reproduce?
This deadlock could be avoided by returning the worker to the pool asynchronously, but there is probably an error in the code logic that triggers it.
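A minimal sketch of the asynchronous-return idea, assuming a simplified manager rather than Bazel's real classes (`AsyncReturnManager`, `acquire`, and `releaseAsync` are hypothetical). `acquire` deliberately reproduces the shape of the hang: it blocks on the pool while holding the manager's monitor. `releaseAsync` hands the worker to a background thread that puts it back into the thread-safe queue without taking that monitor, so a borrower stuck inside `acquire` can still be woken. As the comment says, this only sidesteps the hang; it does not fix whatever logic error lets a borrow block in the first place.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical manager illustrating asynchronous worker return (not Bazel code). */
public class AsyncReturnManager {
    private final BlockingQueue<String> pool = new LinkedBlockingQueue<>();
    // Daemon thread so the JVM can exit without an explicit shutdown().
    private final ExecutorService returner = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "worker-returner");
        t.setDaemon(true);
        return t;
    });

    public AsyncReturnManager(int size) {
        for (int i = 0; i < size; i++) {
            pool.add("worker-" + i);
        }
    }

    /** Same shape as the hang: blocks on the pool while holding `this`. Returns null on timeout. */
    public synchronized String acquire(long timeoutMs) throws InterruptedException {
        return pool.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Returns the worker on another thread without taking `this`; the caller never blocks. */
    public Future<?> releaseAsync(String worker) {
        return returner.submit(() -> {
            pool.add(worker);
        });
    }

    public static void main(String[] args) throws Exception {
        AsyncReturnManager m = new AsyncReturnManager(1);
        String w = m.acquire(100);
        Thread borrower = new Thread(() -> {
            try {
                System.out.println("borrower got " + m.acquire(2000));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        borrower.start();
        Thread.sleep(100);       // borrower now holds the monitor and is blocked in poll()
        m.releaseAsync(w).get(); // completes without needing that monitor
        borrower.join();
    }
}
```

The design choice here is that the returning thread never touches the contended monitor; if the real release path also has bookkeeping that needs the lock, that part would still have to be reworked separately.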
We should probably add an assertion that fires when we try to borrow a new worker but there is no free one.
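A minimal sketch of such an assertion, assuming a simplified pool in which a successful quota reservation is supposed to guarantee an idle worker (`CheckedPool`, `reserve`, and `borrow` are hypothetical names, not Bazel's API). Instead of blocking, `borrow` throws when that invariant is violated, which would turn a silent hang into a stack trace pointing at the broken accounting. It uses an explicit exception rather than Java's `assert` statement, because `assert` is disabled unless the JVM runs with `-ea`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical pool with a fail-fast borrow check (not Bazel code). */
public class CheckedPool {
    private final Deque<String> idle = new ArrayDeque<>();
    private int quota; // workers that callers may still reserve

    public CheckedPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.add("worker-" + i);
        }
        quota = size;
    }

    /** Quota check: succeeds only if a worker can be promised to the caller. */
    public synchronized boolean reserve() {
        if (quota == 0) {
            return false;
        }
        quota--;
        return true;
    }

    /** Must follow a successful reserve(); never blocks. */
    public synchronized String borrow() {
        if (idle.isEmpty()) {
            // The quota promised a worker but none is idle: the accounting is
            // broken. Fail loudly here rather than waiting forever.
            throw new IllegalStateException("borrow() with no idle worker (quota=" + quota + ")");
        }
        return idle.poll();
    }

    public synchronized void release(String worker) {
        idle.add(worker);
        quota++;
    }

    public static void main(String[] args) {
        CheckedPool pool = new CheckedPool(1);
        if (pool.reserve()) {
            System.out.println("borrowed " + pool.borrow());
        }
        try {
            pool.borrow(); // second borrow with no reservation and no idle worker
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```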
One thing I noticed is that the build that first hit this deadlock printed the UI message about starting a new Javac multiplex worker. So I'm wondering whether the problem lies entirely in the worker creation code getting stuck when a worker is (re)created.
@benjaminp could you please attach the info log if you catch this behaviour again?
I suspect that functionality could have bugs with multiplex workers.
I was running a build with 463f80979127b39bb41c9afde5b5863914a7609d, and it hung. The stack traces reveal what looks like a hang while checking out a worker: