Open natario1 opened 7 years ago
@grantland Are you still managing this project? Or can you point us to the right people who are?
I think I'm the only somewhat active maintainer. This repo hasn't seen much activity since it's been pretty stable and most of the TPL spec has already been implemented. I also seem to have missed a few PRs (probably happened while my current team at FB was on lockdown for a project), which I just addressed. If there happens to be more activity here I wouldn't mind having some help 😛
This was a while ago, but I think this specific `ThreadPool` functionality was added because the Parse-Android SDK at the time (before Bolts-Android) used the default `Executors#newCachedThreadPool()` and we were running into a ton of deadlocks. While I was trying to hunt them down, I noticed we were spinning up an enormous number of threads and traced that to the fact that `Executors#newCachedThreadPool()` is unbounded. This is fine for traditional Java usage on servers with virtually unbounded CPUs, memory, and power, but it can negatively affect Android devices, which are much more limited in CPU power, memory, and battery life; in addition, lower-end Android devices supported by our `minSdkVersion` don't have many CPU cores.

Once I limited the `ThreadPool` to a manageable size, I was able to untangle a bunch of non-executor-related deadlocks (mainly `Object#lock()`, `synchronized`, etc.) and things were running smoothly. Any deadlocks we encountered after this we were able to solve either by untangling some bad code or by de-nesting our `Task`s (as `ImmediateExecutor` would bump deeply nested `Task`s to `Task.BACKGROUND_EXECUTOR`, as you mentioned).
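For context, a bounded pool along these lines is enough to cap thread creation. This is a sketch, not the actual Bolts source; the CPU-based sizing constants are illustrative of AsyncTask-style executors, not quoted from the library:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {

    // Size the pool from the number of available cores instead of letting it
    // grow without bound like Executors#newCachedThreadPool() does.
    public static ThreadPoolExecutor newBoundedPool() {
        int cpus = Runtime.getRuntime().availableProcessors();
        int corePoolSize = cpus + 1;
        int maxPoolSize = cpus * 2 + 1;
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                corePoolSize,
                maxPoolSize,
                1L, TimeUnit.SECONDS,                 // let idle threads die quickly
                new LinkedBlockingQueue<Runnable>());
        executor.allowCoreThreadTimeOut(true);        // don't keep idle core threads alive
        return executor;
    }
}
```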
Your solution would solve the problem you're facing, but I'm a little hesitant to increase the max pool size of `Task.BACKGROUND_EXECUTOR`/`Task.callInBackground(callable)` to 32 or 64, as we'd potentially be choking the CPU in a lot of applications using those methods.
Additionally, I think we can split your issue into two parts to find a solution:

1. Deadlocks when using `Task.BACKGROUND_EXECUTOR` directly
2. Deadlocks when using `Task.BACKGROUND_EXECUTOR` indirectly via `ImmediateExecutor` and `Task#call(callable)`

In my experience, 1 rarely happens, and when it does it either uncovers some badly written code (in which case cleaning it up fixes the deadlock) or that code requires a specific non-`Task.BACKGROUND_EXECUTOR` executor.
I see 2 as a much more important issue, as nested `Task`s can be randomly bumped to a different thread and we are unable to predetermine how deep in the stack our `Task` will be executed.
With this, I think the least invasive solution would be to add another `Executor` that `ImmediateExecutor` bumps to. This `Executor` can be either bounded or unbounded; I don't think it really matters, as it's not called directly. This way, we don't choke the CPU for calls that use `Task.BACKGROUND_EXECUTOR`, but we also don't cause impossible-to-trace deadlocks due to automatic bumping to a different thread when the call stack gets too deep.
Do you think this solution would be good enough to solve the issue? Do you think there is anything I'm missing?
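A rough sketch of that bump-executor idea; the names (`MAX_DEPTH`, `BUMP_EXECUTOR`) and the thread-local depth tracking are assumptions for illustration, not actual Bolts API:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Sketch: run commands inline until the call stack gets too deep, then hand
// off to a dedicated bump executor instead of Task.BACKGROUND_EXECUTOR.
public class ImmediateExecutorSketch implements Executor {

    private static final int MAX_DEPTH = 15;

    // Dedicated bump target. Bounded or unbounded, it arguably doesn't
    // matter much since nothing calls it directly.
    private static final Executor BUMP_EXECUTOR = Executors.newCachedThreadPool();

    // Per-thread nesting depth of execute() calls.
    private final ThreadLocal<Integer> depth = new ThreadLocal<Integer>() {
        @Override protected Integer initialValue() { return 0; }
    };

    @Override
    public void execute(Runnable command) {
        int d = depth.get() + 1;
        depth.set(d);
        try {
            if (d <= MAX_DEPTH) {
                command.run();                    // shallow enough: run inline
            } else {
                BUMP_EXECUTOR.execute(command);   // too deep: hand off to the bump pool
            }
        } finally {
            depth.set(depth.get() - 1);
        }
    }
}
```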
Thanks @grantland! So, I think we can agree on the following, correct me if I am wrong:

1. Bump the max pool size of `BACKGROUND_EXECUTOR` (e.g. to 128, like AsyncTask), or edit the comments and note that max pool size is unused. It won't have any effect, but I don't think you wanted it to be unbounded, if I got this correctly.
2. Add a thread factory to `BACKGROUND_EXECUTOR`, just to have a meaningful name; that would really help in debugging, for example.
3. Solve the `ImmediateExecutor` issue as you said. If we add the thread factory, we could as well solve it this way, don't know if it's better:
```java
// BoltsExecutors
private static boolean isBackgroundThread(Thread thread) {
  return thread.getName().startsWith("bolts-background-thread-");
}

// ImmediateExecutor
@Override
public void execute(Runnable command) {
  int depth = incrementDepth();
  try {
    // Don't request a new background thread if this is already a background thread
    if (depth <= MAX_DEPTH || isBackgroundThread(Thread.currentThread())) {
      command.run();
    } else {
      BoltsExecutors.background().execute(command);
    }
  } finally {
    decrementDepth();
  }
}
```
This is good news, but I think the Parse SDK is still stuck on your first point: deadlocks occurring when using `Task.BACKGROUND_EXECUTOR` directly. You know that in the middle of each request Parse ends up in `ParseSQLiteDatabase.dbExecutor`, in `ParseRequest.NETWORK_EXECUTOR`, or in both, and then it explicitly jumps out of them. This means that a single execution needs 2 background threads, one waiting for the other. If the current pool has 5 threads, it takes 3 concurrent executions and the SDK is locked.
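This two-stage starvation is easy to reproduce with a plain `ExecutorService`. The following self-contained sketch (names are illustrative, not Parse or Bolts code) hangs once every pool thread is held by a first stage waiting on a second stage:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class NestedDeadlockDemo {

    // Models the pattern described above: each "request" holds one pool thread
    // while it blocks on a second stage that needs another pool thread.
    // Returns true if all requests finish within the timeout.
    public static boolean completesWithin(int poolSize, int requests, long timeoutMs)
            throws InterruptedException {
        final ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        final CountDownLatch started = new CountDownLatch(requests);
        final CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.execute(new Runnable() {
                @Override public void run() {
                    started.countDown();
                    try {
                        started.await();   // ensure every first stage holds a thread
                        Future<?> second = pool.submit(new Runnable() {
                            @Override public void run() { /* e.g. the network call */ }
                        });
                        second.get();      // first stage waits for the second stage
                        done.countDown();
                    } catch (Exception ignored) {
                        // interrupted by shutdownNow(): just exit
                    }
                }
            });
        }
        boolean finished = done.await(timeoutMs, TimeUnit.MILLISECONDS);
        pool.shutdownNow();
        return finished;
    }
}
```

With a pool of 2, a single request completes, but 2 concurrent requests occupy both threads with first stages and the second stages queue forever.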
What can we do?
Another solution off the top of my head would be to leave everything as is but use a small queue (< core pool). Then some commands might be rejected, because the max pool size is small as well. Instead of canceling the task, we could schedule it for later using a `RejectedExecutionHandler`. This ensures infinite capacity without stressing resources. But yeah, it looks bad.
```java
@Override
public void rejectedExecution(final Runnable runnable, final ThreadPoolExecutor e) {
  // Retry the rejected command after a short delay instead of dropping it.
  Task.delay(1000).continueWith(new Continuation<Void, Void>() {
    @Override
    public Void then(Task<Void> task) {
      e.execute(runnable);
      return null;
    }
  });
}
```
@grantland any comments? Based on what you said, my plan would be:

1. Bump the max pool size of `BACKGROUND_EXECUTOR` (e.g. to 128, like AsyncTask), or edit the comments; either one is wrong right now.
2. Solve the `ImmediateExecutor` issue as you said, by creating a fallback executor or something similar (we can discuss the design in the PR), so immediate tasks never fill `BACKGROUND_EXECUTOR` causing deadlocks.

These alone won't solve the Parse issues (see comment above), so:

3. Add `Task.setDefaultBackgroundExecutor()` to let everyone do their tweaking at their own risk.

I can PR if we reach some kind of agreement :-)
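For the third point, a hypothetical shape of the setter; this is not existing Bolts API, only an illustration of the proposal, and the class and field names are made up:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Hypothetical opt-in hook: let applications swap the default background
// executor at their own risk, while the library keeps a sane default.
public class TaskExecutors {

    private static volatile Executor backgroundExecutor =
            Executors.newCachedThreadPool(); // stand-in for the Bolts default

    public static void setDefaultBackgroundExecutor(Executor executor) {
        backgroundExecutor = executor;
    }

    public static Executor background() {
        return backgroundExecutor;
    }
}
```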
This follows parse-community/Parse-SDK-Android#646, where we found a possible bug in the executor; I would also propose some changes.
Bug?
It is told in the comments that:

This is strictly true, but practically it is not with the blocking queue being used (`LinkedBlockingQueue`, which has unbounded capacity). If I got this correctly, with that queue tasks can always be queued, so the maximum number of threads you'll have is the core pool; `maxPoolSize` has no effect. So this:

should become:
See also the official docs from `ThreadPoolExecutor` about queues:

Queue strategy
Using the terminology from the Oracle site, the current strategy (despite the comments) is the unbounded queue, which fits the case of tasks completely independent of each other. You will agree this is not `bolts`' case. Can we move to the first strategy?

I can read in the comments that the bolts executor was designed trying to emulate the AsyncTask executor. But AsyncTasks are independent by design and can afford a big queue. Bolts `Task`s are dependent, chained, nested, and the docs suggest a different strategy for this design. We are experiencing lockups in the Parse SDK that you can read about in the linked issue (there's a huge comment explaining the internals). We can block `Task.BACKGROUND_EXECUTOR` forever very easily, in situations like the following, when running the same action concurrently:

With 2 processors, this takes 3 concurrent actions to have the executor hang. I don't think this is fixable from outside bolts, because you can't simply drop `BACKGROUND_EXECUTOR` and use another, because the background executor is the fallback executor of `ImmediateExecutor`. You can mention a custom executor in every task call, but that's bad performance-wise because you lose the simplicity of `ImmediateExecutor` and don't reuse threads.

I propose to move the queuing strategy towards the direct handoffs strategy:
Proposals:

or

Using a big queue would have no effect on lockups. If there are 7 core threads each needing another 7 threads to complete (as in my example), and the queue can hold 128 commands, this won't be solved: we would have to wait for 128 requests to hang before having the first resolved. Ideally, `VERY_SMALL_NUMBER < CORE_POOL_SIZE` to ensure at least one request is fulfilled.
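To see the difference between the two queue strategies concretely, this sketch saturates a pool's core threads and reports how large the pool grew. With an unbounded `LinkedBlockingQueue` the pool never passes the core size (extra work just queues), while a `SynchronousQueue` (direct handoff) grows toward the max pool size:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueStrategyDemo {

    // Submits 6 blocking tasks to a pool with core=2, max=8 backed by the
    // given queue, then returns the largest pool size reached.
    public static int largestPoolSize(BlockingQueue<Runnable> queue)
            throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 8, 1L, TimeUnit.SECONDS, queue);
        final CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 6; i++) {
            pool.execute(new Runnable() {
                @Override public void run() {
                    try { release.await(); } catch (InterruptedException ignored) { }
                }
            });
        }
        int largest = pool.getLargestPoolSize();
        release.countDown();          // unblock everything and clean up
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return largest;
    }
}
```

With `LinkedBlockingQueue` the result is 2 (maxPoolSize is effectively dead weight, as argued above); with `SynchronousQueue` it is 6, since every task beyond the busy core threads forces a new thread up to the max.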