Task.BACKGROUND_EXECUTOR is badly defined and prone to lockups

natario1 commented 7 years ago

This follows parse-community/Parse-SDK-Android#646 where we found a possible bug in the executor, and would also propose some changes.

Bug?

It is told in comments that

Creates a proper Cached Thread Pool. Tasks will reuse cached threads if available or create new threads until the core pool is full. tasks will then be queued. If an task cannot be queued, a new thread will be created unless this would exceed max pool size, then the task will be rejected. Threads will time out after 1 second.

This is strictly true but practically not with the blocking queue that is being used (LinkedBlockingQueue with infinite capacity). If I got this correctly, with that queue tasks can always be queued, so the max number of threads you’ll have is the core pool. maxPoolSize has no effect. So this

ThreadPoolExecutor executor =  new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        MAX_POOL_SIZE,
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>());

should become

ThreadPoolExecutor executor =  new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        MAX_POOL_SIZE,
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(SOME_NUMBER));

also official docs from ThreadPoolExecutor about queues:

Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution.

Queue strategy

Using the terminology from oracle site, the current strategy (despite the comments) is the unbounded queue, which fits the case of tasks completely independent of others. You will agree this is not bolts case. Can we move to the first strategy?

I can read in comments that the bolts executor was designed trying to emulate the AsyncTask executor. But AsyncTasks are independent by design, can afford a big queue. Bolts Task are dependent, chained, nested, and the docs suggest a different strategy for this design. We are experiencing lockups in the Parse SDK that you can read in the linked issue (there’s a huge comment explaining the internals). We can block the Task.BACKGROUND_EXECUTOR forever very easily, in situations like the following, when running concurrently the same action:

BACKGROUND_EXECUTOR pool (max 7 threads for 6 processors)
|- background-executor-thread-1 (needs another background thread to complete)
|- background-executor-thread-2 (needs another background thread to complete)
|- background-executor-thread-3 (needs another background thread to complete)
|- background-executor-thread-4 (needs another background thread to complete)
|- background-executor-thread-5 (needs another background thread to complete)
|- background-executor-thread-6 (needs another background thread to complete)
|- background-executor-thread-7 (needs another background thread to complete)

With 2 processors, this takes 3 concurrent actions to have the executor hang. I don’t think this is fixable from outside bolts, because

Tasks promote chaining, nesting and dependencies and that’s how you build upon them
It’s impossible to get rid of the BACKGROUND_EXECUTOR and use another, because the background executor is the fallback executor of ImmediateExecutor. You can mention a custom executor in every task call, but that’s bad performance wise because you lose the simplicity of ImmediateExecutor and don’t reuse threads.

I propose to move the queuing strategy towards the direct handoffs strategy:

Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies.

Proposals:

// direct handoff with limited max pool size
// this fits better bolts design
ThreadPoolExecutor executor =  new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        BIG_NUMBER, // 64?
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new SynchronousQueue<Runnable>());

or

// bounded queue with a small value
ThreadPoolExecutor executor =  new ThreadPoolExecutor(
        CORE_POOL_SIZE,
        BIG_NUMBER, // 32?
        KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(VERY_SMALL_NUMBER));

Using a big queue would have no effect on lockups. If there are 7 core threads needing other 7 threads to complete (as in my example), and the queue can hold 128 commands, this won’t be solved. We would have to wait for 128 requests to hang before having the first resolved. Ideally, VERY_SMALL_NUMBER < CORE_POOL_SIZE to ensure at least a request is fulfilled.

Jawnnypoo commented 7 years ago

@grantland Are you still managing this project? Or can you point us to the right people who are?

grantland commented 7 years ago

I think I'm the only somewhat active maintainer. This repo hasn't seen much activity since it's been pretty stable and most of the TPL spec has already been implemented. I also seem to have missed a few PRs (probably happened while my current team at FB was on lockdown for a project), which I just addressed. If there happens to be more activity here I wouldn't mind having some help 😛

This was a while ago, but I think this specific ThreadPool functionality was added because the Parse-Android SDK at the time (before Bolts-Android) used the default Executors#newCachedThreadPool() and we were running into a ton of deadlocks. While I was trying to hunt them down, I noticed we were spinning up an enormous amount of threads and traced that to the fact that Executors#newCachedThreadPool() is unbounded. This is fine for the traditional Java usage on servers with virtually unbounded CPUs, memory, and power, but it could negatively affect Android devices as they're much more limited by CPU power, memory, and battery life, in addition to the fact that lower end Android devices supported by our minSdkVersion don't have many CPU cores.

Once I limited the ThreadPool to a manageable size, I was able to untangle a bunch of non-executor related deadlocks (mainly Object#lock(), synchronized, etc) and things were running smoothly. Any deadlocks that we encountered after this we were able to either solve by untangling some bad code or de-nest our Tasks (as ImmediateExecutor would bump deeply nested Tasks to Task.BACKGROUND_EXECUTOR as you mentioned).

Your solution would solve the problem you're facing, but I'm a little hesitant to increase the max pool size of Task.BACKGROUND_EXECUTOR/Task.callInBackground(callable) to 32 or 64 as we'd potentially be chocking the CPU in a lot of applications using those methods.

Additionally, I think we can split up your issue into two parts to find a solution:

Deadlocks occurring when using Task.BACKGROUND_EXECUTOR directly
Deadlocks occurring when using Task.BACKGROUND_EXECUTOR indirectly via ImmediateExecutor and Task#call(callable)

In my experience, 1 rarely happens and when it does it either uncovers some badly written code in which cleaning it fixes the deadlock or that code requires a specific non-Task.BACKGROUND_EXECUTOR executor.

I see 2 being a much more important issue as nested Tasks can be randomly bumped to a different thread and we are unable to predetermine how deep in the stack our Task will be executed.

With this, I think the least invasive solution would be to add another Executor that ImmediateExecutor bumps to. This Executor can either be bounded or unbounded and I don't think it really matters as it's not called directly. This way, we don't choke the CPU for calls that use Task.BACKGROUND_EXECUTOR, but we also don't cause impossible-to-trace deadlocks due to automatic bumping to a different thread when the call stack gets too deep.

Do you think this solution would be good enough to solve the issue? Do you think there is anything I'm missing?

natario1 commented 7 years ago

Thanks @grantland ! So, I think we can agree on the following, correct me if I am wrong:

bound the queue of BACKGROUND_EXECUTOR (e.g. to 128 like async task), or edit the comments and note that max pool size is unused. It won’t have any effect, but I don’t think you wanted it to be unbounded if I got this correctly.
add a thread factory to BACKGROUND_EXECUTOR just to have a meaningful name, that would really help in debugging for example

solve the ImmediateExecutor issue as you said. If we add the thread factory, we could as well solve it this way, don’t know if it’s better:


// BoltsExecutors
private static boolean isBackgroundThread(Thread thread) {
    return thread.getName().startsWith(“bolts-background-thread-”);
}

// ImmediateExecutor
@Override
public void execute(Runnable command) {
  int depth = incrementDepth();
  try {
    // Don’t request a new background thread if this is already a background thread
    if (depth <= MAX_DEPTH || isBackgroundThread(Thread.getCurrentThread())) {
      command.run();
    } else {
      BoltsExecutors.background().execute(command);
    }
  } finally {
    decrementDepth();
  }
}

This is good news, but I think the parse SDK is still stuck at your first point, deadlocks occurring when using Task.BACKGROUND_EXECUTOR directly. You know that in the middle of each request Parse ends up either in ParseSQLiteDatabase.dbExecutor or in ParseRequest.NETWORK_EXECUTOR or in both. And then it explicitly jumps out of them. This means that a single execution needs 2 background threads, one waiting for the other. If the current pool has 5, it takes 3 concurrent executions and the SDK is locked.

What can we do?

Another solution off the top of my head would be to leave everything as is but have a small queue (< core pool). Then some commands might be rejected because max pool size is small as well. Instead of canceling the task, we could schedule it for later using RejectedExecutionHandler. This ensures infinite capacity without stressing resources. But yeah, it looks bad.

@Override
public void rejectedExecution(final Runnable runnable, final ThreadPoolExecutor e) {
    Task.delay(1000).then(new Continuation() {
        e.execute(runnable);
    });
}

natario1 commented 7 years ago

@grantland any comments?

Based on what you said my plan would be

bound the queue of BACKGROUND_EXECUTOR (e.g. to 128 like async task), or edit the comments, either one is wrong
solve the ImmediateExecutor issue as you said, by creating a fallback executor or something similar (we can discuss the design in the PR) so immediate task do never fill the BACKGROUND_EXECUTOR causing deadlocks

These alone won’t solve Parse issues (see comment above), so

(proposal) add static Task.setDefaultBackgroundExecutor() to let everyone do their tweaking at their own risk

I can PR if we reach some kind of agreement :-)

BoltsFramework / Bolts-Android

Task.BACKGROUND_EXECUTOR is badly defined and prone to lockups #130

Bug?

Queue strategy