VisionSystemsInc / terra

Terra - Run your algorithm anywhere on earth
MIT License

ProcessPoolExecutor deadlocks on python 3.9 or newer #115

Open andyneff opened 2 years ago

andyneff commented 2 years ago

Starting in python 3.9.0, using a ProcessPoolExecutor has a good chance of deadlocking on a terra task. It almost always happens with 10 workers, and is practically guaranteed with 16 workers.

I've managed to put together a piece of code to reproduce the error:

#!/usr/bin/env python

from concurrent.futures import ProcessPoolExecutor, as_completed

from celery import shared_task

@shared_task
def foo(x):
  return x*x

if __name__ == '__main__':
  futures = {}
  with ProcessPoolExecutor(max_workers=10) as executor:
    for x in range(10):
      futures[executor.submit(foo, x)] = x

    results = {}
    for future in as_completed(futures):
      task_id = futures[future]
      results[task_id] = future.result()
      print(len(results))

As you can see here, the bug is actually not in terra itself, but can be reproduced using just the celery task object. Something about the difference between a celery "task" and a plain "function" causes the workers to hang before they ever process a single job.
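For comparison, the identical executor pattern with a plain (non-celery) function completes normally. This is a minimal sketch I put together, not part of the original report; `square` and `run` are names I introduced for illustration:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def square(x):
    # Plain function: pickled and run by the worker with no celery
    # machinery involved.
    return x * x

def run(n=10):
    futures = {}
    with ProcessPoolExecutor(max_workers=n) as executor:
        for x in range(n):
            futures[executor.submit(square, x)] = x
        results = {}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results

if __name__ == '__main__':
    print(len(run()))  # all 10 jobs complete; no hang
```

This suggests the hang comes from what celery's decorator turns `foo` into, not from ProcessPoolExecutor itself.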

andyneff commented 2 years ago

Diving into the problem further, it is very easy to perturb the problem with print statements. Adding or removing them can hide or expose the deadlock, so it definitely seems like timing affects the issue.


I've also replaced @shared_task with the following (import added for completeness), and the problem persists, so I'm keeping shared_task for ease of debugging:

from celery import Celery

app = Celery('tasks')

@app.task

Roughly speaking, celery creates a "task" out of your function, and then wraps that in a Proxy class:

# Roughly what bar = shared_task(foo) does:
from celery import _state as celery_state
from celery.local import Proxy

app = celery_state.get_current_app()
foo_task = app._task_from_fun(foo)
def thing():
  # Factory: the Proxy re-evaluates this on every use.
  return foo_task
bar = Proxy(thing)
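To illustrate the indirection without needing celery installed, here is a simplified stand-in for celery.local.Proxy (an illustrative sketch of the mechanism, not celery's actual implementation): every use of the proxy re-evaluates the factory and forwards to whatever object it returns.

```python
class Proxy:
    """Simplified stand-in for celery.local.Proxy (illustration only)."""

    def __init__(self, factory):
        # Set directly on the instance so lookup never falls through
        # to __getattr__ forwarding below.
        object.__setattr__(self, '_factory', factory)

    def _get_current_object(self):
        # Re-evaluate the factory on every access.
        return self._factory()

    def __call__(self, *args, **kwargs):
        return self._get_current_object()(*args, **kwargs)

    def __getattr__(self, name):
        return getattr(self._get_current_object(), name)

def square(x):
    return x * x

bar = Proxy(lambda: square)
print(bar(3))  # forwards to square -> prints 9
```

Because every call and attribute access goes through the factory, a Proxy behaves differently from a plain function with respect to pickling and import-time side effects, which is the kind of difference that could interact badly with ProcessPoolExecutor's worker startup.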