AnswerDotAI / nbdev

Create delightful software with Jupyter Notebooks
Apache License 2.0

nbdev_prepare and nbdev_test hang if I use the parallel library with loky as the backend #1365

Open Taytay opened 1 year ago

Taytay commented 1 year ago

(First, thanks for nbdev. This project is great!)

Minimally reproducible example

This code in my notebook causes nbdev_test to hang indefinitely:

from joblib import Parallel, delayed, parallel_backend

with parallel_backend("loky"):
    def g(y):
        return y + 1

    Parallel(n_jobs=2)(delayed(g)(y) for y in [1, 2, 3, 4])

But specifying "threading" as the backend works:

with parallel_backend("threading"):
    def g(y):
        return y + 1

    Parallel(n_jobs=2)(delayed(g)(y) for y in [1, 2, 3, 4])

The other workaround is passing --n_workers=0 to nbdev_test. Note that "loky" is the default backend, so not specifying a backend when using Parallel also hangs.

Environment: Mac M1, Python 3.9.18, nbdev 2.3.12, fastcore 1.5.29. I tried upgrading loky and joblib to 3.4.1 and 1.3.2 respectively, just to make sure that wasn't the issue. (It wasn't.)

It's clearly related to the use of parallel in nbdev_test, but that's as far as I got.

(I'm posting this here in case others are using Parallel and are stymied when nbdev_test (or nbdev_prepare) stops working. If you stumble across this, note that setting the backend to "threading" can make CPU-bound code MUCH slower because of the Python GIL.)
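One way to keep "loky" for normal runs while still getting through nbdev_test is to make the backend switchable from outside the notebook. The sketch below is only an illustration, not something nbdev or joblib provides: the environment variable name NB_PARALLEL_BACKEND and the helpers choose_backend and run_jobs are hypothetical names invented for this example.

```python
import os


def choose_backend(default="loky"):
    """Return the joblib backend name to use.

    NB_PARALLEL_BACKEND is a hypothetical, user-defined environment
    variable (not an nbdev or joblib setting). When it is unset, the
    default backend name is returned.
    """
    return os.environ.get("NB_PARALLEL_BACKEND", default)


def run_jobs(values, n_jobs=2):
    """Run g over values with whichever backend choose_backend() selects."""
    # Imported lazily so choose_backend() stays usable without joblib installed.
    from joblib import Parallel, delayed, parallel_backend

    def g(y):
        return y + 1

    with parallel_backend(choose_backend()):
        return Parallel(n_jobs=n_jobs)(delayed(g)(y) for y in values)
```

Under these assumptions, running NB_PARALLEL_BACKEND=threading nbdev_test would select the threading backend during tests, while ordinary notebook runs with the variable unset would stay on "loky". This does not fix the underlying hang. It only avoids hard-coding the slower backend.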

Taytay commented 1 year ago

I just tried setting prefer="processes" (along with timeout=1) in my Parallel instantiation:

parallel_backend_name = "loky"
with parallel_backend(parallel_backend_name):
    def g(y):
        return y + 1

    Parallel(n_jobs=2, timeout=1, prefer="processes")(delayed(g)(y) for y in [1, 2, 3, 4])

And now I get:

Traceback (most recent call last):
  File "<some_folder>/.conda-env/bin/nbdev_prepare", line 8, in <module>
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/", line 119, in _f
    return tfunc(**merge(args, args_from_prog(func, xtra)))
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/nbdev/", line 257, in prepare
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/nbdev/", line 89, in nbdev_test
    results = parallel(test_nb, files, skip_flags=skip_flags, force_flags=force_flags, n_workers=n_workers,
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/", line 117, in parallel
    return L(r)
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/", line 98, in __call__
    return super().__call__(x, *args, **kwargs)
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/", line 106, in __init__
    items = listify(items, *rest, use_list=use_list, match=match)
  File "<some_folder>/.conda-env/lib/python3.9/site-packages/fastcore/", line 66, in listify
    elif is_iter(o): res = list(o)
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/", line 562, in _chain_from_iterable_of_lists
    for element in iterable:
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/", line 609, in result_iterator
    yield fs.pop().result()
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/", line 446, in result
    return self.__get_result()
  File "<some_folder>/.conda-env/lib/python3.9/concurrent/futures/", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

That's related to #693, #731, #673, and #1256, I think. This might very well be a dupe, but #673 made it sound like it was solved. If I set OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES when running nbdev_prepare, it goes back to hanging instead of throwing an exception.