Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

ModuleNotFoundError when structuring Parsl programs #2680

Open tueda opened 1 year ago

tueda commented 1 year ago

Describe the bug

I tried to run the example of Structuring Parsl programs in the user guide. Namely, I created the following 3 files:

config.py:

from parsl.config import Config
from parsl.channels import LocalChannel
from parsl.executors import HighThroughputExecutor
from parsl.providers import LocalProvider

htex_config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex_local",
            cores_per_worker=1,
            provider=LocalProvider(
                channel=LocalChannel(),
            ),
        )
    ],
)

library.py:

from parsl import python_app

@python_app
def increment(x):
    return x + 1

run_increment.py:

import parsl
from config import htex_config
from library import increment

parsl.load(htex_config)

for i in range(5):
    print('{} + 1 = {}'.format(i, increment(i).result()))

and invoked

python3 run_increment.py

But it gave me a ModuleNotFoundError: No module named 'library' error as follows:

tueda@5cd66131c196:/home/shared/test1$ python3 run_increment.py
Traceback (most recent call last):
  File "/home/shared/test1/run_increment.py", line 8, in <module>
    print('{} + 1 = {}'.format(i, increment(i).result()))
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/dist-packages/parsl/dataflow/dflow.py", line 301, in handle_exec_update
    res = self._unwrap_remote_exception_wrapper(future)
  File "/usr/local/lib/python3.10/dist-packages/parsl/dataflow/dflow.py", line 565, in _unwrap_remote_exception_wrapper
    result = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.10/dist-packages/parsl/executors/high_throughput/executor.py", line 450, in _queue_management_worker
    s.reraise()
  File "/usr/local/lib/python3.10/dist-packages/parsl/app/errors.py", line 122, in reraise
    reraise(t, v, v.__traceback__)
  File "/usr/local/lib/python3.10/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/local/bin/process_worker_pool.py", line 596, in worker
    result = execute_task(req['buffer'])
  File "/usr/local/bin/process_worker_pool.py", line 489, in execute_task
    f, args, kwargs = unpack_apply_message(bufs, user_ns, copy=False)
  File "/usr/local/lib/python3.10/dist-packages/parsl/serialize/facade.py", line 58, in unpack_apply_message
    return [deserialize(buf) for buf in unpack_buffers(packed_buffer)]
  File "/usr/local/lib/python3.10/dist-packages/parsl/serialize/facade.py", line 58, in <listcomp>
    return [deserialize(buf) for buf in unpack_buffers(packed_buffer)]
  File "/usr/local/lib/python3.10/dist-packages/parsl/serialize/facade.py", line 110, in deserialize
    result = methods_for_code[header].deserialize(payload)
  File "/usr/local/lib/python3.10/dist-packages/parsl/serialize/concretes.py", line 54, in deserialize
    data = dill.loads(chomped)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 286, in loads
    return load(file, ignore, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 272, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 419, in load
    obj = StockUnpickler.load(self)
  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 409, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'library'

So, I think the documentation may be incomplete/outdated or Parsl does not work as expected (or I have missed something).

To Reproduce

  1. Set up Parsl 2023.05.01 with Python 3.10.6 (I used a Docker image based on ubuntu:22.04).
  2. Run the above example.
  3. See the error.

Expected behavior

The user guide says it should produce the following output:

0 + 1 = 1
1 + 1 = 2
2 + 1 = 3
3 + 1 = 4
4 + 1 = 5

Environment

Distributed Environment

RaphaelRobidas commented 5 months ago

I have the same problem. Any fix or workaround?

benclifford commented 5 months ago

@RaphaelRobidas usually that error:

  File "/usr/local/lib/python3.10/dist-packages/dill/_dill.py", line 409, in find_class
    return StockUnpickler.find_class(self, module, name)
ModuleNotFoundError: No module named 'library'

means that the code is trying to do the equivalent of import library in your worker environment and cannot find library.py because of how the worker environment is configured.
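To illustrate why the worker ends up importing the module, here is a minimal sketch using only the standard library. It assumes that a function defined in an importable module is serialized by reference (module name plus function name) rather than by value; standard pickle is used here as a stand-in for dill, and demo_library is a hypothetical stand-in for library.py. A separate Python process plays the role of the worker.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile

# Code run by the stand-in "worker": unpickle a function from stdin and call it.
WORKER_CODE = (
    "import pickle, sys; "
    "f = pickle.loads(sys.stdin.buffer.read()); "
    "print(f(1))"
)


def run_worker(payload, cwd, pythonpath=None):
    """Run the stand-in worker in a fresh interpreter, optionally with PYTHONPATH set."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    return subprocess.run(
        [sys.executable, "-c", WORKER_CODE],
        input=payload, cwd=cwd, env=env, capture_output=True,
    )


def demo():
    """Pickle a module-level function, then unpickle it with and without PYTHONPATH."""
    with tempfile.TemporaryDirectory() as lib_dir, \
            tempfile.TemporaryDirectory() as work_dir:
        # Hypothetical stand-in for library.py.
        with open(os.path.join(lib_dir, "demo_library.py"), "w") as f:
            f.write("def increment(x):\n    return x + 1\n")

        # Submit side: the module is importable, so pickling succeeds.
        sys.path.insert(0, lib_dir)
        try:
            module = importlib.import_module("demo_library")
            payload = pickle.dumps(module.increment)
        finally:
            sys.path.remove(lib_dir)
            sys.modules.pop("demo_library", None)

        # Worker side: the module directory is not on the import path.
        missing = run_worker(payload, work_dir)
        # Worker side again, with the module directory added via PYTHONPATH.
        found = run_worker(payload, work_dir, pythonpath=lib_dir)
    return payload, missing, found


if __name__ == "__main__":
    payload, missing, found = demo()
    print(payload)  # contains the module and function names, not the function body
    print(missing.stderr.decode().strip().splitlines()[-1])
    print(found.stdout.decode().strip())
```

The payload carries only names, so the receiving process has to be able to import the module itself. That is why the fix is on the worker's import path rather than in the app code.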

For example, in some situations you need to add the directory containing library.py to $PYTHONPATH via htex's worker_init parameter, or otherwise make sure that the equivalent of import library succeeds in your worker environment instead of failing like this:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'library'
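A possible way to apply the PYTHONPATH suggestion to the config.py from the original report is sketched below. This is an untested sketch under assumptions: it assumes worker_init is passed to LocalProvider (it appears to be a provider-level parameter that supplies shell commands run before the workers start), and the names pythonpath_worker_init and make_config are hypothetical helpers, not part of Parsl.

```python
import os
import shlex


def pythonpath_worker_init(directory):
    """Build a shell line that prepends `directory` to the worker's PYTHONPATH."""
    path = shlex.quote(os.path.abspath(directory))
    # ${PYTHONPATH:+:$PYTHONPATH} appends the old value only if it was set,
    # which avoids a trailing ':' (an empty path entry) when it was not.
    return f"export PYTHONPATH={path}${{PYTHONPATH:+:$PYTHONPATH}}"


def make_config(library_dir):
    """Return an htex config whose workers can import modules from `library_dir`."""
    # Imported lazily so the helper above can be used without Parsl installed.
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.providers import LocalProvider

    return Config(
        executors=[
            HighThroughputExecutor(
                label="htex_local",
                cores_per_worker=1,
                provider=LocalProvider(
                    # Assumption: worker_init is run in the worker's shell
                    # before the worker pool is launched.
                    worker_init=pythonpath_worker_init(library_dir),
                ),
            )
        ],
    )
```

In run_increment.py this could then be loaded with parsl.load(make_config(os.path.dirname(os.path.abspath(__file__)))), so that the directory holding library.py is the one exported to the workers.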

tueda commented 5 months ago

It might be nice for users if the worker's sys.path were adjusted automatically so that library, which is visible from the main program, can also be found by the worker. For remote workers, copying or transferring the library file may be needed as well.