PrefectHQ / prefect-ray

Prefect integrations with Ray
https://prefecthq.github.io/prefect-ray/
Apache License 2.0
63 stars 5 forks source link

Circular import error upon deployment #33

Closed ahuang11 closed 1 year ago

ahuang11 commented 2 years ago
(begin_task_run pid=8142)   File "/home/ray/anaconda3/lib/python3.8/site-packages/prefect/software/python.py", line 8, in <module>
(begin_task_run pid=8142)     from prefect.software.pip import PipRequirement, current_environment_requirements
(begin_task_run pid=8142)   File "/home/ray/anaconda3/lib/python3.8/site-packages/prefect/software/pip.py", line 4, in <module>
(begin_task_run pid=8142)     import packaging.requirements
(begin_task_run pid=8142)   File "/home/ray/anaconda3/lib/python3.8/site-packages/prefect/packaging/__init__.py", line 1, in <module>
(begin_task_run pid=8142)     from prefect.packaging.docker import DockerPackager
(begin_task_run pid=8142) ImportError: cannot import name 'DockerPackager' from partially initialized module 'prefect.packaging.docker' (most likely due to a circular import) (/home/ray/anaconda3/lib/python3.8/site-packages/prefect/packaging/docker.py)

The issue is that in a deployment, ray includes /home/ray/anaconda3/lib/python3.8/site-packages/prefect/ in its sys.path, and in this directory, there’s a packaging directory which confuses the from packaging import version statement (it tries to pull from prefect.packaging rather than the pypi packaging).

To workaround this issue, override sys.argv[0] = str(Path.cwd()) in prefect_ray/task_runners.py

zanieb commented 2 years ago

Ugh the classic standard library doesn't have a dedicated namespace issue. What do you mean re this workaround?

ahuang11 commented 2 years ago

In ray.workers module:

    if mode == SCRIPT_MODE:
        # Add the directory containing the script that is running to the Python
        # paths of the workers. Also add the current directory. Note that this
        # assumes that the directory structures on the machines in the clusters
        # are the same.
        # When using an interactive shell, there is no script directory.
        if not interactive_mode:
            script_directory = os.path.abspath(os.path.dirname(sys.argv[0]))
            worker.run_function_on_all_workers(
                lambda worker_info: sys.path.insert(1, script_directory)
            )

The reason why this doesn't happen running flows locally vs deployments is because when Prefect runs deployed code, it executes the script through .../lib/python3.8/site-packages/prefect/engine.py and the root directory is .../lib/python3.8/site-packages/prefect where packaging also lives.

At the moment, working with the Ray OSS team to resolve this.

zanieb commented 2 years ago

Really interesting. Let me know if you need anything.

jjyao commented 2 years ago

How does engine.py execute the Ray script?

ahuang11 commented 2 years ago

python -m prefect.engine <UUID>

zanieb commented 2 years ago

python -m prefect.engine <UUID> is the entrypoint to Prefect flow runs. The flow run process then calls ray.init() (see the RayTaskRunner implementation here) and task runs are submitted to the Ray cluster.

jjyao commented 2 years ago

The reason why we have this behavior is because https://docs.python.org/3/library/sys.html#sys.path

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of [PYTHONPATH](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH).

Currently there is no easy way to disable this behavior besides the workaround Andrew mentioned.

jjyao commented 2 years ago

I think https://blog.cykerway.com/posts/2020/12/15/python-name-conflict-with-built-in-module.html is essentially the problem we are facing here.

zanieb commented 2 years ago

We're not running a script though, we're using python -m <module>. I'm not sure why Ray needs to modify the path in this case.

For example:

# example.py
import sys
print(sys.path[0])

Displays the path of the script

❯ python src/prefect/example.py
/Users/mz/dev/prefect/src/prefect

Displays my current working directory

❯ python -m prefect.example
/Users/mz/dev/prefect
jjyao commented 2 years ago

Oh, got it. Then I feel it's a bug on our side, we diverge from the default python behavior.

jjyao commented 2 years ago

Could you check to see if https://github.com/ray-project/ray/pull/28043 fixes the issue?

ahuang11 commented 2 years ago

Neat! It works! Thanks @jjyao for the quick fix!

10:27:39.736 | INFO    | prefect.agent - Submitting flow run '6d01c2cd-0af1-4f55-91b6-9349b524e3f3'
10:27:40.393 | INFO    | prefect.infrastructure.process - Opening process 'cornflower-puma'...
10:27:40.399 | INFO    | prefect.agent - Completed submission of flow run '6d01c2cd-0af1-4f55-91b6-9349b524e3f3'
10:27:41.785 | WARNING | ray.worker - Failed to set SIGTERM handler, processes mightnot be cleaned up properly on exit.
10:27:42.029 | INFO    | prefect.task_runner.ray - Creating a local Ray instance
2022-08-22 10:27:43,661 INFO services.py:1470 -- View the Ray dashboard at http://127.0.0.1:8265
10:27:45.034 | INFO    | prefect.task_runner.ray - Using Ray cluster with 1 nodes.
10:27:45.034 | INFO    | prefect.task_runner.ray - The Ray UI is available at 127.0.0.1:8265
10:27:46.189 | INFO    | Flow run 'cornflower-puma' - Created task run 'say_hello-0b28e502-0' for task 'say_hello'
10:27:46.196 | INFO    | Flow run 'cornflower-puma' - Submitted task run 'say_hello-0b28e502-0' for execution.
10:27:50.279 | INFO    | Flow run 'cornflower-puma' - Finished in state Completed('All states completed.')
(begin_task_run pid=29639) hello Ford
10:27:50.447 | INFO    | prefect.infrastructure.process - Process 'cornflower-puma' exited cleanly.

If possible, let us know when you merge and release so we can pin that specific version!

jjyao commented 2 years ago

The PR is merged so it should be available in the next Ray release (2.1) or nightly wheel.

ahuang11 commented 2 years ago

Exciting news! Will close this once ray 2.1 is released and we pin 2.1 in our requirements

ahuang11 commented 1 year ago

2.1 has been released; closing this now.