PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
15.97k stars 1.57k forks

Ability to create/update from_source() based deployment without flow dependencies installed #14464

Closed robfreedy closed 2 months ago

robfreedy commented 3 months ago

Prefect Version

3.x

Describe the current behavior

As of the 2.19.3 release, users can run the prefect deploy command to create a deployment for a flow without installing the flow's dependencies in the deploying environment.

However, deploying a flow with from_source().deploy() still requires the flow's dependencies to be installed in the environment where the deployment code runs.

Describe the proposed behavior

Allow from_source().deploy() to create or update a deployment for a flow without the flow's dependencies being installed in the environment where from_source() and .deploy() are run.
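
To illustrate what this could look like, here is a minimal sketch of finding a flow in a script without importing it, using only the standard library `ast` module. This is not Prefect's implementation. The helper `find_flow_functions` is hypothetical, and the `name="hello-flow"` keyword is added only to show that literal decorator arguments can be read statically.

```python
import ast

# Flow script contents; pandas is never imported because the source is only parsed.
FLOW_SOURCE = '''
from prefect import flow, task
import pandas as pd  # assumed to be missing from the deploying environment

@task
def hello():
    return pd.array([1, 2, 3])

@flow(name="hello-flow")  # keyword added for illustration only
def hello_flow_rob():
    hello()
'''


def find_flow_functions(source: str) -> dict[str, dict]:
    """Hypothetical helper: map each function decorated with `flow`
    to the literal keyword arguments passed to the decorator."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for decorator in node.decorator_list:
            # Handle both the bare `@flow` form and the called `@flow(...)` form.
            target = decorator.func if isinstance(decorator, ast.Call) else decorator
            if isinstance(target, ast.Name):
                name = target.id
            else:
                name = getattr(target, "attr", None)  # e.g. `@prefect.flow`
            if name != "flow":
                continue
            kwargs = {}
            if isinstance(decorator, ast.Call):
                for kw in decorator.keywords:
                    if kw.arg is None:  # skip `**kwargs` expansion
                        continue
                    try:
                        kwargs[kw.arg] = ast.literal_eval(kw.value)
                    except (ValueError, TypeError):
                        pass  # non-literal values cannot be resolved statically
            found[node.name] = kwargs
    return found


flows = find_flow_functions(FLOW_SOURCE)
print(flows)  # {'hello_flow_rob': {'name': 'hello-flow'}}
```

Because the script is parsed rather than executed, the `import pandas as pd` line is never run. The trade-off is that only literal decorator arguments are visible this way.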

Example Use

Running the deployment code below in an environment that does not have the flow's dependencies installed (e.g. pandas in the example below).

Deployment:

from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="s3://robs-test-deployment-bucket",
        entrypoint="test.py:hello_flow_rob",
    ).deploy(
        name="robs-deployment-from-s3",
        work_pool_name="robs-test-process-pool",
    )

Flow:

from prefect import flow, task, get_run_logger
import pandas as pd

@task
def hello():
    logger = get_run_logger()
    test = pd.array([1, 2, 3])
    logger.info("Hello world!")
    logger.info(test)

@flow
def hello_flow_rob():
    hello()

if __name__ == "__main__":
    hello_flow_rob()

Additional context

Stack trace from running with 2.19.7 and the latest 3.x release:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/var/folders/y2/pwgwqk_91ms0sg3lhcxklwfw0000gn/T/tmphyuibc64/robs-test-deployment-bucket/test.py", line 2, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/rob.freedy/Documents/prefect/scratch/from_source_repro/deploy.py", line 4, in <module>
    flow.from_source(
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 304, in coroutine_wrapper
    return call()
           ^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 432, in __call__
    return self.result()
           ^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 318, in result
    return self.future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 179, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 389, in _run_async
    result = await coro
             ^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/flows.py", line 931, in from_source
    flow: "Flow" = await from_async.wait_for_call_in_new_thread(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/api.py", line 164, in wait_for_call_in_new_thread
    return call.result()
           ^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 318, in result
    return self.future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 179, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/_internal/concurrency/calls.py", line 352, in _run_sync
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/flows.py", line 1675, in load_flow_from_entrypoint
    flow = import_object(entrypoint)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/utilities/importtools.py", line 205, in import_object
    module = load_script_as_module(script_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rob.freedy/Documents/prefect/scratch/.venv/lib/python3.12/site-packages/prefect/utilities/importtools.py", line 168, in load_script_as_module
    raise ScriptError(user_exc=exc, path=path) from exc
prefect.exceptions.ScriptError: Script at '/var/folders/y2/pwgwqk_91ms0sg3lhcxklwfw0000gn/T/tmphyuibc64/robs-test-deployment-bucket/test.py' encountered an exception: ModuleNotFoundError("No module named 'pandas'")
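
The trace shows the entrypoint script being executed as a module, which runs its top-level imports. The sketch below reproduces that failure mode with plain `importlib`, independent of Prefect. The `load_script` helper and the missing module name are hypothetical stand-ins, not Prefect code.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical stand-in for a flow dependency that is not installed.
MISSING = "a_dependency_that_is_not_installed"

SCRIPT = f"import {MISSING}\n\ndef hello_flow():\n    pass\n"


def load_script(path: Path):
    """Execute a script file as a module, as an import-based loader would."""
    spec = importlib.util.spec_from_file_location("flow_script", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level imports
    return module


with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "test.py"
    script_path.write_text(SCRIPT)
    try:
        load_script(script_path)
        error = None
    except ModuleNotFoundError as exc:
        error = exc

print(repr(error))
```

Any loader that executes the script hits the missing import before the flow object exists, which is why the deployment cannot be created in an environment without the flow's dependencies.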

serinamarie commented 3 months ago

MRE

my_script.py

from prefect import flow

if __name__ == "__main__":
    flow.from_source(
        source="https://github.com/PrefectHQ/hello-projects.git",
        entrypoint="flows/pandas_flow.py:pandas_flow",
    ).deploy(
        name="pandas-deployment-from-github",
        work_pool_name="local-pool",  # use the name of an existing pool
    )