PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

agent fails to get flow from storage-block with path set as relative path #9001

Open amitschang opened 1 year ago

amitschang commented 1 year ago

Bug summary

When a deployment is built with a path set, using either the --path argument or the -sb {type}/{name}/{path} form, the file upload works, but the flow run fails in the agent with an error like:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpslmsj_lfprefect/getstars.py'

This happens with both an s3 block and a local-file-system block (which is accessible on the agent system). The two fail in subtly different ways: the error above is from s3, whereas for local storage the missing path is a directory:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp9pq0n7nsprefect/subdir'

The local-file-system case can be worked around by editing the path field of the deployment manifest (below the "### DO NOT EDIT BELOW THIS LINE" marker, oh no!) to the absolute path inside the storage block. For example, my local storage block has /prefect-storage as its basepath, so I set path to /prefect-storage/subdir, and the flow downloads and runs.

The same workaround does not work for s3, and I have not found another one. I should note that the s3 base is s3://{bucket}/{basepath}, not the bucket root; however, deploying a flow works as expected without the extra path argument.

Apologies if these two are not the same issue. The traces differ, but the user experience is the same, and the cause may well be shared: for example, the logic that constructs the path could be flawed in the same way and simply handled differently by the different filesystems.
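To make that hypothesis concrete, here is a minimal sketch of how a relative path ends up under the run's temporary directory when it is resolved without the block's basepath. This is not Prefect's code: `resolve_without_basepath` is a hypothetical helper, and the temporary directory only stands in for the agent's working directory seen in the error messages.

```python
import os
import tempfile
from pathlib import Path


def resolve_without_basepath(from_path: str) -> Path:
    """Hypothetical helper: resolve a path with no reference to any basepath.

    A relative path is resolved against the current working directory.
    """
    return Path(from_path).expanduser().resolve()


# Stand-in for the temporary working directory used for a flow run.
with tempfile.TemporaryDirectory() as run_dir:
    previous_cwd = os.getcwd()
    os.chdir(run_dir)
    try:
        resolved = resolve_without_basepath("subdir")
    finally:
        # Leave the temporary directory before it is cleaned up.
        os.chdir(previous_cwd)
    # The result sits under the temporary directory, not under the basepath.
    lands_in_run_dir = resolved.parent == Path(run_dir).resolve()

print(resolved, lands_in_run_dir)
```

The resolved path has the same shape as the one in the local-file-system error: the temporary directory followed by subdir. That is consistent with the basepath being dropped somewhere, but it does not prove where.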

Reproduction

For example, for block:

$ prefect blocks inspect local-file-system/flow-storage
                                                                                  local-file-system/flow-storage                                                                                  
┌──────────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Block Type                                   │ Local File System                                                                                                                               │
│ Block id                                     │ 81db69c0-bc0d-4702-80dc-d5890135ec8e                                                                                                            │
├──────────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ basepath                                     │ /prefect-storage                                                                                                                                │

And deployment created using:

prefect deployment build --name getstars -sb local-file-system/local/subdir getstars.py:runflow
prefect deployment apply runflow-deployment.yaml 

### Error

For local storage:

```python3

Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/engine.py", line 277, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 277, in asyncnullcontext
    yield
  File "/usr/local/lib/python3.11/site-packages/prefect/client/utilities.py", line 40, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/deployments.py", line 194, in load_flow_from_flow_run
    await storage_block.get_directory(from_path=deployment.path, local_path=".")
  File "/usr/local/lib/python3.11/site-packages/prefect/filesystems.py", line 147, in get_directory
    copytree(from_path, local_path, dirs_exist_ok=True)
  File "/usr/local/lib/python3.11/shutil.py", line 559, in copytree
    with os.scandir(src) as itr:
         ^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp9pq0n7nsprefect/subdir'
```

And for s3:

```python3
Flow could not be retrieved from deployment.
Traceback (most recent call last):
  File "<frozen importlib._bootstrap_external>", line 936, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1073, in get_code
  File "<frozen importlib._bootstrap_external>", line 1130, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpslmsj_lfprefect/getstars.py'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/engine.py", line 277, in retrieve_flow_then_begin_flow_run
    flow = await load_flow_from_flow_run(flow_run, client=client)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/contextlib.py", line 222, in __aexit__
    await self.gen.athrow(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 277, in asyncnullcontext
    yield
  File "/usr/local/lib/python3.11/site-packages/prefect/client/utilities.py", line 40, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/deployments.py", line 206, in load_flow_from_flow_run
    flow = await run_sync_in_worker_thread(load_flow_from_entrypoint, str(import_path))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 91, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/flows.py", line 809, in load_flow_from_entrypoint
    flow = import_object(entrypoint)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/importtools.py", line 201, in import_object
    module = load_script_as_module(script_path)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/importtools.py", line 164, in load_script_as_module
    raise ScriptError(user_exc=exc, path=path) from exc
prefect.exceptions.ScriptError: Script at 'getstars.py' encountered an exception: FileNotFoundError(2, 'No such file or directory')
```

### Versions

```Text
Version:             2.8.7
API version:         0.8.4
Python version:      3.11.1
Git commit:          a6d6c6fc
Built:               Thu, Mar 23, 2023 3:27 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         ephemeral
Server:
  Database:          sqlite
  SQLite version:    3.34.1
```

billpalombi commented 1 year ago

Thanks for submitting this @amitschang! I'm able to replicate for the local-file-system block.

amitschang commented 1 year ago

Thanks! For the local file system case at least, it seems that perhaps at https://github.com/PrefectHQ/prefect/blob/f5e1b10ffa0183f330356697fcaf17a8b7a63c0e/src/prefect/filesystems.py#L132-L135, the path needs to be added to the base, something like:

        if from_path is None:
            from_path = Path(self.basepath).expanduser().resolve()
        else:
            from_path = Path(self.basepath).joinpath(from_path).expanduser().resolve()

or so
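The suggestion above can be tried in isolation. The following is a standalone sketch under stated assumptions, not a patch to Prefect: `resolve_from_path` is a hypothetical free function standing in for the logic inside `get_directory`, and `/prefect-storage` is the basepath from the report.

```python
from pathlib import Path
from typing import Optional


def resolve_from_path(basepath: str, from_path: Optional[str] = None) -> Path:
    """Hypothetical helper: resolve from_path relative to the block's basepath."""
    base = Path(basepath).expanduser()
    if from_path is None:
        # No sub-path given: use the block's root.
        return base.resolve()
    # joinpath keeps the base for relative paths and discards it for
    # absolute ones, so both styles of deployment path are accepted.
    return base.joinpath(from_path).resolve()


relative = resolve_from_path("/prefect-storage", "subdir")
absolute = resolve_from_path("/prefect-storage", "/prefect-storage/subdir")
default = resolve_from_path("/prefect-storage")

print(relative, absolute, default)
```

Because `joinpath` discards the base when given an absolute path, the manifest workaround described in the report (setting path to /prefect-storage/subdir) would keep working alongside relative paths.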

cicdw commented 1 year ago

Relative path manipulation in deployments should be fully resolved with the new work on projects (will be released in beta with 2.9.1 tomorrow).

@amitschang I'm curious if our beta setup for projects would help make this easier to debug and manage; you can check out the initial documentation directly in GitHub here.

In this setup, the root of your project directory is always the working directory for your deployment runs. Once the release is out, you can experiment with this using prefect project init --recipe local, which I believe will capture your setup and allow you to customize it further.

didopop3 commented 1 year ago

I have exactly the same issue with the S3 block.