PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Metadata files generated with RayTaskRunner #16009

Open dqueruel-fy opened 15 hours ago

dqueruel-fy commented 15 hours ago

Bug summary

Issue description

I don't know whether this is a bug or intended behavior, but some metadata files are generated each time I run my flows locally. This is annoying because the files are written to my source directory (or wherever I run the flows/tasks from). I'd like more information on what these files are and whether they can be written somewhere else or, ideally, not generated at all.

It generates files with names like 89e55eaee58e8ce3567e87801196d9d5 in the folder from which I run the Python script (see below), with the following content:

{
    "metadata": {
        "storage_key": "/Users/<path to my local source dir>/89e55eaee58e8ce3567e87801196d9d5",
        "expiration": null,
        "serializer": {
            "type": "pickle",
            "picklelib": "cloudpickle",
            "picklelib_version": null
        },
        "prefect_version": "3.1.2",
        "storage_block_id": null
    },
    "result": "gAVLAS4=\n"
}
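For readers wondering what the "result" field holds: it looks like the return value, pickled and then base64-encoded. The following is a minimal sketch using only the standard library, not Prefect's own loading code. It assumes the payload is a plain pickle that the stdlib pickle module can read, which holds for simple values like the one here; the helper name decode_result is made up for illustration.

```python
import base64
import json
import pickle

# Trimmed copy of the file content shown above (only the fields used here).
raw = '{"metadata": {"serializer": {"type": "pickle"}}, "result": "gAVLAS4=\\n"}'


def decode_result(file_text: str):
    """Decode the base64-encoded, pickled value stored under "result"."""
    record = json.loads(file_text)
    # b64decode ignores the trailing newline in the stored string.
    payload = base64.b64decode(record["result"])
    # Only unpickle files you produced yourself: unpickling can execute code.
    return pickle.loads(payload)


value = decode_result(raw)
print(value)
```

For the file above this yields 1, which matches what taskA returns in the script below.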

The minimal reproducible Python script is:

from prefect import flow, task
from prefect_ray import RayTaskRunner

@task(log_prints=True, persist_result=True)
def taskA():
    print("Task A")
    return 1

@flow(log_prints=True, persist_result=True, task_runner=RayTaskRunner)
def myFlow():
    print("In my flow")
    taskA.submit().wait()
    return 0

myFlow()

Version info

Version:             3.1.2
API version:         0.8.4
Python version:      3.11.9
Git commit:          02b99f0a
Built:               Tue, Nov 12, 2024 1:38 PM
OS/Arch:             darwin/arm64
Profile:             local
Server type:         server
Pydantic version:    2.8.2
Integrations:
  prefect-ray:       0.4.2

Additional context

Some notes:

cicdw commented 15 hours ago

Hey @dqueruel-fy - those files are a consequence of persisting task and flow results.

> I've tried to change the server config's PREFECT_LOCAL_STORAGE_PATH to /tmp/result but it didn't help

This setting takes effect at workflow runtime, so setting it on the server has no effect (all server configuration is prefixed with PREFECT_SERVER_). If you set it in the process where your workflows execute, you should see the desired behavior.
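One way to picture "the process that your workflows execute": the variable has to be present in the environment of whatever Python process runs the flow, not the server's. The sketch below is an illustration under assumptions, not Prefect code. The helper env_with_result_path is hypothetical, and the child process is a stand-in that only echoes the variable; in practice the command would be your own flow script.

```python
import os
import subprocess
import sys


def env_with_result_path(storage_path: str) -> dict[str, str]:
    """Return a copy of the current environment with the result path set."""
    env = dict(os.environ)
    env["PREFECT_LOCAL_STORAGE_PATH"] = storage_path
    return env


# Stand-in for the workflow process: a child Python that prints the variable.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PREFECT_LOCAL_STORAGE_PATH'])"],
    env=env_with_result_path("/tmp/result"),
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
print(seen_by_child)
```

Exporting the variable in the shell before starting the flow has the same effect.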

For more information, check out the documentation on results and settings.

zzstoatzz commented 15 hours ago

hi @dqueruel-fy - yes this sounds like expected behavior, that metadata is your serialized result

» PREFECT_LOCAL_STORAGE_PATH=/tmp/result ipython

In [1]: from prefect import task

In [2]: @task(persist_result=True)
   ...: def f():
   ...:     return 42
   ...:

In [3]: f()
16:35:23.491 | INFO    | Task run 'f' - Finished in state Completed()
Out[3]: 42

In [4]: !ls /tmp/result
109c10d275731f842f4b08dd51b397aa

when you say

> I've tried to change the server config's PREFECT_LOCAL_STORAGE_PATH to /tmp/result but it didn't help

... was about to type the same as @cicdw above, nevermind 🙂