ghislainp opened this issue 4 days ago (status: Open)
hi @ghislainp - thanks for the issue!
are you able to share the code that led you to this stack trace?
in particular, this part catches my eye - it could be a problem with our dynamic importing:
```
File "/home/debian/prefect/.pixi/envs/default/lib/python3.11/site-packages/prefect/__init__.py", line 108, in __getattr__
    return importlib.import_module(f".{attr_name}", package=__name__)
```
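For context, that frame is a PEP 562 module-level `__getattr__`: any attribute that isn't found on the package is turned into a relative submodule import. A minimal sketch of that mechanism (the helper name `lazy_getattr` is mine, not Prefect's, and `collections` stands in for the real package):

```python
import importlib

def lazy_getattr(package: str, attr_name: str):
    # Mimics a module-level __getattr__ (PEP 562): an unknown attribute
    # access on the package becomes a relative submodule import.
    return importlib.import_module(f".{attr_name}", package=package)

# A real submodule resolves fine:
print(lazy_getattr("collections", "abc").__name__)  # collections.abc

# A name that is not a submodule raises ModuleNotFoundError, which has
# exactly the shape of the reported "No module named 'prefect.isnan'".
try:
    lazy_getattr("collections", "isnan")
except ModuleNotFoundError as exc:
    print(exc)
```

This is why an attribute lookup like `prefect.isnan` can surface as a `ModuleNotFoundError` rather than an `AttributeError`.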
but it would be really useful to see some version of your code, since an example like this appears to work:
```python
from dask.array.random import random
from prefect import flow, task

@task
def create_dask_array():
    x = random((1000, 1000), chunks=(100, 100))
    return x

@task
def save_to_zarr(x, path: str):
    x.to_zarr(path)
    print(f"Dask array saved to {path}")

@flow
def dask_to_zarr_flow():
    x = create_dask_array()
    save_to_zarr(x, "output.zarr")

if __name__ == "__main__":
    dask_to_zarr_flow()
```
While cooking up a reproducible example, I found that there is no error when the flow is run from the command line, as opposed to through a deployment. Surprisingly, when running from the command line I got another error in an upstream flow, one I had not had before when running through a deployment. This new error comes from calling dask's LocalCluster within the flow to limit memory usage. I removed this call to LocalCluster and now I don't get any error anymore, neither in the upstream flow nor in the flow that was causing the originally reported error. I don't understand why I can't create a LocalCluster from within a flow; it is a practice that had been working.
I also don't understand the interaction between removing the LocalCluster in the upstream flow and the downstream flow, because they do not share any variables. The upstream one writes netCDF files that are read by the downstream one, and again I was able to run the downstream flow via the command line with success (using exactly the same file as when I had the error before).
Could it be an interaction between the Prefect flow and dask?
Bug summary
This is a very strange bug. I'm just calling `x.to_zarr` where x is a dask array, from a Prefect task. The problem is that the raised error is "ModuleNotFoundError: No module named 'prefect.isnan'" when to_zarr tries to serialize something in the dask array. I can't determine from the traceback (attached below) whether it is a Prefect bug, a dask bug, or an interaction between the two. Note that I'm using to_zarr with success in other parts of the code (in the same environment). This error has been around for a while (since spring) and I was hesitating to post it here... but I don't see another place to seek advice.
Version info (`prefect version` output)

Additional context