PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

prefect deployment run --watch crashed while waiting but flow runs successfully #15018

Open fredrikhgrelland opened 2 months ago

fredrikhgrelland commented 2 months ago

Bug summary

We run Prefect flows from the CLI with the --watch option in our CI builds:

prefect deployment run --watch my_deployment --args some_args

After upgrading to Prefect 2.20, we see intermittent but fairly frequent failures while the CLI is waiting. The flow run itself succeeds, but the CI process fails with the following stack trace:

Watching flow run 'logical-squid'...
06:13:07.652 | INFO    | prefect - Flow run is in state 'Scheduled'
06:13:12.807 | INFO    | prefect - Flow run is in state 'Scheduled'
06:13:17.970 | INFO    | prefect - Flow run is in state 'Pending'
06:13:23.125 | INFO    | prefect - Flow run is in state 'Pending'
06:13:28.284 | INFO    | prefect - Flow run is in state 'Pending'
06:13:33.453 | INFO    | prefect - Flow run is in state 'Pending'
06:13:38.649 | INFO    | prefect - Flow run is in state 'Pending'
06:13:43.812 | INFO    | prefect - Flow run is in state 'Pending'
Traceback (most recent call last):
  File "/home/runner/_work/baseline/baseline/.venv/bin/prefect", line 8, in <module>
    sys.exit(app())
             ^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/main.py", line 309, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/core.py", line 723, in main
    return _main(
           ^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/core.py", line 193, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/main.py", line 692, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/cli/_utilities.py", line 42, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 311, in coroutine_wrapper
    return call()
           ^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 432, in __call__
    return self.result()
           ^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 318, in result
    return self.future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 179, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/_tool/Python/3.11.9/x64/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 389, in _run_async
    result = await coro
             ^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/cli/deployment.py", line 1058, in run
    finished_flow_run = await wait_for_flow_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/client/utilities.py", line 100, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/flow_runs.py", line 87, in wait_for_flow_run
    await anyio.sleep(poll_interval)
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/anyio/_core/_eventloop.py", line 87, in sleep
    return await get_async_backend().sleep(delay)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2078, in sleep
    await sleep(delay)
  File "/home/runner/_work/_tool/Python/3.11.9/x64/lib/python3.11/asyncio/tasks.py", line 649, in sleep
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError

Version info (prefect version output)

Version:             2.20.2
API version:         0.8.4
Python version:      3.11.9
Git commit:          51c3f290
Built:               Wed, Aug 14, 2024 11:27 AM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         ephemeral
Server:
  Database:          sqlite
  SQLite version:    3.37.2

Additional context

I see that https://github.com/PrefectHQ/prefect/commit/067cbc6e1e7975e69de30509a46bc5f505e59cca mentions asyncio.exceptions.CancelledError, and that https://github.com/PrefectHQ/prefect/pull/14599#issue-2406484240 states that this error needs to be caught.
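
For readers wondering why the traceback ends inside a sleep: a cancelled asyncio task raises CancelledError at whichever await it is currently suspended on, and in a polling loop that is almost always the sleep between polls. Below is a minimal, stdlib-only sketch of that mechanism. The names poll_until_done and demo are hypothetical, this is not Prefect's code, and it does not identify what is cancelling the task in this issue.

```python
import asyncio


async def poll_until_done(poll_interval: float) -> None:
    """Hypothetical stand-in for a polling loop that never sees a final state."""
    while True:
        # A cancelled task raises CancelledError at the await it is suspended on,
        # which is why the traceback above bottoms out in sleep().
        await asyncio.sleep(poll_interval)


async def demo() -> str:
    task = asyncio.create_task(poll_until_done(0.05))
    await asyncio.sleep(0.12)  # let the loop run a few iterations
    task.cancel()              # something outside the loop cancels the task
    try:
        await task
    except asyncio.CancelledError:
        # Since Python 3.8, CancelledError derives from BaseException,
        # so a plain `except Exception` would not catch it.
        return "cancelled while sleeping"
    return "finished normally"


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The sketch only shows where the exception appears. Whether the CLI should catch it, and what triggers the cancellation, is the open question here.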

fredrikhgrelland commented 2 months ago

Some additional context: this happens quite frequently, and I believe it occurs when the flow run transitions from the Pending to the Running state.

21:07:28.509 | INFO    | prefect - Flow run is in state 'Scheduled'
21:07:33.675 | INFO    | prefect - Flow run is in state 'Scheduled'
21:07:38.837 | INFO    | prefect - Flow run is in state 'Pending'
21:07:44.033 | INFO    | prefect - Flow run is in state 'Pending'
21:07:49.193 | INFO    | prefect - Flow run is in state 'Pending'
21:07:54.366 | INFO    | prefect - Flow run is in state 'Pending'
21:07:59.530 | INFO    | prefect - Flow run is in state 'Pending'
21:08:04.686 | INFO    | prefect - Flow run is in state 'Pending'
21:08:09.876 | INFO    | prefect - Flow run is in state 'Pending'
21:08:15.028 | INFO    | prefect - Flow run is in state 'Pending'
21:08:20.182 | INFO    | prefect - Flow run is in state 'Pending'
21:08:25.339 | INFO    | prefect - Flow run is in state 'Pending'
21:08:30.488 | INFO    | prefect - Flow run is in state 'Pending'
21:08:35.635 | INFO    | prefect - Flow run is in state 'Pending'
21:08:40.826 | INFO    | prefect - Flow run is in state 'Pending'
21:08:45.995 | INFO    | prefect - Flow run is in state 'Pending'
21:08:51.149 | INFO    | prefect - Flow run is in state 'Pending'
21:08:56.300 | INFO    | prefect - Flow run is in state 'Pending'
21:09:01.468 | INFO    | prefect - Flow run is in state 'Pending'
21:09:06.639 | INFO    | prefect - Flow run is in state 'Pending'
21:09:11.813 | INFO    | prefect - Flow run is in state 'Pending'
21:09:16.963 | INFO    | prefect - Flow run is in state 'Pending'
21:09:22.116 | INFO    | prefect - Flow run is in state 'Pending'
21:09:27.271 | INFO    | prefect - Flow run is in state 'Pending'
21:09:32.422 | INFO    | prefect - Flow run is in state 'Pending'
21:09:37.583 | INFO    | prefect - Flow run is in state 'Pending'
Traceback (most recent call last):
  File "/home/runner/_work/baseline/baseline/.venv/bin/prefect", line 8, in <module>
    sys.exit(app())
             ^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/main.py", line 309, in __call__
    return get_command(self)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/core.py", line 723, in main
    return _main(
           ^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/core.py", line 193, in _main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/typer/main.py", line 692, in wrapper
    return callback(**use_params)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/cli/_utilities.py", line 42, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/utilities/asyncutils.py", line 311, in coroutine_wrapper
    return call()
           ^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 432, in __call__
    return self.result()
           ^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 318, in result
    return self.future.result(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 179, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/_tool/Python/3.11.9/x64/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 389, in _run_async
    result = await coro
             ^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/cli/deployment.py", line 1058, in run
    finished_flow_run = await wait_for_flow_run(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/client/utilities.py", line 100, in with_injected_client
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/prefect/flow_runs.py", line 87, in wait_for_flow_run
    await anyio.sleep(poll_interval)
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/anyio/_core/_eventloop.py", line 87, in sleep
    return await get_async_backend().sleep(delay)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/_work/baseline/baseline/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2078, in sleep
    await sleep(delay)
  File "/home/runner/_work/_tool/Python/3.11.9/x64/lib/python3.11/asyncio/tasks.py", line 649, in sleep
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError

fredrikhgrelland commented 2 months ago

I am simply not able to reproduce this from scratch. I see this error in many repositories, but if I change the commit slightly, it no longer fails. :facepalm: I still consider this a bug, since the CLI crashes for no apparent reason, but the root cause is probably hard to pin down.
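
Until the cause is found, one possible way to keep CI from depending on the CLI's watch loop is to poll for a terminal state from a small script of our own. The following is only a sketch under assumptions: wait_for_terminal_state, read_state and the demo are hypothetical names, read_state stands in for whatever call returns the flow run's current state name (the wiring to Prefect is not shown), and the set of terminal state names is my assumption. It does not fix or explain the cancellation itself.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Assumed terminal state names; adjust if your setup uses others.
TERMINAL_STATES = {"Completed", "Failed", "Crashed", "Cancelled"}


async def wait_for_terminal_state(
    read_state: Callable[[], Awaitable[str]],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
) -> str:
    """Poll read_state() until it returns a terminal state name or we time out."""
    deadline = time.monotonic() + timeout
    while True:
        state = await read_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout} seconds")
        await asyncio.sleep(poll_interval)


async def demo() -> str:
    # Fake state sequence standing in for a real flow run (hypothetical data).
    states = iter(["Scheduled", "Pending", "Pending", "Running", "Completed"])

    async def fake_read_state() -> str:
        return next(states)

    return await wait_for_terminal_state(
        fake_read_state, poll_interval=0.01, timeout=5.0
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In CI, the script would exit non-zero for anything other than "Completed", so the build result follows the flow run rather than the watcher.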