PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Flow crashed #15234

Open lsscodes opened 1 month ago

lsscodes commented 1 month ago

Bug summary

I was trying to run a deployment that pulls code from a private GitLab repo on a prefect:managed work pool, and my runs exit with "message": "Flow run process exited with non-zero status code 1."

Here is the log:

{
  "id": "a6f1fee7-0c1d-4421-9b7f-d0ff55ee4170",
  "account": "4663cbab-9cad-4423-8c93-6c068ccaf0ff",
  "event": "prefect.flow-run.Crashed",
  "occurred": "2024-09-05T09:04:31.330Z",
  "payload": {
    "intended": {
      "from": "PENDING",
      "to": "CRASHED"
    },
    "initial_state": {
      "type": "PENDING",
      "name": "Pending"
    },
    "validated_state": {
      "type": "CRASHED",
      "name": "Crashed",
      "message": "Flow run process exited with non-zero status code 1."
    }
  },
  "received": "2024-09-05T09:04:31.526Z",
  "related": [
    {
      "prefect.resource.id": "prefect.flow.bcec10bc-dfe9-45a5-9ffc-6eb2a224f81d",
      "prefect.resource.role": "flow",
      "prefect.resource.name": "im30-autosubscribe"
    },
    {
      "prefect.resource.id": "prefect.deployment.3c1b4964-e86d-43e2-bc53-3dab3958845f",
      "prefect.resource.role": "deployment",
      "prefect.resource.name": "lss-autosubscribe"
    },
    {
      "prefect.resource.id": "prefect.work-queue.d9988900-12fd-4461-911e-6ca171643846",
      "prefect.resource.role": "work-queue",
      "prefect.resource.name": "default"
    },
    {
      "prefect.resource.id": "prefect.work-pool.6f35e1f0-c4c2-45c0-bce1-79369849b578",
      "prefect.resource.role": "work-pool",
      "prefect.resource.name": "lss-pool"
    },
    {
      "prefect.resource.id": "prefect.tag.auto-scheduled",
      "prefect.resource.role": "tag"
    },
    {
      "prefect.resource.id": "prefect.schedule.56c9a831-cc93-4b59-b1d9-a8f3aa414a6b",
      "prefect.resource.role": "creator",
      "prefect.resource.name": "CronSchedule"
    }
  ],
  "resource": {
    "prefect.resource.id": "prefect.flow-run.fc1eeee1-1467-4d11-a667-6bb9afedf666",
    "prefect.resource.name": "delicate-crane",
    "prefect.state-message": "Flow run process exited with non-zero status code 1.",
    "prefect.state-name": "Crashed",
    "prefect.state-timestamp": "2024-09-05T09:04:31.330514+00:00",
    "prefect.state-type": "CRASHED"
  },
  "workspace": "344a6efd-a119-4e21-80f4-fd2e69f87449"
}

Here are the contents of my prefect.yaml file:

# Welcome to your prefect.yaml file! You can use this file for storing and managing
# configuration for deploying your flows. We recommend committing this file to source
# control along with your flow code.

# Generic metadata about this project
name: im30_rewards_autosubscriber_prefect
prefect-version: 3.0.0

# build section allows you to manage and build docker images
build: null

# push section allows you to manage if and how this project is uploaded to remote locations
push: null

# pull section allows you to provide instructions for cloning this project in remote locations

pull:
    - prefect.deployments.steps.git_clone:
        repository: https://gitlab.com/xxxx/im30_rewards_autosubscriber_prefect.git
        credentials: "{{ prefect.blocks.gitlab-credentials.lss-gitlab }}"

# the deployments section allows you to provide configuration for deploying flows
deployments:
- name: lss-autosubscribe
  version: null
  tags: []
  concurrency_limit: null
  description: null
  entrypoint: im30_autosubscriber_prefect.py:im30_autosubscribe
  parameters: {}
  work_pool:
    name: lss-pool
    work_queue_name: null
    job_variables: {}
  enforce_parameter_schema: true
  schedules:
  - cron: '*/2 * * * *'
    timezone: Europe/Oslo
    day_or: true
    active: true
    max_active_runs: null
    catchup: false

Version info (prefect version output)

Version:             3.0.0
API version:         0.8.4
Python version:      3.11.9
Git commit:          c40d069d
Built:               Tue, Sep 3, 2024 11:13 AM
OS/Arch:             win32/AMD64
Profile:             default
Server type:         cloud
Pydantic version:    2.8.2
Integrations:
  prefect-gitlab:    0.3.0

Additional context

No response

desertaxle commented 1 month ago

Thanks for the bug report @lsscodes!

A crashed flow run with a non-zero exit code usually means that Prefect ran into an error when pulling your flow code. You can verify the configuration of your pull step by running this script:

from prefect.deployments.steps import git_clone
from prefect_gitlab import GitLabCredentials

git_clone(
    repository="https://gitlab.com/xxxx/im30_rewards_autosubscriber_prefect.git",
    credentials=GitLabCredentials.load("lss-gitlab"),
)

If that runs without error, then we'll need to dig deeper!

lsscodes commented 1 month ago

@desertaxle I tried the following code:

from prefect.deployments.steps import git_clone
from prefect_gitlab import GitLabCredentials

credentials = GitLabCredentials.load("lss-gitlab")

print(credentials)

git_clone(
    repository="https://gitlab.com/xxx2074845/im30_rewards_autosubscriber_prefect.git",
    credentials=credentials,
)

and it returned

GitLabCredentials(token=SecretStr('**********'), url=None)
C:\Users\xxx\OneDrive - xxxGroup\Desktop\Work\Tools\im30_rewards_autosubscriber_prefect\junk\prefect_gitlab_clone.py:8: RuntimeWarning: coroutine 'git_clone' was never awaited
  git_clone(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Still processing items: 1 items remaining...

Nothing happened; I don't see the repository being cloned locally.

desertaxle commented 1 month ago

Hmmm, ok it looks like we'll need to make sure we wait for that coroutine:

import asyncio
from prefect.deployments.steps import git_clone
from prefect_gitlab import GitLabCredentials

async def main():
    await git_clone(
        repository="https://gitlab.com/xxxx/im30_rewards_autosubscriber_prefect.git",
        credentials=await GitLabCredentials.load("lss-gitlab"),
    )

if __name__ == "__main__":
    asyncio.run(main())

I thought that function had our @sync_compatible decorator on it, which would allow it to be run synchronously, but I must've been mistaken. Hopefully the new code gives a better indicator!
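
For readers unfamiliar with the idea, here is a minimal sketch of what a "sync-compatible" decorator does in principle: block until the coroutine finishes when called from synchronous code, and hand back an awaitable when an event loop is already running. This is not Prefect's implementation; `sync_compatible_sketch` and `fake_clone` are hypothetical names used only for illustration, and `fake_clone` never touches the network.

```python
import asyncio
import functools


def sync_compatible_sketch(async_fn):
    """Hypothetical decorator: run the coroutine to completion when called
    from synchronous code, return it for awaiting when a loop is running."""

    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        coro = async_fn(*args, **kwargs)
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop is running: we are in sync code, so block here.
            return asyncio.run(coro)
        # A loop is running: hand the coroutine back so the caller can await it.
        return coro

    return wrapper


@sync_compatible_sketch
async def fake_clone(repository: str) -> dict:
    """Stand-in for an async step; it only derives a directory name."""
    await asyncio.sleep(0)
    name = repository.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    return {"directory": name}


async def call_from_async() -> dict:
    # Inside a coroutine the wrapper returns an awaitable, so `await` is required.
    return await fake_clone("https://example.com/org/demo-repo.git")


# Called from plain synchronous code: no `await` needed.
sync_result = fake_clone("https://example.com/org/demo-repo.git")
# Called from inside an event loop: the result must be awaited.
async_result = asyncio.run(call_from_async())
```

With a decorator of this shape, forgetting `await` inside `async def main()` leaves a coroutine object that never runs, which matches the "was never awaited" warnings reported in this thread.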

lsscodes commented 1 month ago

@desertaxle Still nothing happened:

RuntimeWarning: coroutine 'git_clone' was never awaited
  git_clone(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

RuntimeWarning: coroutine 'sync_compatible.<locals>.coroutine_wrapper.<locals>.ctx_call' was never awaited
  git_clone(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

desertaxle commented 1 month ago

That's my mistake, I forgot to await the git_clone code. If you add an await before git_clone then it should execute correctly. I've updated my code example above.

lsscodes commented 1 month ago

@desertaxle new error

TypeError: cannot pickle 'coroutine' object
sys:1: RuntimeWarning: coroutine 'sync_compatible.<locals>.coroutine_wrapper.<locals>.ctx_call' was never awaited

zzstoatzz commented 1 month ago

Hi @lsscodes, it's probably because you're not awaiting the load method. On all blocks, that method is async when called in an async context:

In [1]: from prefect.deployments.steps import git_clone

In [2]: await git_clone(repository="https://github.com/zzstoatzz/prefect-monorepo.git")
Out[2]: {'directory': 'prefect-monorepo'}

In [3]: from prefect.blocks.core import Block

In [4]: await Block.load("json/marvin-thread-cache")
Out[4]: JSON(value={...})
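
The same effect can be reproduced without Prefect. The sketch below uses a hypothetical `load_block` coroutine as a stand-in for an async block loader. It shows that an un-awaited call is just a coroutine object, that serializing such an object raises a `TypeError`, and that awaiting it yields the real value. Whether Prefect hits the error through pickling or some other copy step is an assumption here, not something this thread confirms.

```python
import asyncio
import pickle


async def load_block(name: str) -> dict:
    """Hypothetical stand-in for an async block loader."""
    await asyncio.sleep(0)
    return {"name": name, "token": "not-a-real-token"}


async def main() -> tuple:
    # Without `await`, the call only creates a coroutine object.
    pending = load_block("lss-gitlab")
    is_coroutine = asyncio.iscoroutine(pending)

    # Trying to serialize that object fails with a TypeError.
    try:
        pickle.dumps(pending)
        pickle_failed = False
    except TypeError:
        pickle_failed = True

    # Awaiting it yields the actual value and avoids the
    # "never awaited" RuntimeWarning.
    loaded = await pending
    return is_coroutine, pickle_failed, loaded


is_coroutine, pickle_failed, loaded = asyncio.run(main())
```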

lsscodes commented 1 month ago

@zzstoatzz @desertaxle I was able to get it working. Thanks for the help!

I made the following change:

pull:
  - prefect.deployments.steps.git_clone:
      repository: https://gitlab.com/xxxxx/im30_rewards_autosubscriber_prefect.git
      access_token: "{{ prefect.blocks.secret.lss-gitlab-token }}"
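
One way to check a token outside Prefect is to build the authenticated HTTPS URL by hand and pass it to `git clone`. The sketch below assumes GitLab's `oauth2:<token>@host` form for token-authenticated HTTPS clones. `build_token_url` is a hypothetical helper and is not how Prefect formats credentials internally. It rejects SSH-style URLs such as `git@gitlab.com:...`, since a token embedded in the URL only applies to HTTPS.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_token_url(repository: str, token: str, username: str = "oauth2") -> str:
    """Hypothetical helper: embed a token in an HTTPS clone URL."""
    parts = urlsplit(repository)
    if parts.scheme != "https":
        # SSH-style URLs (git@host:group/repo.git) have no https scheme.
        raise ValueError("token authentication in the URL needs an https:// repository URL")
    # Percent-encode both parts so special characters cannot break the URL.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Placeholder repository and token; substitute real values when testing locally.
url = build_token_url(
    "https://gitlab.com/example-group/example-repo.git",
    "not-a-real-token",
)
```

If `git clone` succeeds with the resulting URL, the token itself is valid and the problem is more likely in how the pull step receives it.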

I also want to point out that even though the deployment was successful, there are still warnings you might want to look into:

 RuntimeWarning: coroutine 'sync_compatible.<locals>.coroutine_wrapper.<locals>.ctx_call' was never awaited
  if flow is None:
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

sys:1: RuntimeWarning: coroutine 'sync_compatible.<locals>.coroutine_wrapper.<locals>.ctx_call' was never awaited
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Version:             3.0.1
API version:         0.8.4
Python version:      3.11.9
Git commit:          c6b2ffe1
Built:               Fri, Sep 6, 2024 10:05 AM
OS/Arch:             win32/AMD64
Profile:             ephemeral
Server type:         cloud
Pydantic version:    2.8.2
Integrations:
  prefect-gitlab:    0.3.0

Markowashere commented 1 month ago

I have the same bug in Prefect 3. I can't run a Hello World flow in a prefect:managed work pool when the code is pulled from GitLab. The log output is just:

Opening process...

Process for flow run 'xxxxxxx' exited with status code: 1

Reported flow run 'xxxxxx' as crashed: Flow run process exited with non-zero status code 1.

Downgrading the managed workpool image to Prefect 2 fixes the issue.

desertaxle commented 1 month ago

I'm glad to hear you got it working, @lsscodes! Was the fix to use the access_token field instead of the credentials field?

@Markowashere can you share your pull step to see if it's a similar issue?

lsscodes commented 1 month ago

@desertaxle That is correct, access_token did the trick. By the way, have you looked at the RuntimeWarnings?

Markowashere commented 1 month ago

Sure, here's the pull step:

pull:
- prefect.deployments.steps.git_clone:
    repository: git@gitlab.com:private/repo.git
    branch: feature_branch
    access_token: '{{ prefect.blocks.secret.deployment-test-my-flow-repo-token }}'

I've also tried to deploy using Python, same issue.

bonsuot commented 5 days ago

Hi all, I am having a similar issue. Please advise.

My YAML:

# prefect.yaml
name: fantasy-premier-league
prefect-version:

build:

push:

pull:
- prefect.deployments.steps.git_clone:
    repository: https://github.com/bonsuot/fantasy-premier-league.git

deployments:
- name: fantasy-premier-league
  version:
  tags: [fpl_etl, database, api]
  concurrency_limit:
  description: Main flow orchestrating the entire ETL pipeline
  entrypoint: fpl_etl.py:main_flow
  parameters: {"mode":"auto"}
  work_pool:
    name: fpl-pool
    work_queue_name:
    job_variables: {}
  enforce_parameter_schema: true
  schedules:
  - cron: 0 0 12 * * * # Every 12 hours
    timezone: UTC
    day_or: true
    active: true
    max_active_runs:
    catchup: false

Logs:

Opening process...
10:10:49 AM
Info
Process for flow run 'prehistoric-coua' exited with status code: 1
10:10:53 AM
Error
Reported flow run 'bccb264d-7e2c-4f36-a056-eaf8741d6166' as crashed: Flow run process exited with non-zero status code 1.