PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Removing flow_run clears everything except task_run_state_cache #8444

Open PiotrSiejda opened 1 year ago

PiotrSiejda commented 1 year ago

Bug summary

Sending a DELETE to a flow_run (or task_run) cascades nicely through all the tables except task_run_state_cache. The most likely reason is that in https://github.com/PrefectHQ/prefect/blob/main/src/prefect/orion/database/orm_models.py the class ORMTaskRunStateCache contains the line:

task_run_state_id = sa.Column(UUID(), nullable=False)

which most likely should have been a foreign key with a cascading delete, something like:

task_run_state_id = sa.Column(UUID(), sa.ForeignKey("task_run_state.id", ondelete="cascade"), nullable=False)

The orphaned cache entry is not reachable afterwards, so it is disregarded on subsequent runs either way.
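The difference between a plain column and a cascading foreign key can be sketched with the standard-library sqlite3 module. This is only an illustration under assumptions: the two tables are reduced to the columns that matter here, the `task_run_state.id` column is assumed, and `remaining_cache_rows` is a hypothetical helper, not part of Prefect.

```python
import sqlite3


def remaining_cache_rows(cascade: bool) -> int:
    """Delete a parent state row and report how many cache rows survive."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE task_run_state (id TEXT PRIMARY KEY)")
    # Simplified stand-in for the real table; only the relevant columns are kept.
    fk = "REFERENCES task_run_state (id) ON DELETE CASCADE" if cascade else ""
    conn.execute(
        "CREATE TABLE task_run_state_cache ("
        "cache_key TEXT, "
        f"task_run_state_id TEXT NOT NULL {fk})"
    )
    conn.execute("INSERT INTO task_run_state VALUES ('state-1')")
    conn.execute("INSERT INTO task_run_state_cache VALUES ('key-1', 'state-1')")
    conn.execute("DELETE FROM task_run_state WHERE id = 'state-1'")
    (count,) = conn.execute("SELECT COUNT(*) FROM task_run_state_cache").fetchone()
    conn.close()
    return count


print(remaining_cache_rows(cascade=False))  # plain column: the cache row is orphaned
print(remaining_cache_rows(cascade=True))   # cascading foreign key: the cache row is removed
```

With the plain column the cache row stays behind after its state row is deleted, which matches the behaviour described above.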

This is related to: https://github.com/PrefectHQ/prefect/issues/8239

Reproduction

# Start with an empty db.
# Create a flow run with caching, e.g.:

from prefect import flow, task
from prefect.tasks import task_input_hash

@task(cache_key_fn=task_input_hash)
def task1(in_str: str):
    return in_str + 'bb'

@flow
def ff():
    task1.submit("qqq")

if __name__ == "__main__":
    ff()

# Then delete the flow run through the API:
import requests
requests.delete("http://localhost:4200/api/flow_runs/UUID_OF_OUR_RUN")

Look into the db. Everything is emptied except for task_run_state_cache.
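One way to "look into the db" without a GUI is to count the rows in every table. The sketch below uses only the standard library; `count_rows` is a hypothetical helper, and the in-memory database with two toy tables stands in for the real Prefect SQLite file, whose location depends on your configuration.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Names come straight from sqlite_master, so quoting them is enough here.
        counts[name] = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    return counts


# Toy stand-in for the server database; replace ":memory:" with the path to
# the actual SQLite file to inspect a real installation.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE flow_run (id TEXT)")
demo.execute("CREATE TABLE task_run_state_cache (cache_key TEXT)")
demo.execute("INSERT INTO task_run_state_cache VALUES ('key-1')")

# Keep only the tables that still hold rows after the delete.
non_empty = {table: n for table, n in count_rows(demo).items() if n}
print(non_empty)
```

After the DELETE request in the reproduction, task_run_state_cache would be the only table expected to show up as non-empty.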

Error

Table task_run_state_cache is not emptied.

Versions

Version:             2.7.7
API version:         0.8.4
Python version:      3.9.13
Git commit:          e8ca30b8
Built:               Fri, Jan 6, 2023 4:25 PM
OS/Arch:             win32/AMD64
Profile:             default
Server type:         ephemeral
Server:
  Database:          sqlite
  SQLite version:    3.37.2

Additional context

No response

serinamarie commented 1 year ago

Hi @PiotrSiejda, thanks for the issue!

The orphaned cache entry is not reachable afterwards, so it is disregarded on subsequent runs either way.

Can you clarify if you run into an error/issue on consecutive runs? If not, what would be the desired behavior?

PiotrSiejda commented 1 year ago

No, I have never run into an error because of this. The only problem is that if you have a lot of flow_runs, you would like to delete old ones to keep the db smaller. Because not everything is deleted cleanly, the db slowly grows in size. In my opinion, all the records related to the flow_run should be removed in this situation.
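Until the cascade exists, a manual cleanup along these lines could keep the table from growing. This is a rough sketch, not a supported Prefect procedure: it assumes a SQLite backend, that the state table is named `task_run_state` with an `id` column, and `delete_orphaned_cache_rows` is a hypothetical helper. Back up the database and stop the server before trying anything like this on real data.

```python
import sqlite3


def delete_orphaned_cache_rows(conn: sqlite3.Connection) -> int:
    """Remove cache rows whose state row no longer exists; return how many were deleted."""
    cursor = conn.execute(
        """
        DELETE FROM task_run_state_cache
        WHERE NOT EXISTS (
            SELECT 1 FROM task_run_state
            WHERE task_run_state.id = task_run_state_cache.task_run_state_id
        )
        """
    )
    conn.commit()
    return cursor.rowcount


# Toy data: one cache row still has its state row, the other was orphaned.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_run_state (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE task_run_state_cache (cache_key TEXT, task_run_state_id TEXT NOT NULL)"
)
conn.execute("INSERT INTO task_run_state VALUES ('kept-state')")
conn.execute("INSERT INTO task_run_state_cache VALUES ('key-1', 'kept-state')")
conn.execute("INSERT INTO task_run_state_cache VALUES ('key-2', 'deleted-state')")

deleted = delete_orphaned_cache_rows(conn)
print(deleted)
```

Only the row pointing at the missing state is removed; cache entries that still belong to an existing task run state are left alone.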