Closed: josh-gree closed this issue 1 year ago.
Hey @josh-gree, the delay might be related to result persistence for your tasks. Can you set persist_result=False on your tasks and see if you still see a large delay between tasks?
Possible regression related to https://github.com/PrefectHQ/prefect/issues/7065
I did some digging into this. Not surprisingly, the biggest time is spent in `create_task_run_then_submit`, split between `create_task_run` and `submit_task_run`.
Using `quote()` on the parameter drastically reduces the submit time, although the task creation time still scales with the size of the parameter. For example, if the JSON size is doubled to 100_000 rows, create task run takes ~14s.
(There are some extra prints in the source here to show how long create/submit take specifically.)
Output:

```
(demo-flows) ➜ demo-flows git:(main) ✗ python big_flows.py
11:52:17.884 | INFO | prefect.engine - Created flow run 'remarkable-urchin' for flow 'my-flow'
11:52:17.890 | INFO | Flow run 'remarkable-urchin' - View at https://app.prefect.cloud/account/3cf6b38f-5244-474a-9554-302144506e43/workspace/ce8b1412-01b7-4700-a508-8dbd1f43f623/flow-runs/flow-run/8329c2f0-6f84-4532-92b5-abb29b97e49f
```

No `quote()`:

```
11:53:58.932 | INFO | Flow run 'remarkable-urchin' - Created task run 'my_task-0' for task 'my_task'
Create Task Run: 6.912395715713501
11:53:58.940 | INFO | Flow run 'remarkable-urchin' - Executing 'my_task-0' immediately...
11:54:13.146 | INFO | Task run 'my_task-0' - Finished in state Completed()
Submit Task Run: 14.208080768585205
Total task time: 21.138995885849
```

With `quote()`:

```
11:54:20.471 | INFO | Flow run 'remarkable-urchin' - Created task run 'my_task-1' for task 'my_task'
Create Task Run: 7.3107006549835205
11:54:20.473 | INFO | Flow run 'remarkable-urchin' - Executing 'my_task-1' immediately...
11:54:21.018 | INFO | Task run 'my_task-1' - Finished in state Completed()
Submit Task Run: 0.5463552474975586
Total task time: 7.868835687637329
11:54:21.163 | INFO | Flow run 'remarkable-urchin' - Finished in state Completed('All states completed.')
```
Repro script:

```python
import pandas as pd
import json
from prefect.utilities.annotations import quote
from typing import Dict, List
from prefect import flow, task
from faker import Faker
import time


def make_a_giant_json(N: int):
    Faker.seed(42)
    fake = Faker()
    out = fake.json(
        data_columns={
            "Spec": "@1.0.1",
            "ID": "pyint",
            "x1": "address",
            "x2": "address",
            "x3": "address",
            "x4": "address",
            "x5": "address",
            "x6": "address",
            "x7": "address",
            "x8": "address",
            "x9": "address",
            "x10": "address",
        },
        num_rows=N,
    )
    x = json.loads(out)
    return x


@task
def my_task(big_json):
    pass


@flow
def my_flow(n: int):
    big_json = make_a_giant_json(n)

    print("No `quote()`:")
    s = time.time()
    my_task(big_json)
    e = time.time()
    print("Total task time:", e - s)

    print("With `quote()`:")
    s = time.time()
    my_task(quote(big_json))
    e = time.time()
    print("Total task time:", e - s)


if __name__ == '__main__':
    my_flow(n=50000)
```
Thanks for the responses here, much appreciated. As suggested, I have tried the `quote` annotation, but in my real-world case it does not reduce the issue much. I am currently just running all my flows without tasks; they now finish in under 2 minutes in total, compared to 10+ minutes using tasks. That will probably have to do for now, which is unfortunate :-(
I will keep an eye on this issue for any developments.
@desertaxle - I cannot try this today but will get back to you tomorrow and let you know if this helps. I did assume that persistence is opt-in anyway, right? The UI for the runs does seem to suggest unpersisted results...
Hi @zhen0 - what details do you need from me? It seems that @jakekaplan has a clear repro above?
@josh-gree - I added it for this line "I cannot try this today but will get back to you tomorrow and let you know if this helps" - I am guessing it didn't help!
Adding to our backlog as it looks like this is still an issue.
Bug summary
When running flows on Kubernetes, I am seeing multi-minute delays between a task ending and the next task starting. This delay seems to be proportional to the size of the data being passed between tasks. With the following flow I am able to reproduce this issue consistently;
This flow consists of two pairs of tasks that can pass varying sizes of data between them.
The logs for a run of this flow;
Logs

```
12:51:23.925 | DEBUG | prefect.profiles - Using profile 'default'
/usr/local/lib/python3.9/runpy.py:127: RuntimeWarning: 'prefect.engine' found in sys.modules after import of package 'prefect', but prior to execution of 'prefect.engine'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
12:51:23.977 | DEBUG | MainThread | prefect._internal.concurrency - Waiter
```

NB: Please do let me know if I should redact any of the above logs?
The parts of the logs that display this issue: the first pair of tasks pass 10 JSON records between them;
There is less than a second gap between the end of the first task and the start of the next.
The second pair of tasks pass 500k records between them;
In this case there is a 2.5 minute gap between the end of one task and the start of the next...
Since I am running on GKE Autopilot, these 2.5 minutes of dead time are extremely costly across many flows. As things stand, my only solution is to dispense with tasks entirely and just wrap vanilla Python functions in a flow; this doesn't feel right!
Reproduction
Error
Versions
Additional context
Manifests used for the agent in the cluster;
Dockerfile that pod runs;
Pyproject.toml of the project installed in the container;