Open venkatajagannath opened 2 months ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
(Copying my message from elsewhere with minor edits for context)
It’s better to use setup and teardown because they handle errors better.
For the reported issue specifically, pre_execute
is actually executed every time a worker is started for the task, not only when execute
is called, despite the name. (This is also true for on_execution_callback
, by the way.) Resuming a deferred task starts a worker and causes pre_execute
to be called.
This is kind of by design, but I guess there’s a case to be made it’s not a good design. If we are to change this, I would recommend:
pre_execute
and on_execution_callback
to remove execute
from the name to avoid misunderstandings. Document them well to describe the actual behaviour.@uranusjr Does it make sense to rename this issue to "Rename pre_execute and on_execution_callback to remove execute from the name to avoid misunderstandings"?
@uranusjr
pre_execute() and post_execute() naming convention is very intuitive. Would it be possible to change the backend design instead?
I've used setup and teardown. Example here. But, my current usecase is slightly different.
Its a much better UX to use just one decorator to spin up/down a cluster and also execute the main job. See example here. We also hear the same feedback from customers.
One idea:
(Maybe?) we can extend setup/teardown to configure a method within a custom operator as a setup and another as a teardown. These methods will execute similar to setup/teardown in a DAG.
@uranusjr Does it make sense to rename this issue to "Rename pre_execute and on_execution_callback to remove execute from the name to avoid misunderstandings"?
I'm not sure whether we already decide to change it this way. Personally, I prefer 2 and 3. 1 might not help much for one who doesn't understand airflow in depth. Providing 2 and 3 will give the user a sense that these things are different
I guess the question is, is a hook that runs every time a worker starts useful for people? We need something for that if we change the semantic of pre_execute (and on_execution_hook).
I guess the question is, is a hook that runs every time a worker starts useful for people? We need something for that if we change the semantic of pre_execute (and on_execution_hook).
For operators works like S3KeySensor, I think it's possible 🤔
Apache Airflow version
2.10.0
If "Other Airflow 2 version" selected, which one?
No response
What happened?
I have a new provider to send jobs to a Ray cluster. One of the operators has pre_execute and post_execute operations. Here is the complete flow of control
When post_execute() starting executing it seems to be running pre_execute() code again. This is a bug. We should only see post_execute code. Logs attached.
post_execute_bug.txt
What you think should happen instead?
Only code relevant to the post_execute method must execute.
How to reproduce
You will need access to a k8 cluster and will need to provide connection details as shown in step 1
Operating System
linux
Versions of Apache Airflow Providers
No response
Deployment
Astronomer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct