Closed kreczko closed 1 month ago
I'll work on this!
First question on behavior. Would we want this to run without direct effect on other running python processes? I.e., would the behavior be appropriate if the `PythonOperator` (e.g. `print(np.r_[0:5])` with provided aliases) ran a `BashOperator` with python as the target (e.g. `python -c "import numpy as np; print(np.r_[0:5])"`)?

Not saying this would be the chosen implementation, since isolation may not require spawning a new python process and doing whatever imports are needed, but I want to specify the behavior. I'm not quite sure here, given the previous state of the docs on this.
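To make the two behaviors being compared concrete, here is a minimal sketch (not the project's implementation; the numpy example from above is replaced by a stdlib stand-in so it runs anywhere): executing a snippet in-process shares the running interpreter's state, while spawning `python -c` in a subprocess gives full isolation.

```python
import subprocess
import sys

# Option 1: in-process execution. exec() runs in this interpreter,
# so it can affect (and be affected by) the caller's state.
namespace = {}
exec("x = list(range(5))", namespace)  # stand-in for the numpy snippet
print(namespace["x"])                  # [0, 1, 2, 3, 4]

# Option 2: subprocess execution, like a BashOperator targeting python.
# A fresh interpreter is spawned; nothing leaks into this process.
result = subprocess.run(
    [sys.executable, "-c", "print(list(range(5)))"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())           # [0, 1, 2, 3, 4]
```

The subprocess route buys isolation at the cost of a new interpreter start-up and of serializing anything that crosses the boundary.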
In both cases, you cannot have isolation easily → best to have the first implementation without isolation.

As you note, `pycall` is not fully documented - it should note that it is a simple wrapper for normal python functions:

- Pass `*args` and `**kwargs` to the function if found / raise an error if not found

> For the PythonOperator we need
- [ ] ability to pass data between main process and the python function without additional de/serialization
- [x] ability to search for existing functions defined in Python, either built-ins or via imports
- [ ] ability to search for custom functions, defined from string values in YAML
Above is my interpretation of those requirements. Is that a good starting place?
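The first two checklist items could be sketched roughly like this (a hypothetical `resolve_callable`/`pycall` pair, not the project's actual API): resolve a name against built-ins or an importable `module.attr` path, forward `*args`/`**kwargs`, and raise if nothing is found.

```python
import builtins
import importlib

def resolve_callable(name):
    """Resolve a name to a Python callable: built-ins first,
    then treat the name as a dotted module.attr path and import it.
    Raises if nothing is found. (Sketch only; names are hypothetical.)
    """
    if hasattr(builtins, name):
        return getattr(builtins, name)
    module_name, _, attr = name.rpartition(".")
    if not module_name:
        raise AttributeError(f"no such function: {name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

def pycall(name, *args, **kwargs):
    """Forward *args and **kwargs to the resolved function."""
    func = resolve_callable(name)
    if not callable(func):
        raise TypeError(f"{name!r} is not callable")
    return func(*args, **kwargs)

print(pycall("len", [1, 2, 3]))       # 3
print(pycall("math.hypot", 3, 4))     # 5.0
```

The third item (functions defined from YAML string values) would plug into the same lookup via a registry populated during parsing.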
> Pass `*args` and `**kwargs` to the function if found / raise error if not found

Should the above-mentioned error raise when the operator is called, or at initialization?
I generally subscribe to failing early. However, the declarative syntax allows users to organize their thoughts with their own structure, so complete parsing (like into a `PythonOperator`) might not be possible for all types. This would affect parsing order so that object initialization can occur after all custom functions are defined.
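The fail-early trade-off can be made concrete with a toy sketch (class and parameter names are hypothetical, not the project's API): an eager lookup rejects unknown names at initialization, which requires all custom functions to exist already; a lazy lookup defers the same error to call time.

```python
class PythonOperator:
    """Toy sketch contrasting fail-at-init with fail-at-call."""

    def __init__(self, func_name, registry, eager=True):
        self.func_name = func_name
        self.registry = registry
        if eager:
            # Fail early: unknown names are rejected here, so every
            # referenced function must already be defined at parse time.
            self._lookup()

    def _lookup(self):
        try:
            return self.registry[self.func_name]
        except KeyError:
            raise NameError(f"unknown function: {self.func_name!r}") from None

    def __call__(self, *args, **kwargs):
        # Lazy path: the same error only surfaces when the operator runs.
        return self._lookup()(*args, **kwargs)

registry = {"double": lambda x: 2 * x}
op = PythonOperator("double", registry)
print(op(21))  # 42

# With eager=False, a bad name is accepted now and fails only on call:
late = PythonOperator("missing", registry, eager=False)
```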
> might not be possible for all types

and that's OK. As we build up use-cases, we can adjust things.
> This would affect parsing order so that object initialization can occur after all custom functions are defined.

You are right, parsing order is something that we will need to tackle. At the moment the YAML is parsed as a whole and processed sequentially - in the future (`needs` keyword) this will not be the case.
For phase 0 I would not dwell too long on this - we will revisit it for phase 2.
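The sequential-processing constraint can be illustrated with a toy model (the entry structure and keys here are invented for illustration, not the real YAML schema): entries of an already-parsed document are handled in order, so a function built from a string value must appear before any operator that references it.

```python
# Toy model of processing an already-parsed YAML document sequentially.
# Keys like "define" and "python_operator" are hypothetical.
parsed = [
    {"define": {"name": "greet", "code": "def greet(who): return f'hi {who}'"}},
    {"python_operator": {"call": "greet", "args": ["world"]}},
]

functions = {}
results = []
for entry in parsed:
    if "define" in entry:
        spec = entry["define"]
        namespace = {}
        exec(spec["code"], namespace)      # build a function from a YAML string
        functions[spec["name"]] = namespace[spec["name"]]
    elif "python_operator" in entry:
        spec = entry["python_operator"]
        if spec["call"] not in functions:  # sequential: must be defined already
            raise NameError(f"{spec['call']} not defined yet")
        results.append(functions[spec["call"]](*spec.get("args", [])))

print(results)  # ['hi world']
```

Swapping the two entries would raise `NameError`, which is exactly the ordering problem a future `needs` keyword could relax.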
Sounds good regarding parsing. I'll just write as if anything processed from the YAML is defined/imported already in the YAML.
Just to make sure I've got this right, can you clarify the "main process" we'll need to share data with? What entry points should I consider for flows?
> Just to make sure I've got this right, can you clarify the "main process" we'll need to share data with? What entry points should I consider for flows?
At this point, very simple: just the parameters and return values. I've over-engineered the previous exploration of this, so I am wary of making it too complicated this time around (i.e. keep the provenance information separate for now).
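Assuming the no-isolation approach discussed earlier, "sharing data with the main process" can be as direct as passing objects in and receiving return values back, with no de/serialization step in between (a sketch, not the project's code):

```python
# With no process isolation, parameters and return values are ordinary
# Python objects; the operator sees the caller's objects directly.
def task(payload):
    payload["n"] += 1          # mutates the caller's dict in place
    return payload

data = {"n": 0}
result = task(data)
assert result is data          # same object, not a serialized copy
print(data["n"])               # 1
```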
This dropped off my radar before I got to it. Is this impeding any of your progress?
Hi @YeungOnion - as you can see by my response latency, there is no dev going on at the moment. The next planned push is in July. That said, the python operator is not blocking any planned development as it will be focused on the workflow language side.
#8 only implements the `BashOperator`, but we will at the very least require a `PythonOperator` to call python functions. It should: