Closed kreczko closed 1 month ago
I'll work on this!
First question on behavior. Would we want this to run without direct effect on other running python processes? I.e., would the behavior be appropriate if the `PythonOperator` (e.g. `print(np.r_[0:5])` with provided aliases) ran a `BashOperator` with python as the target (e.g. `python -c "import numpy as np; print(np.r_[0:5])"`)?

Not saying this would be the chosen implementation, since isolation may not require spawning a new python process and doing whatever imports are needed, but I want to specify the behavior. I'm not quite sure here, given the previous state of the docs on this.
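To make the two behaviors being compared concrete, here is a minimal sketch (not the project's implementation; the numpy example from above is replaced by a stdlib stand-in so it runs anywhere): executing a snippet in-process shares the running interpreter's state, while spawning `python -c` in a subprocess gives full isolation.

```python
import subprocess
import sys

# Option 1: in-process execution. exec() runs in this interpreter,
# so it can affect (and be affected by) the caller's state.
namespace = {}
exec("x = list(range(5))", namespace)  # stand-in for the numpy snippet
print(namespace["x"])                  # [0, 1, 2, 3, 4]

# Option 2: subprocess execution, like a BashOperator targeting python.
# A fresh interpreter is spawned; nothing leaks into this process.
result = subprocess.run(
    [sys.executable, "-c", "print(list(range(5)))"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())           # [0, 1, 2, 3, 4]
```

The subprocess route buys isolation at the cost of a new interpreter start-up and of serializing anything that crosses the boundary.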
In both cases, you cannot have isolation easily → best to have the first implementation without isolation.

As you note, `pycall` is not fully documented - it should note that it is a simple wrapper for normal python functions:

- Pass `*args` and `**kwargs` to the function if found / raise an error if not found

> For the PythonOperator we need
- [ ] ability to pass data between main process and the python function without additional de/serialization
- [x] ability to search for existing functions defined in Python, either built-ins or via imports
- [ ] ability to search for custom functions, defined from string values in YAML
Above is my interpretation of those requirements. Is that a good starting place?
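The first two checklist items could be sketched roughly like this (a hypothetical `resolve_callable`/`pycall` pair, not the project's actual API): resolve a name against built-ins or an importable `module.attr` path, forward `*args`/`**kwargs`, and raise if nothing is found.

```python
import builtins
import importlib

def resolve_callable(name):
    """Resolve a name to a Python callable: built-ins first,
    then treat the name as a dotted module.attr path and import it.
    Raises if nothing is found. (Sketch only; names are hypothetical.)
    """
    if hasattr(builtins, name):
        return getattr(builtins, name)
    module_name, _, attr = name.rpartition(".")
    if not module_name:
        raise AttributeError(f"no such function: {name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

def pycall(name, *args, **kwargs):
    """Forward *args and **kwargs to the resolved function."""
    func = resolve_callable(name)
    if not callable(func):
        raise TypeError(f"{name!r} is not callable")
    return func(*args, **kwargs)

print(pycall("len", [1, 2, 3]))       # 3
print(pycall("math.hypot", 3, 4))     # 5.0
```

The third item (functions defined from YAML string values) would plug into the same lookup via a registry populated during parsing.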
> Pass `*args` and `**kwargs` to the function if found / raise error if not found

Should the above-mentioned error raise when the operator is called, or at initialization?
I generally subscribe to failing early. However, the declarative syntax allows users to organize their thoughts with their own structure, so complete parsing (like into a `PythonOperator`) might not be possible for all types. This would affect parsing order so that object initialization can occur after all custom functions are defined.
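The fail-early trade-off can be made concrete with a toy sketch (class and parameter names are hypothetical, not the project's API): an eager lookup rejects unknown names at initialization, which requires all custom functions to exist already; a lazy lookup defers the same error to call time.

```python
class PythonOperator:
    """Toy sketch contrasting fail-at-init with fail-at-call."""

    def __init__(self, func_name, registry, eager=True):
        self.func_name = func_name
        self.registry = registry
        if eager:
            # Fail early: unknown names are rejected here, so every
            # referenced function must already be defined at parse time.
            self._lookup()

    def _lookup(self):
        try:
            return self.registry[self.func_name]
        except KeyError:
            raise NameError(f"unknown function: {self.func_name!r}") from None

    def __call__(self, *args, **kwargs):
        # Lazy path: the same error only surfaces when the operator runs.
        return self._lookup()(*args, **kwargs)

registry = {"double": lambda x: 2 * x}
op = PythonOperator("double", registry)
print(op(21))  # 42

# With eager=False, a bad name is accepted now and fails only on call:
late = PythonOperator("missing", registry, eager=False)
```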
> might not be possible for all types

and that's OK. As we build up use-cases, we can adjust things.
> This would affect parsing order so that object initialization can occur after all custom functions are defined.

You are right, parsing order is something that we will need to tackle. At the moment the YAML is parsed as a whole and processed sequentially - in the future (`needs` keyword) this will not be the case.
For phase 0 I would not dwell too long on this - we will revisit it for phase 2.
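The sequential-processing constraint can be illustrated with a toy model (the entry structure and keys here are invented for illustration, not the real YAML schema): entries of an already-parsed document are handled in order, so a function built from a string value must appear before any operator that references it.

```python
# Toy model of processing an already-parsed YAML document sequentially.
# Keys like "define" and "python_operator" are hypothetical.
parsed = [
    {"define": {"name": "greet", "code": "def greet(who): return f'hi {who}'"}},
    {"python_operator": {"call": "greet", "args": ["world"]}},
]

functions = {}
results = []
for entry in parsed:
    if "define" in entry:
        spec = entry["define"]
        namespace = {}
        exec(spec["code"], namespace)      # build a function from a YAML string
        functions[spec["name"]] = namespace[spec["name"]]
    elif "python_operator" in entry:
        spec = entry["python_operator"]
        if spec["call"] not in functions:  # sequential: must be defined already
            raise NameError(f"{spec['call']} not defined yet")
        results.append(functions[spec["call"]](*spec.get("args", [])))

print(results)  # ['hi world']
```

Swapping the two entries would raise `NameError`, which is exactly the ordering problem a future `needs` keyword could relax.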
Sounds good regarding parsing. I'll just write as if anything processed from the YAML is defined/imported already in the YAML.
Just to make sure I've got this right, can you clarify the "main process" we'll need to share data with? What entry points should I consider for flows?
> Just to make sure I've got this right, can you clarify the "main process" we'll need to share data with? What entry points should I consider for flows?
At this point, very simple: just the parameters and return values. I've over-engineered the previous exploration of this, so I am wary of making it too complicated this time around (i.e. keep the provenance information separate for now).
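Assuming the no-isolation approach discussed earlier, "sharing data with the main process" can be as direct as passing objects in and receiving return values back, with no de/serialization step in between (a sketch, not the project's code):

```python
# With no process isolation, parameters and return values are ordinary
# Python objects; the operator sees the caller's objects directly.
def task(payload):
    payload["n"] += 1          # mutates the caller's dict in place
    return payload

data = {"n": 0}
result = task(data)
assert result is data          # same object, not a serialized copy
print(data["n"])               # 1
```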
This dropped off my radar before I got to it. Is this impeding any of your progress?
Hi @YeungOnion - as you can see by my response latency, there is no dev going on at the moment. The next planned push is in July. That said, the python operator is not blocking any planned development as it will be focused on the workflow language side.
#8 only implements the `BashOperator`, but we will at the very least require a `PythonOperator` to call python functions. It should: