elyra-ai / elyra

Elyra extends JupyterLab with an AI-centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0

create public interface for user-defined "generic" components #2699

Open thesuperzapper opened 2 years ago

thesuperzapper commented 2 years ago

Generic components are powerful because the same component can run both locally and in a Kubeflow/Airflow pipeline. This makes it easier to develop iteratively by running the pipeline locally (rather than spamming your Kubeflow/Airflow cluster with jobs).

We can provide a public interface for people to define their own "generic" components in addition to the built-in ones we already have (Jupyter Notebook, Python Script, R Script).


Users could implement their "generic" components by implementing a Python class with one method per runtime, like run_on_local(), run_on_kubeflow(), and run_on_airflow().
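As a rough illustration of what such a class contract could look like, here is a minimal sketch of an abstract base class. `ElyraGenericComponent` does not exist in Elyra today. The `run_on()` dispatcher and the `EchoComponent` subclass are hypothetical names added only for this example, and the return types are simplified to plain strings so the sketch runs without Elyra, KFP, or Airflow installed.

```python
from abc import ABC, abstractmethod
from typing import Any


class ElyraGenericComponent(ABC):
    """Hypothetical base class for user-defined generic components."""

    @abstractmethod
    def run_on_local(self) -> Any:
        """Return what the local processor needs to execute this node."""

    @abstractmethod
    def run_on_kubeflow(self) -> Any:
        """Return what the Kubeflow processor needs to execute this node."""

    @abstractmethod
    def run_on_airflow(self) -> Any:
        """Return what the Airflow processor needs to execute this node."""

    def run_on(self, runtime: str) -> Any:
        """Dispatch to the `run_on_<runtime>` method (assumed helper)."""
        method = getattr(self, f"run_on_{runtime}", None)
        if method is None:
            raise NotImplementedError(f"no implementation for runtime {runtime!r}")
        return method()


class EchoComponent(ElyraGenericComponent):
    """Toy component that only reports which runtime it was asked for."""

    def run_on_local(self) -> str:
        return "local: echo"

    def run_on_kubeflow(self) -> str:
        return "kubeflow: echo"

    def run_on_airflow(self) -> str:
        return "airflow: echo"
```

Because the methods are abstract, a subclass that forgets one of the runtimes cannot be instantiated, which surfaces the omission early instead of at pipeline submission time.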

The available user-inputs (for generating the node-properties UI) could be defined by implementing "property" methods on this class.

We can then provide @xxxx decorators for each type of UI input we have ("dropdown", "list", "checkbox", etc). For example, @elyra.dropdown(options=["option_1","option_2"], default="option_1") would display a dropdown and would pass parameters like selected_option to the annotated method.
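To make the decorator idea concrete, here is a minimal pure-Python sketch of how a dropdown decorator could inject `selected_option` into the annotated method. This is not Elyra code: the standalone `dropdown` function stands in for the proposed `@elyra.dropdown`, and the `_ui_values` dictionary, the `ui_property` attribute, and the `Greeter` class are assumptions made up for this example.

```python
import functools
from typing import Callable, Dict, Optional, Sequence


def dropdown(display_name: str, options: Sequence[str], default: str) -> Callable:
    """Hypothetical stand-in for the proposed `@elyra.dropdown` decorator."""
    if default not in options:
        raise ValueError(f"default {default!r} is not one of {list(options)}")

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(self):
            # Use the value chosen in the UI, or fall back to the default.
            values = getattr(self, "_ui_values", {})
            selected = values.get(func.__name__, default)
            if selected not in options:
                raise ValueError(f"{selected!r} is not one of {list(options)}")
            return func(self, selected_option=selected)

        # Metadata a UI generator could read to render the input (assumed shape).
        wrapper.ui_property = {
            "type": "dropdown",
            "display_name": display_name,
            "options": list(options),
            "default": default,
        }
        return wrapper

    return decorator


class Greeter:
    """Toy component holding the values a user picked in the node-properties UI."""

    def __init__(self, ui_values: Optional[Dict[str, str]] = None):
        self._ui_values = ui_values or {}

    @dropdown(display_name="Greeting Text", options=["morning", "night"], default="morning")
    def greeting_text(self, selected_option: str) -> str:
        greetings = {"morning": "Good morning, World!", "night": "Good night, World!"}
        return greetings[selected_option]
```

With this shape, callers invoke `greeting_text()` with no arguments, and the decorator supplies the selection; the UI layer only needs to read `ui_property` from each decorated method.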


Here is a very rough implementation of a generic-component class with one dropdown input called greeting_text that simply runs a print() function:

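import elyra  # assumed to expose the proposed `@elyra.dropdown` decorator
import kfp
from elyra.pipeline.local.processor_local import OperationProcessor
from elyra.pipeline.pipeline import GenericOperation

# NOTE: `ElyraGenericComponent` and `ElyraAirflowOperation` are classes proposed
# in this issue; they do not exist in Elyra yet.
class MyGenericComponent(ElyraGenericComponent):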

    def run_on_local(self) -> OperationProcessor:

        # `CustomOperationProcessor` is a custom subclass of `OperationProcessor`
        class CustomOperationProcessor(OperationProcessor):
            def __init__(self, text_to_print: str):
                self.text_to_print = text_to_print
                super().__init__()
            def process(self, operation: GenericOperation, elyra_run_name: str):
                print(self.text_to_print)

        operation_processor = CustomOperationProcessor(
            text_to_print=self.greeting_text()
        )
        return operation_processor

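    def run_on_kubeflow(self) -> kfp.dsl.ContainerOp:
        # KFP generates the component from the function's source code, so a
        # named function is a safer choice here than a lambda
        def print_text(text_to_print: str):
            print(text_to_print)

        container_op_factory = kfp.components.create_component_from_func(
            func=print_text,
            base_image='python:3.9'
        )
        container_op = container_op_factory(
            text_to_print=self.greeting_text()
        )
        return container_op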

    def run_on_airflow(self) -> ElyraAirflowOperation:
        # `ElyraAirflowOperation` is a class that replaces the current dictionary we use to pass 
        # the list of operations for the "airflow_template.jinja2" template
        elyra_airflow_operation = ElyraAirflowOperation(
            class_name="airflow.operators.python.PythonOperator",
            # `!r` quotes the greeting so the generated lambda is valid Python
            component_params={"python_callable": f"lambda: print({self.greeting_text()!r})"}
        )
        return elyra_airflow_operation

    @elyra.dropdown(display_name="Greeting Text", options=["morning", "night"], default="morning")
    def greeting_text(self, selected_option: str) -> str:
        if selected_option == "morning":
            return "Good morning, World!"
        elif selected_option == "night":
            return "Good night, World!"
        else:
            raise ValueError(f"unexpected option: {selected_option!r}")

thesuperzapper commented 2 years ago

@akchinSTC @ptitzler any thoughts on whether the above proposal is acceptable?

I think this is a very useful feature and will really set Elyra apart as a "generic" abstraction for pipelines.

thesuperzapper commented 2 years ago

@akchinSTC I have added this to the 4.0.0 milestone.

A public interface for "generic components" is a very valuable feature that no other pipeline tool has; adding it would make Elyra a powerful high-level abstraction above Airflow, Kubeflow, and Local-Python.

This is NOT to say that we must use the specific proposal above, just that we should consider how best to achieve user-provided "generic components" for the 4.0.0 release.

lresende commented 2 years ago

What would be a concrete example of a "bring your own generic component" that can't be exposed as either a script or a notebook?

The issue is that runtimes are an extension point, and we have already seen users implement a few runtimes of their own, so a fixed set of "run_on_xxx" methods won't scale well.

lresende commented 2 years ago

Also, generic components will have to reinvent the new KFP APIs, and we might move away from them.