elyra-ai / elyra

Elyra extends JupyterLab with an AI-centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0

set environment variables from upstream outputs #2472

Open thesuperzapper opened 2 years ago

thesuperzapper commented 2 years ago

Background

After PR https://github.com/elyra-ai/elyra/pull/2350, users can consume outputPaths from parent nodes as inputValues, but this feature can't be used by the generic Notebook/Script nodes.

I think the most generic way of passing these outputs to the notebook is using environment variables.

UI Implementation

We would add an option similar to the existing "Please select an output from a parent:" selector, but on the environment variable setter.

[Screenshot: Screen Shot 2022-02-10 at 21 18 16]

[Screenshot: Screen Shot 2022-02-10 at 21 17 16]

Argo Implementation

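We can set the container's env field from the parent's Argo output parameters using {{tasks.TASK_NAME.outputs.parameters.PARAM_NAME}} in a DAG template (or {{steps.STEP_NAME.outputs.parameters.PARAM_NAME}} in a steps template):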

NOTE: this example actually passes through an "input" of the parent node, using the method proposed in https://github.com/elyra-ai/elyra/issues/2471, rather than a real "output".

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: output-parameter-
spec:
  entrypoint: lambda
  templates:
    - name: lambda
      dag:
        tasks:
          - name: parent-node
            template: parent-node

          - name: notebook-node
            template: notebook-node
            dependencies: [parent-node]
            arguments:
              parameters:
                - name: message
                  value: "{{tasks.parent-node.outputs.parameters.message}}"

    - name: parent-node
      inputs:
        parameters:
          - name: message
      container:
        ...
      outputs:
        parameters:
          - name: message
            value: "{{inputs.parameters.message}}"

    - name: notebook-node
      inputs:
        parameters:
          - name: message
      container:
        args: [...]
        command: [...]
        image: "..."
        env:
          - name: MESSAGE_ENV_VAR
            value: "{{inputs.parameters.message}}"
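
To make the wiring above concrete, here is a rough Python sketch of how the three pieces (task arguments, template inputs, container env) could be generated from a mapping of environment variable names to upstream outputs. The helper names `argo_output_ref` and `wire_env_vars`, and the `<task>-<param>` naming scheme for the intermediate input, are hypothetical and not part of Elyra or Argo; only the `{{tasks...}}`, `{{steps...}}` and `{{inputs.parameters...}}` expression formats come from Argo itself.

```python
def argo_output_ref(task_name, param_name, dag=True):
    """Build the Argo expression for an upstream output parameter.

    Argo uses the `tasks.` prefix inside `dag` templates and the
    `steps.` prefix inside `steps` templates.
    """
    scope = "tasks" if dag else "steps"
    return "{{" + f"{scope}.{task_name}.outputs.parameters.{param_name}" + "}}"


def wire_env_vars(bindings, dag=True):
    """Hypothetical helper: map env var names to upstream outputs.

    `bindings` maps an env var name to a (parent_task, output_param) pair.
    Returns plain Python data for the three workflow fragments involved.
    """
    arguments, inputs, env = [], [], []
    for env_name, (task, param) in bindings.items():
        input_name = f"{task}-{param}"  # assumed naming scheme, not an Argo rule
        # Goes under the notebook task's `arguments.parameters`.
        arguments.append(
            {"name": input_name, "value": argo_output_ref(task, param, dag)}
        )
        # Goes under the notebook template's `inputs.parameters`.
        inputs.append({"name": input_name})
        # Goes under the notebook template's `container.env`.
        env.append(
            {"name": env_name, "value": "{{inputs.parameters." + input_name + "}}"}
        )
    return {"arguments": arguments, "inputs": inputs, "env": env}


fragments = wire_env_vars({"MESSAGE_ENV_VAR": ("parent-node", "message")})
print(fragments["arguments"])
print(fragments["env"])
```

Routing the value through an input parameter, as the YAML above does, keeps the notebook template self-contained: the template only refers to its own inputs, and the DAG decides which parent feeds them.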
ptitzler commented 2 years ago

Related: https://github.com/elyra-ai/elyra/issues/1843 (support global environment variables)

thesuperzapper commented 2 years ago

@ptitzler this is similar to https://github.com/elyra-ai/elyra/issues/1843, but I think this proposal is easier and more useful, as it allows actual integration of generic Notebooks/Scripts into pipeline flows.

Right now, Notebooks/Scripts sit on their own: their inputs are effectively hardcoded, so they can't change their behavior based on the outputs of upstream nodes.

thesuperzapper commented 2 years ago

@akchinSTC @ptitzler I really think we should prioritize this feature, because "notebooks" currently can't really be integrated into Kubeflow/Airflow pipelines in Elyra without hardcoding things like file paths.

Environment variables are probably the easiest way to "parameterize" a notebook, as a cell can contain:

import os
my_input = os.environ["MY_INPUT"]

If we let people set environment variables from upstream outputs, they could chain an "s3 download" node into a "train ML" Notebook and parameterize the notebook with an "INPUT_DATA_PATH" environment variable that holds the path of the downloaded S3 data.
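
As a sketch of what the first cell of such a "train ML" notebook might look like: the function name `read_input_path` and the fallback path `data/train.csv` are made up for illustration, and the only thing taken from this proposal is the "INPUT_DATA_PATH" variable name. The optional default is there so the same notebook can still be run locally, outside a pipeline.

```python
import os
from pathlib import Path


def read_input_path(var_name="INPUT_DATA_PATH", default=None):
    """Return the input path supplied by the upstream node.

    Falls back to `default` (if given) so the notebook also runs locally,
    and fails with a clear message when neither is available.
    """
    value = os.environ.get(var_name, default)
    if value is None:
        raise RuntimeError(
            f"{var_name} is not set; expected it to be provided by an upstream node"
        )
    return Path(value)


# Hypothetical local fallback; in a pipeline the env var would take precedence.
input_path = read_input_path(default="data/train.csv")
print(f"Reading training data from {input_path}")
```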

lresende commented 2 years ago

Not sure how relevant this still is, but I had started working on adding some support for parameters at https://github.com/lresende/elyra/commit/2a4b6328159ba17fbf1a16d2cd4c57a17a9b8c17