elyra-ai / elyra

Elyra extends JupyterLab with an AI-centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0

[RFC] Node schema for pipelines 2.0 #1444

Open bourdakos1 opened 3 years ago

bourdakos1 commented 3 years ago

Our current solution for handling palette and node properties needs to be overhauled in order to support pipeline 2.0 features.

Current solution

Requirements

We might want a way to generate a component from a URL?

Proposal

Instead of a single "palette" and a single "properties" definition, introduce the idea of a "node schema" per node type. The collection of node schemas will be used to generate the "palette" at runtime. A runtime should also be able to provide "properties" that are applied across all nodes or that override specific operation properties (RFC #1520). Initially, properties will only be validated for type, requiredness, regex, min/max, and other JSON Schema constraints. All other property validation should be done at submission time.
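
The edit-time checks listed above (type, requiredness, regex, min/max) could look roughly like the following. This is only a sketch: the `PropertySpec` shape and the `validateProperty` helper are hypothetical, and the field names are borrowed from JSON Schema keywords rather than fixed by this RFC.

```typescript
// Hypothetical property spec. Field names borrow JSON Schema keywords and are
// assumptions, not something this RFC defines.
interface PropertySpec {
  type: "string" | "number" | "boolean";
  required?: boolean;
  pattern?: string; // regex source, applied to strings only
  minimum?: number; // applied to numbers only
  maximum?: number; // applied to numbers only
}

// Returns human-readable problems; an empty array means the value passes.
function validateProperty(
  name: string,
  spec: PropertySpec,
  value: unknown
): string[] {
  // Assumption: an empty string counts as "not provided".
  if (value === undefined || value === null || value === "") {
    return spec.required ? [`${name} is required`] : [];
  }
  if (typeof value !== spec.type) {
    return [`${name} must be of type ${spec.type}`];
  }
  const errors: string[] = [];
  if (typeof value === "string" && spec.pattern !== undefined) {
    if (!new RegExp(spec.pattern).test(value)) {
      errors.push(`${name} does not match ${spec.pattern}`);
    }
  }
  if (typeof value === "number") {
    if (spec.minimum !== undefined && value < spec.minimum) {
      errors.push(`${name} must be >= ${spec.minimum}`);
    }
    if (spec.maximum !== undefined && value > spec.maximum) {
      errors.push(`${name} must be <= ${spec.maximum}`);
    }
  }
  return errors;
}
```

Anything that needs more context than a single value (for example, whether a referenced file exists) would stay out of this function and be checked at submission time.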

A "node schema" should contain all static metadata needed for a node:

A "node schema" should also be able to specify that it is attached to a file:

A "node schema" should have a fixed "label" property so that the user can manually adjust the label. The winning label would follow the logic of: node.properties.label ?? node.properties.filename ?? node.label
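
The label precedence above can be written directly as a small function. The `PipelineNode` interface here is a hypothetical, stripped-down shape that only carries the fields the rule touches.

```typescript
// Minimal node shape for illustration; only the fields used by the label rule.
interface PipelineNode {
  label: string; // default label coming from the node schema
  properties: { label?: string; filename?: string };
}

// node.properties.label ?? node.properties.filename ?? node.label
function resolveLabel(node: PipelineNode): string {
  return node.properties.label ?? node.properties.filename ?? node.label;
}
```

Note that `??` only falls through on `null` or `undefined`, so a label the user has set to an empty string would still win over the filename. Whether that is the desired behavior is not settled here.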

Properties (RFC #1519)

The finalized properties of a node will be generated from:

Examples

Proof of concept for dragging and dropping an arbitrary KFP YAML component onto a pipeline.

Node schema:

{
  op: "execute-kfp-component",
  type: "file",
  extensions: [".yaml", ".yml"],
  icon: "xxx",
  label: "KFP Component",
  description: "A Kubeflow Pipelines YAML component",
  properties: []
}
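
One way a dropped file could be matched to a node schema like the one above is by its extension. The sketch below assumes a hypothetical `findSchemaForFile` helper and case-insensitive matching; neither is specified by this RFC.

```typescript
// Shape mirrors the example node schema above. `extensions` is only
// meaningful when `type` is "file".
interface NodeSchema {
  op: string;
  type?: string;
  extensions?: string[];
  label: string;
}

// Hypothetical helper: return the first file-type schema that accepts the path.
function findSchemaForFile(
  schemas: NodeSchema[],
  path: string
): NodeSchema | undefined {
  const lower = path.toLowerCase();
  for (const schema of schemas) {
    if (schema.type !== "file") continue;
    const extensions = schema.extensions ?? [];
    // Compare the tail of the path with each extension, ignoring case.
    const matches = extensions.some(
      (ext) => lower.slice(-ext.length) === ext.toLowerCase()
    );
    if (matches) return schema;
  }
  return undefined;
}
```

Schemas without `type: "file"` are skipped, so they would never be created by dropping a file.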

KFP component YAML: (I am assuming inputs are what would be turned into properties? Not sure how outputs would be handled.)

name: 'Deploy Model - Watson Machine Learning'
description: |
  Deploy stored model on Watson Machine Learning as a web service.
metadata:
  annotations: {platform: 'IBM Watson Machine Learning'}
inputs:
  - {name: model_uid,        description: 'Required. UID for the stored model on Watson Machine Learning'}
  - {name: model_name,       description: 'Required. Model Name on Watson Machine Learning'}
  - {name: scoring_payload,  description: 'Sample Payload file name in the object storage', default: ''}
  - {name: deployment_name,  description: 'Deployment Name on Watson Machine Learning', default: ''}
outputs:
  - {name: scoring_endpoint, description: 'Link to the deployed model web service'}
  - {name: model_uid,        description: 'UID for the stored model on Watson Machine Learning'}
implementation:
  container:
    image: docker.io/aipipeline/wml-deploy:latest
    command: ['python']
    args: [
      -u, /app/wml-deploy.py,
      --model-uid, {inputValue: model_uid},
      --model-name, {inputValue: model_name},
      --scoring-payload, {inputValue: scoring_payload},
      --deployment-name, {inputValue: deployment_name},
      --output-scoring-endpoint-path, {outputPath: scoring_endpoint},
      --output-model-uid-path, {outputPath: model_uid}
    ]

Finalized properties based on the above:

# injected filename
filename:
  type: string
  description: The path to the file.
  required: true

# injected label
label:
  type: string
  description: The label for the node.

# no static properties

# dynamic properties
model_uid:
  type: string
  description: Required. UID for the stored model on Watson Machine Learning
model_name:
  type: string
  description: Required. Model Name on Watson Machine Learning
scoring_payload:
  type: string
  description: Sample Payload file name in the object storage
  default: ""
deployment_name:
  type: string
  description: Deployment Name on Watson Machine Learning
  default: ""

# no runtime injected properties
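
The listing above could be assembled along these lines. This is a sketch under several assumptions: the component YAML has already been parsed into plain objects, every KFP input becomes a `string` property, and later sources overwrite earlier ones on a name clash. The `finalizeProperties` and `copyInto` helpers and the type names are hypothetical.

```typescript
// Finalized property entry, using the same fields as the listing above.
interface Property {
  type: string;
  description?: string;
  required?: boolean;
  default?: string;
}

// Subset of a KFP component input as it appears in the component YAML.
interface KfpInput {
  name: string;
  description?: string;
  default?: string;
}

type PropertyMap = { [name: string]: Property };

function copyInto(target: PropertyMap, source: PropertyMap): void {
  for (const name of Object.keys(source)) {
    const prop = source[name];
    if (prop !== undefined) target[name] = prop;
  }
}

// Order follows the listing: injected filename, injected label, static,
// dynamic (from KFP inputs), then runtime-injected properties.
function finalizeProperties(
  isFileNode: boolean,
  staticProps: PropertyMap,
  inputs: KfpInput[],
  runtimeProps: PropertyMap
): PropertyMap {
  const result: PropertyMap = {};
  if (isFileNode) {
    result["filename"] = {
      type: "string",
      description: "The path to the file.",
      required: true,
    };
  }
  result["label"] = { type: "string", description: "The label for the node." };
  copyInto(result, staticProps);
  for (const input of inputs) {
    // Assumption: KFP inputs are exposed as string properties.
    const prop: Property = { type: "string" };
    if (input.description !== undefined) prop.description = input.description;
    if (input.default !== undefined) prop.default = input.default;
    result[input.name] = prop;
  }
  copyInto(result, runtimeProps);
  return result;
}
```

Calling `finalizeProperties(true, {}, inputs, {})` with the four inputs of the component above would yield the same six entries as the listing, in the same order.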

akchinSTC commented 3 years ago

Question: Will the schema be used for other types of operations, e.g. fileNode=False, like an ethereal operation where a file might be too heavy (say, a sleep operation with op_type=sleep-node, a weak example) or no-code type operations?

KFP Component yaml: (I am assuming inputs are what would be turned into properties? not sure how outputs would be handled)

Can't the outputs be handled in the same way as the inputs and exposed as dynamic properties? I think these just end up being passed to the arg cmd driver per component.

Being more of a visual person, I crayola'ed this in ppt; hopefully it's accurate. [image]

bourdakos1 commented 3 years ago

Being more of a visual person, I crayola'ed this in ppt; hopefully it's accurate.

Yeah, that looks about right 😊 The only thing is that filename is not part of the node schema. It's more of an "implied" property that only shows up if it is a file node (it will never be explicitly defined anywhere).

Question: Will the schema be used for other types of operations, e.g. fileNode=False, like an ethereal operation where a file might be too heavy (say, a sleep operation with op_type=sleep-node, a weak example) or no-code type operations?

Yes, all operations will be treated like this by default; "file nodes" are a special case. Also, I think I removed fileNode: true and changed it to type: "file" to future-proof against any other node types that might pop up (I'll update it here too).

KFP Component yaml: (I am assuming inputs are what would be turned into properties? not sure how outputs would be handled)

Can't the outputs be handled in the same way as the inputs and exposed as dynamic properties? I think these just end up being passed to the arg cmd driver per component.

Oh okay, makes sense, so outputs are just another set of properties? Do all KFP component YAMLs have a set of inputs and outputs, or is this just what they happened to be named here?