[RFC] Node schema for pipelines 2.0

bourdakos1 commented 3 years ago

Our current solution for handling palette and node properties needs to be overhauled in order to support pipeline 2.0 features.

Current solution

a hard-coded palette.json on the frontend with a fixed list of available nodes and their associated metadata
a hard-coded properties.json on the frontend with a single instance of CommonProperties properties that is assumed to be used for all nodes in the palette
validation of nodes is also hard-coded and doesn't use properties.json so if properties.json changes, the validation needs to be updated as well

Requirements

we should not assume a fixed set of nodes
we should have the ability to lock node types to a specific runtime
each node type should be able to have its own properties definition
runtimes should be able to add properties to nodes
we need to keep the ability of affiliating a filetype with a node type (this enables dragging and dropping files onto the pipeline)
we should support dynamic node labels, where the node type can decide whether to show the file name, a fixed label, or something else as the label
we should support dynamic properties (properties can vary based on the node itself, not just the node type)

we might want a way to generate a component based on a url?

Proposal

Instead of a single "palette" and a single "properties" definition, introduce the idea of a "node schema" per node type. The collection of node schemas will be used to generate a "palette" at runtime. A runtime should also be able to provide "properties" that are applied across all nodes or that override specific operation properties (RFC #1520). Initially, properties will only be validated for type, requiredness, regex, min/max, and other JSON Schema constraints. All other property validation should be done at submission time.
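
To make the "type, requiredness, regex, min/max" checks concrete, here is a minimal sketch of what that first validation pass could look like. The `PropertyDef` shape and the `validateProperty` helper are hypothetical; the field names are loosely borrowed from JSON Schema keywords and are not a confirmed format.

```typescript
// Hypothetical property definition. Field names are assumptions loosely
// modeled on JSON Schema keywords, not a confirmed format.
interface PropertyDef {
  type: "string" | "number" | "boolean";
  required?: boolean;
  pattern?: string; // regex source, applied to string values only
  minimum?: number; // applied to number values only
  maximum?: number;
}

// Returns a list of human-readable problems. An empty list means the value
// passed every check performed here.
function validateProperty(
  name: string,
  def: PropertyDef,
  value: unknown
): string[] {
  // Treat undefined, null, and the empty string as "no value provided".
  if (value === undefined || value === null || value === "") {
    return def.required ? [`${name} is required`] : [];
  }
  if (typeof value !== def.type) {
    return [`${name} must be of type ${def.type}`];
  }
  const errors: string[] = [];
  if (typeof value === "string" && def.pattern !== undefined) {
    if (!new RegExp(def.pattern).test(value)) {
      errors.push(`${name} does not match ${def.pattern}`);
    }
  }
  if (typeof value === "number") {
    if (def.minimum !== undefined && value < def.minimum) {
      errors.push(`${name} must be >= ${def.minimum}`);
    }
    if (def.maximum !== undefined && value > def.maximum) {
      errors.push(`${name} must be <= ${def.maximum}`);
    }
  }
  return errors;
}
```

Anything that needs more context than a single value (for example, whether a referenced file exists) would stay out of this helper and be checked at submission time.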

A "node schema" should contain all static metadata needed for a node:

op (the ID for a node type, e.g. execute-notebook-node / execute-python-node)
icon (svg that will show up on the node)
- we probably need the option to provide light, dark and high contrast versions of the icons
label (the default label, since this can be dynamic)
description (the description that could show up in the palette or when hovering on the node in the pipeline)
properties (RFC #1519, the fixed properties for the node. dynamic and runtime specific properties don't go here)
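
As a rough sketch of how a palette could be generated from a collection of node schemas at runtime: the `NodeSchema` interface below mirrors the fields listed above, while the optional `runtime` field and the `buildPalette` helper are assumptions added only to illustrate the "lock node types to a specific runtime" requirement.

```typescript
// Static metadata for one node type, following the field list above.
interface NodeSchema {
  op: string;
  icon: string;
  label: string;
  description: string;
  properties: unknown[];
  // Hypothetical: when set, the node type is only offered for this runtime.
  runtime?: string;
}

// What the palette needs to render one entry.
interface PaletteEntry {
  op: string;
  icon: string;
  label: string;
  description: string;
}

// Builds the palette from whatever schemas are registered, instead of
// reading a hard-coded palette.json.
function buildPalette(
  schemas: NodeSchema[],
  activeRuntime?: string
): PaletteEntry[] {
  return schemas
    .filter((s) => s.runtime === undefined || s.runtime === activeRuntime)
    .map(({ op, icon, label, description }) => ({
      op,
      icon,
      label,
      description,
    }));
}
```

With this shape, adding a node type means registering one more schema; nothing in the palette code itself needs to change.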

A "node schema" should also be able to specify that it is attached to a file:

type should be file for nodes that should be attached to a file
extensions (example: [".yml", ".yaml"] valid extensions for the node. if more than one node spec requests a given extension we should add a dropdown property to manually choose the node's type or prompt the user to choose onDrop)
language (example: "python" language identifier, not required but useful if available. VS Code has an api to detect the language based on grammar instead of just extension. This will also improve language icon support)
(this will attach a property for filename)
(this will be the only field that can be validated for a file's existence at runtime)
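
The extension matching described above could be sketched roughly as follows. `FileNodeSchema` and `candidatesForFile` are hypothetical names, and matching on the lowercased last extension is an assumption; a real implementation might also consult the language identifier.

```typescript
// Only the fields relevant to file matching; other schema fields omitted.
interface FileNodeSchema {
  op: string;
  type: "file";
  extensions: string[]; // e.g. [".yml", ".yaml"]
}

// Returns every node schema that claims the dropped file's extension.
function candidatesForFile(
  filename: string,
  schemas: FileNodeSchema[]
): FileNodeSchema[] {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) {
    return []; // no extension, so no schema can claim the file
  }
  const ext = filename.slice(dot).toLowerCase();
  return schemas.filter((s) => s.extensions.indexOf(ext) !== -1);
}
```

The caller would then branch on the number of candidates: zero means the file cannot be dropped, one means the node type is unambiguous, and more than one is the case where the user has to choose (via a dropdown property or a prompt on drop).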

A "node schema" should have a fixed "label" property so that the user can manually adjust the label. The winning label would follow the logic of: node.properties.label ?? node.properties.filename ?? node.label
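
A minimal sketch of that label fallback, assuming a node instance carries its schema's default label plus user-editable properties (the `NodeInstance` shape and `resolveLabel` name are made up for illustration):

```typescript
// Hypothetical runtime shape of a node placed on the canvas.
interface NodeInstance {
  label: string; // default label copied from the node schema
  properties: {
    label?: string; // set when the user overrides the label
    filename?: string; // only present for file nodes
  };
}

// User-provided label wins, then the filename, then the schema default.
function resolveLabel(node: NodeInstance): string {
  return node.properties.label ?? node.properties.filename ?? node.label;
}
```

One thing to keep in mind: `??` only falls through on `null` and `undefined`, so if the properties panel stores a cleared label as an empty string, the node would render with an empty label. Normalizing empty strings to `undefined` before saving would avoid that.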

Properties (RFC #1519)

The finalized properties of a node will be generated from:

filename (if type is "file")
label
static properties defined in "node schema"
dynamic node specific properties
runtime injected properties (RFC #1520)
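
The merge order above could be sketched like this. All names here (`PropertyMap`, `PropertySources`, `finalizeProperties`) are hypothetical, and letting later sources override earlier ones on a name clash is an assumption based on the idea that runtimes can override operation properties.

```typescript
interface PropertyDef {
  type: string;
  description?: string;
  required?: boolean;
  default?: unknown;
}

type PropertyMap = { [name: string]: PropertyDef };

interface PropertySources {
  isFileNode: boolean; // true when the node schema has type: "file"
  staticProps: PropertyMap; // from the node schema
  dynamicProps: PropertyMap; // derived from the node itself
  runtimeProps: PropertyMap; // injected by the runtime
}

function finalizeProperties(src: PropertySources): PropertyMap {
  const implied: PropertyMap = {};
  if (src.isFileNode) {
    // filename is never declared in a schema; it is implied by type: "file".
    implied["filename"] = {
      type: "string",
      description: "The path to the file.",
      required: true,
    };
  }
  implied["label"] = {
    type: "string",
    description: "The label for the node.",
  };
  // Assumption: on a name clash, the later source replaces the earlier one.
  return {
    ...implied,
    ...src.staticProps,
    ...src.dynamicProps,
    ...src.runtimeProps,
  };
}
```

Whether a runtime should be allowed to override `filename` or `label` is left open here; if not, those two could be spread last instead of first.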

Examples

POC for being able to drag and drop any arbitrary KFP yaml component onto a pipeline.

Node schema:

{
  op: "execute-kfp-component",
  type: "file",
  extensions: [".yaml", ".yml"],
  icon: "xxx",
  label: "KFP Component",
  description: "A KubeFlow Pipelines YAML component",
  properties: []
}

KFP Component yaml: (I am assuming inputs are what would be turned into properties? not sure how outputs would be handled)

name: 'Deploy Model - Watson Machine Learning'
description: |
  Deploy stored model on Watson Machine Learning as a web service.
metadata:
  annotations: {platform: 'IBM Watson Machine Learning'}
inputs:
  - {name: model_uid,        description: 'Required. UID for the stored model on Watson Machine Learning'}
  - {name: model_name,       description: 'Required. Model Name on Watson Machine Learning'}
  - {name: scoring_payload,  description: 'Sample Payload file name in the object storage', default: ''}
  - {name: deployment_name,  description: 'Deployment Name on Watson Machine Learning', default: ''}
outputs:
  - {name: scoring_endpoint, description: 'Link to the deployed model web service'}
  - {name: model_uid,        description: 'UID for the stored model on Watson Machine Learning'}
implementation:
  container:
    image: docker.io/aipipeline/wml-deploy:latest
    command: ['python']
    args: [
      -u, /app/wml-deploy.py,
      --model-uid, {inputValue: model_uid},
      --model-name, {inputValue: model_name},
      --scoring-payload, {inputValue: scoring_payload},
      --deployment-name, {inputValue: deployment_name},
      --output-scoring-endpoint-path, {outputPath: scoring_endpoint},
      --output-model-uid-path, {outputPath: model_uid}
    ]

finalized properties based on above:

# injected filename
filename:
  type: string
  description: The path to the file.
  required: true

# injected label
label:
  type: string
  description: The label for the node.

# no static properties

# dynamic properties
model_uid:
  type: string
  description: Required. UID for the stored model on Watson Machine Learning
model_name:
  type: string
  description: Required. Model Name on Watson Machine Learning
scoring_payload:
  type: string
  description: Sample Payload file name in the object storage
  default: ""
deployment_name:
  type: string
  description: Deployment Name on Watson Machine Learning
  default: ""

# no runtime injected properties
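
For illustration, the dynamic properties above could be derived from an already-parsed component along these lines. This is only a sketch: `inputsToProperties` is a hypothetical helper, YAML parsing is assumed to happen elsewhere, and every input is treated as a string because that is what the example above does.

```typescript
// Subset of a parsed KFP component, limited to the fields used in the example.
interface KfpInput {
  name: string;
  description?: string;
  default?: string;
}

interface KfpComponent {
  name: string;
  inputs?: KfpInput[];
}

interface PropertyDef {
  type: string;
  description?: string;
  default?: string;
}

// Turns each component input into one dynamic property definition.
function inputsToProperties(
  component: KfpComponent
): { [name: string]: PropertyDef } {
  const props: { [name: string]: PropertyDef } = {};
  for (const input of component.inputs ?? []) {
    // Assumption: all inputs are exposed as strings, as in the example.
    const def: PropertyDef = { type: "string" };
    if (input.description !== undefined) {
      def.description = input.description;
    }
    if (input.default !== undefined) {
      def.default = input.default;
    }
    props[input.name] = def;
  }
  return props;
}
```

Note that nothing here marks `model_uid` or `model_name` as required, even though their descriptions say so; that information only lives in free text in this component.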

akchinSTC commented 3 years ago

Question: Will the schema be used for other types of operations, e.g. fileNode=False, like an ethereal operation where a file might be too heavy, say a sleep operation op_type=sleep-node (weak example), or no-code type operations?

KFP Component yaml: (I am assuming inputs are what would be turned into properties? not sure how outputs would be handled)

The outputs, can't they be handled in the same way as the inputs and exposed as dynamic properties? I think these just end up being passed to the arg cmd driver per component?

Being more of a visual person, I crayola'ed this in ppt; hopefully it's accurate.

bourdakos1 commented 3 years ago

Being more of a visual person, I crayola'ed this in ppt; hopefully it's accurate.

Yea, that looks about right 😊 The only thing is that filename is not part of the Node Schema. It's more of an implied property that only shows up if the node is a file node (it will never be explicitly defined anywhere).

Question: Will the schema be used for other types of operations, e.g. fileNode=False, like an ethereal operation where a file might be too heavy, say a sleep operation op_type=sleep-node (weak example), or no-code type operations?

Yes, all operations will be treated like this by default; "file nodes" are a special case. Also, I think I removed fileNode: true and changed it to type: "file" to future-proof against any other node types that might pop up (I'll update it here too).

KFP Component yaml: (I am assuming inputs are what would be turned into properties? not sure how outputs would be handled)

The outputs, can't they be handled in the same way as the inputs and exposed as dynamic properties? I think these just end up being passed to the arg cmd driver per component?

Oh okay, makes sense, so outputs are just another set of properties? Do all kfp component yamls have a set of inputs and outputs or is this just what they happened to be named here?

elyra-ai / elyra