elyra-ai / elyra

Elyra extends JupyterLab with an AI centric approach.
https://elyra.readthedocs.io/en/stable/
Apache License 2.0

Notebook pipeline specification v3 appears to be incomplete and evolving #665

Closed ptitzler closed 4 years ago

ptitzler commented 4 years ago

Looking at several pipelines that different users created over time and http://api.dataplatform.ibm.com/schemas/common-pipeline/pipeline-flow/pipeline-flow-v3-schema.json it appears that the v3 specification doesn't define all properties yet. I've noticed, for example, that in one v3 pipeline app_data contains a property artifact and in another v3 pipeline a property filename, which seem to carry the same semantic meaning.

As a developer who might want to build additional tooling around these files, I can't really do that unless the specification is complete and semantic versioning is used, so that the content can be interpreted in a predictable manner.

Example snippets are below.

An older pipeline uses the artifact property (see last line):

{
    "doc_type": "pipeline",
    "version": "3.0",
    "json_schema": "http://api.dataplatform.ibm.com/schemas/common-pipeline/pipeline-flow/pipeline-flow-v3-schema.json",
    "id": "337277de-6c7e-4cde-bba8-4a09a2fbdd2d",
    "primary_pipeline": "d80dce55-0aea-4062-9b78-a620ac6f12cc",
    "pipelines": [
        {
            "id": "d80dce55-0aea-4062-9b78-a620ac6f12cc",
            "nodes": [
                {
                    "id": "404fc2a7-8fde-4e45-9652-55bcd230916b",
                    "type": "execution_node",
                    "op": "execute-notebook-node",
                    "app_data": {
                        "artifact": "watson-studio-gallery-dax/watson-studio-gallery-dax-weather-project/notebooks/Part 1 - Data Cleaning.ipynb",

Another "more current" pipeline uses the filename property instead.

{
  "doc_type": "pipeline",
  "version": "3.0",
  "json_schema": "http://api.dataplatform.ibm.com/schemas/common-pipeline/pipeline-flow/pipeline-flow-v3-schema.json",
  "id": "74a91164-bb6b-4bb7-9029-28525127eb0a",
  "primary_pipeline": "06cb6361-181a-436a-a0e0-13d0b6aa8c2c",
  "pipelines": [
    {
      "id": "06cb6361-181a-436a-a0e0-13d0b6aa8c2c",
      "nodes": [
        {
          "id": "0d2e36ca-1bb1-4b68-b975-d5f296c5beb5",
          "type": "execution_node",
          "op": "execute-notebook-node",
          "app_data": {
            "filename": "Part 1 - Data Cleaning.ipynb",

I did notice that app_data_def defines "additionalProperties": true, but that only permits arbitrary properties; it doesn't help resolve the ambiguity. If version 3.0 is still evolving, we might want to consider marking it as "alpha/beta" to warn consumers.
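Until the properties are specified, tooling would have to guess. Here is a minimal sketch of such a workaround, assuming that `artifact` and `filename` are the only two keys in use (based solely on the two snippets above) and that both hold the notebook path. The function names are hypothetical and not part of Elyra:

```python
from typing import Iterator, Optional

# Assumption: these are the only keys observed so far; "filename" is tried
# first because it appears in the more current pipeline.
NOTEBOOK_KEYS = ("filename", "artifact")


def notebook_path(node: dict) -> Optional[str]:
    """Return the notebook path stored in a node's app_data, or None."""
    app_data = node.get("app_data", {})
    for key in NOTEBOOK_KEYS:
        value = app_data.get(key)
        if value:
            return value
    return None


def notebook_paths(pipeline_flow: dict) -> Iterator[str]:
    """Yield the notebook path of every execute-notebook-node in the flow."""
    for pipeline in pipeline_flow.get("pipelines", []):
        for node in pipeline.get("nodes", []):
            if node.get("op") != "execute-notebook-node":
                continue
            path = notebook_path(node)
            if path is not None:
                yield path
```

This kind of key guessing is exactly what a complete, versioned specification would make unnecessary: every newly observed property name would require another entry in `NOTEBOOK_KEYS`.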

If I've missed something or misinterpreted what I've seen please let me know. Thanks!

lresende commented 4 years ago

Everything under app_data is application-specific, and we are wrapping up a PR (#662) that introduces an Elyra pipeline version.

ptitzler commented 4 years ago

Sounds good. Will there be a schema definition document provided as part of the PR or will that be handled elsewhere?

lresende commented 4 years ago

Added enhancement request #956