Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

Process specifications incomplete #17

Closed by m-mohr 4 years ago

m-mohr commented 4 years ago

I just tried to use the back-end and noticed that the processes are often missing schemas or the parameter_order, although these are present in the official definitions.

save_result for example looks like this in this back-end:

{
  "description": "Save processed data to storage or export to http.",
  "id": "save_result",
  "name": "save_result",
  "parameters": {
    "data": {
      "description": "The data to save.",
      "required": true,
      "schema": {}
    },
    "format": {
      "description": "The file format to save to. It must be one of the values that the server reports as supported output formats, which usually correspond to the short GDAL/OGR codes. This parameter is case insensitive.",
      "required": true,
      "schema": {}
    },
    "options": {
      "description": "The file format options to be used to create the file(s). Must correspond to the options that the server reports as supported options for the chosen format. The option names and valid values usually correspond to the GDAL/OGR format options.",
      "required": true,
      "schema": {}
    }
  },
  "returns": {
    "description": "Raster Data Cube",
    "schema": {
      "format": "raster-cube",
      "type": "object"
    }
  }
}

The official definition looks like this:

{
  "id": "save_result",
  "summary": "Save processed data to storage",
  "description": "Saves processed data to the local user workspace / data store of the authenticated user. This process aims to be compatible to GDAL/OGR formats and options. STAC-compatible metadata should be stored with the processed data.\n\nCalling this process may be rejected by back-ends in the context of secondary web services.",
  "categories": [
    "cubes",
    "export"
  ],
  "parameter_order": [
    "data",
    "format",
    "options"
  ],
  "parameters": {
    "data": {
      "description": "The data to save.",
      "schema": {
        "anyOf": [
          {
            "type": "object",
            "format": "raster-cube"
          },
          {
            "type": "object",
            "format": "vector-cube"
          }
        ]
      },
      "required": true
    },
    "format": {
      "description": "The file format to save to. It must be one of the values that the server reports as supported output formats, which usually correspond to the short GDAL/OGR codes. This parameter is *case insensitive*.",
      "schema": {
        "type": "string",
        "format": "output-format"
      },
      "required": true
    },
    "options": {
      "description": "The file format options to be used to create the file(s). Must correspond to the options that the server reports as supported options for the chosen `format`. The option names and valid values usually correspond to the GDAL/OGR format options.",
      "schema": {
        "type": "object",
        "format": "output-format-options",
        "default": {}
      }
    }
  },
  "returns": {
    "description": "`false` if saving failed, `true` otherwise.",
    "schema": {
      "type": "boolean"
    }
  },
  "links": [
    {
      "rel": "about",
      "href": "https://www.gdal.org/formats_list.html",
      "title": "GDAL Raster Formats"
    },
    {
      "rel": "about",
      "href": "https://www.gdal.org/ogr_formats.html",
      "title": "OGR Vector Formats"
    }
  ]
}

Why have they been changed? The schema and parameter order are important for clients to automatically generate methods. In the Web Editor, for example, the VITO back-end is much less user-friendly by default: [screenshot]

If you responded with the full process specification, it would look like this by default: [screenshot]
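
To make the point concrete, here is a rough sketch of what a client does with this metadata to build a readable method signature. This is illustrative Python, not the Web Editor's actual code, and union/anyOf schemas are glossed over:

# Sketch: turn a process spec into a human-readable signature.
# parameter_order tells the client how to lay out the arguments;
# the schemas provide type hints and defaults. Without them, all a
# client can render is an unordered bag of untyped parameters.
def build_signature(process_spec: dict) -> str:
    params = process_spec.get("parameters", {})
    order = process_spec.get("parameter_order", sorted(params))
    parts = []
    for name in order:
        param = params.get(name, {})
        schema = param.get("schema", {})
        type_hint = schema.get("type", "any")  # empty schema -> no type info
        if param.get("required"):
            parts.append(f"{name}: {type_hint}")
        else:
            parts.append(f"{name}: {type_hint} = {schema.get('default')!r}")
    return f"{process_spec['id']}({', '.join(parts)})"

With the official save_result definition this yields something like save_result(data: any, format: string, options: object = {}), while the stripped-down version above gives only untyped parameters in an alphabetical fallback order.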

jdries commented 4 years ago

That's not intentional; I actually didn't even know about 'parameter_order'. We'll add the extra metadata.

soxofaan commented 4 years ago

fixed by https://github.com/Open-EO/openeo-python-driver/pull/14

@jdries I don't have push rights to openeo-python-driver yet, so please merge :)

Also, and probably related: I tried to move this ticket (#17) to the openeo-python-driver project, but I couldn't: [screenshot: Screenshot from 2019-08-29 15-38-14]

m-mohr commented 4 years ago

When can we expect these changes to be available at http://openeo.vgt.vito.be/openeo/0.4.0 ? I'm looking into using the VITO back-end for a workshop/hackathon on Friday...

jdries commented 4 years ago

The parameter order is already there, but perhaps the schemas were not part of the fix. @soxofaan Could you have a look to see if it's possible to also add the schema information? (Although this is probably not as easy as the order...)

m-mohr commented 4 years ago

Yes, indeed I can see parameter_order and some schemas. Why don't you just output the original (or modified) JSON files? There are more fields missing that are available in the original definitions, summary for example.

soxofaan commented 4 years ago

sorry, I misread the part about the schemas and only covered the parameter order part.

It will indeed be quite a bit more work to fix the missing schemas.

soxofaan commented 4 years ago

at the moment we don't reuse the JSON files, but a different, hardcoded representation, which is indeed behind the current spec.

It's probably better to think of a way to reuse the "official" JSON spec files.

That being said, if you want results before/by Friday, it's probably more feasible to quick-fix the schemas for a small selection of processes. @m-mohr do you have a list of the processes you want to see fixed, or is it necessary to fix them all?
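
A minimal sketch of what I have in mind for reusing the spec files, assuming a local checkout of the openeo-processes repo (the SPEC_DIR path and function names are hypothetical):

import json
from pathlib import Path

# Hypothetical location of a checkout of Open-EO/openeo-processes,
# where each process lives in its own <process_id>.json file.
SPEC_DIR = Path("openeo-processes")

def load_process_specs(spec_dir: Path = SPEC_DIR) -> dict:
    """Load all official process specs, keyed by process id."""
    specs = {}
    for path in sorted(spec_dir.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            spec = json.load(f)
        specs[spec["id"]] = spec
    return specs

That way /processes could serve the full specs (schemas, parameter_order, summary, ...) without maintaining a hardcoded copy.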

m-mohr commented 4 years ago

I'm trying to run the Phenology/EVI process graph in the Web Editor. You can try it yourself: log in at http://editor.openeo.org and connect to VITO, then paste the EVI process graph into the Process Graph Editor and switch back to the Visual Builder. There it runs the validation and fails.

At the moment the parser complains about the missing parameters for the callbacks in reduce, apply etc. More issues may come up later, as the parser aborts after the first one.

Example (see the parameters fields in the schemas of the reducer parameter):

{
  "id": "reduce",
  "summary": "Reduce dimensions",
  "description": "Applies a reducer to a data cube dimension by collapsing all the input values along the specified dimension into an output value computed by the reducer.\n\nThe reducer must be a callable process (or a set of processes as process graph) that accepts by default array as input. The process can also work on two values by setting the parameter `binary` to `true`. The reducer must compute a single or multiple return values of the same type as the input values were. Multiple values must be wrapped in an array. An example for a process returning a single value is ``median()``. In this case the specified dimension would be removed. If a callback such as ``extrema()`` returns multiple values, a new dimension with the specified name in `target_dimension` is created (see the description of the parameter for more information).\n\nA special case is that the reducer can be set to `null`, which is the default if no reducer is specified. It acts as a no-operation reducer so that the remaining value is treated like a reduction result and the dimension gets dropped. This only works on dimensions with a single dimension value left (e.g. after filtering for a single band), otherwise the process fails with a `TooManyDimensionValues` error.\n\nNominal values can be reduced too, but need to be mapped. For example date strings to numeric timestamps since 1970 etc.",
  "categories": [
    "cubes",
    "reducer"
  ],
  "parameter_order": [
    "data",
    "reducer",
    "dimension",
    "target_dimension",
    "binary"
  ],
  "parameters": {
    "data": {
      "description": "A data cube.",
      "schema": {
        "type": "object",
        "format": "raster-cube"
      },
      "required": true
    },
    "reducer": {
      "description": "A reducer to be applied on the specified dimension (see the process description for more details).",
      "schema": {
        "anyOf": [
          {
            "title": "Unary behaviour",
            "description": "Passes an array to the reducer.",
            "type": "object",
            "format": "callback",
            "parameters": {
              "data": {
                "description": "An array with elements of any data type.",
                "type": "array",
                "items": {
                  "description": "Any data type."
                }
              }
            }
          },
          {
            "title": "Binary behaviour",
            "description": "Passes two values to the reducer.",
            "type": "object",
            "format": "callback",
            "parameters": {
              "x": {
                "description": "The first value. Any data type could be passed."
              },
              "y": {
                "description": "The second value. Any data type could be passed."
              }
            }
          },
          {
            "title": "No operation behaviour",
            "description": "Specifying `null` works only on dimensions with a single dimension value left. In this case the remaining value is treated like a reduction result and the dimension gets dropped.",
            "type": "null"
          }
        ],
        "default": null
      }
    },
    "dimension": {
      "description": "The dimension over which to reduce.\n\n**Remarks:**\n\n* The default dimensions a data cube provides are described in the collection's metadata field `cube:dimensions`.\n* There could be multiple spatial dimensions such as `x`, `y` or `z`.\n* For multi-spectral imagery there is usually a separate dimension of type `bands` for the bands.",
      "schema": {
        "type": "string"
      },
      "required": true
    },
    "target_dimension": {
      "description": "The name of the target dimension. Only required if the reducer returns multiple values, otherwise ignored. By default creates a new dimension with the specified name and the type `other` (see ``add_dimension()``). If a dimension with the specified name exists, the dimension is replaced, but keeps the original type.",
      "schema": {
        "type": [
          "string",
          "null"
        ],
        "default": null
      }
    },
    "binary": {
      "description": "Specifies whether the process should pass two values to the reducer or a list of values (default).\n\nIf the process passes two values, the reducer must be both associative and commutative as the execution may be executed in parallel and therefore the the order of execution is arbitrary.\n\nThis parameter is especially useful for UDFs passed as reducers. Back-ends may still optimize and parallelize processes that work on list of values.\n\nThis parameter can't be used with the reducer set to `null`. If a reducer is specified but only a single value is available, the reducer doesn't get executed.",
      "schema": {
        "type": "boolean",
        "default": false
      }
    }
  },
  "returns": {
    "description": "A data cube with the newly computed values. The number of dimensions is reduced for callbacks returning a single value or doesn't change if the callback returns multiple values. The resolution and cardinality are the same as for the original data cube.",
    "schema": {
      "type": "object",
      "format": "raster-cube"
    }
  },
  "exceptions": {
    "TooManyDimensionValues": {
      "message": "The number of dimension values exceeds one, which requires a reducer."
    }
  },
  "links": [
    {
      "rel": "about",
      "href": "https://en.wikipedia.org/wiki/Reduction_Operator",
      "title": "Background information on reduction operators (binary reducers) by Wikipedia"
    }
  ]
}
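
For what it's worth, a hypothetical check along these lines would flag the affected parameters by comparing a back-end's specs against the official ones (the function and both spec dicts are assumptions, not existing code):

def find_missing_callback_params(backend_specs: dict, official_specs: dict) -> list:
    """Flag parameters that are callbacks in the official spec but whose
    back-end schema lacks the callback 'parameters' the parser needs."""
    problems = []
    for pid, official in official_specs.items():
        backend = backend_specs.get(pid)
        if backend is None:
            continue
        for pname, pspec in official.get("parameters", {}).items():
            schema = pspec.get("schema", {})
            variants = schema.get("anyOf", [schema])
            if not any(v.get("format") == "callback" for v in variants):
                continue  # not a callback parameter, nothing to check
            b_schema = backend.get("parameters", {}).get(pname, {}).get("schema", {})
            b_variants = b_schema.get("anyOf", [b_schema])
            if not any("parameters" in v for v in b_variants):
                problems.append((pid, pname))
    return problems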

soxofaan commented 4 years ago

we decided to bite the bullet and just fix it for all processes by using the spec files from openeo-processes: https://github.com/Open-EO/openeo-python-driver/pull/15

soxofaan commented 4 years ago

which @jdries merged while I was writing this comment.

it still has to be deployed, though.

m-mohr commented 4 years ago

I think that is a good idea.

Edit: Just realized you are not downloading the files from the repo, but referring to a specific commit. That should work. :)
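
Something along these lines, I assume (a sketch only; the paths and helper names are hypothetical, not the actual openeo-python-driver code):

import json
import subprocess
from pathlib import Path

# Sketch of the pinned-commit idea: the openeo-processes spec files are
# vendored (e.g. as a git submodule) at a fixed commit and read from
# disk, rather than fetched from the live repo at request time.
SPECS = Path("specs/openeo-processes")  # hypothetical submodule path

def pinned_commit() -> str:
    """Return the commit the vendored spec checkout is pinned at."""
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=SPECS, text=True
    ).strip()

def load_spec(process_id: str) -> dict:
    return json.loads((SPECS / f"{process_id}.json").read_text(encoding="utf-8"))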

soxofaan commented 4 years ago

just pushed the fix for Open-EO/openeo-python-driver#16