Open-EO / openeo-r-client

R client package for working with openEO backends
https://open-eo.github.io/openeo-r-client
Apache License 2.0
61 stars 17 forks source link

Give users the full process instead of just the graph #54

Closed m-mohr closed 2 years ago

m-mohr commented 3 years ago

In general, but especially also in the QGIS plugin and the Web Editor, openEO processes in JSON need to be fully defined including their potential metadata (see example 1), i.e. it is not enough to just list the graph nodes (see example 2). Unfortunately, the R client and Python client export the graphs without their metadata. This leads reportedly to some confusion when users try to copy/paste the "incomplete" processes into the Web Editor, QGIS Plugin or the Hub for example, which then can't handle them properly. Therefore, I'd really like to see that the R and Python clients print out the full process (see example 1).

Example 1:

{
  "process_graph": {
    "1": {
      "process_id": "add",
      "arguments": {
        "x": 1,
        "y": 2
      },
      "result": true
    }
  }
}

Example 2:

{
  "1": {
    "process_id": "add",
    "arguments": {
      "x": 1,
      "y": 2
    },
    "result": true
  }
}
flahn commented 3 years ago

I think, I had implemented something like that. Just use create_user_process and set submit=FALSE. The resulting object is a ProcessInfo class which has the same representation as an openEO process.

m-mohr commented 3 years ago

Thank you, I'll look into it tomorrow, I just saw again that someone using the R client exported it "incorrectly" again. Wasn't there a graph or toJson function or something like that that may still export the old version? Also, this is nothing I expect you to fix soon (I can't expect anything anyway ;-) ), but is more a reminder for later...

m-mohr commented 3 years ago

Okay, the issue is the following code used in many examples (e.g. GEE + EURAC):

...

final = p$save_result(data = apply_linear_transform,format = formats$output$PNG)
graph = as(final,"Graph")
graph

The graph outputted is just the "graph" (example 2), not a complete process (example 1).

Such behavior should either be changed or not be promoted in examples.

flahn commented 3 years ago

Yes, it is as I imagined, you wan't to have an openEO process (JSON) from this graph. But as the name difference suggests you have to create a process by adding metadata to the graph using create_user_process first. When you set the parameter submit=FALSE you won't do the HTTP call, but an object (a process) is returned instead. This object is list based, but the class will state that it is something process related (don't know the exact class name, right now). If you print this object, it will get a nice visualization like every other openEO process or user-defined process. To get it as JSON you would need to call jsonlite::toJSON and set force=TRUE to treat the object as a simple list which it is. Additional parameter like auto_unbox and pretty might be set as well.

flahn commented 3 years ago

@m-mohr Can you point me to the examples where it is promoted, that you can 1:1 copy the graph into the Webeditor? Then I can change this.

m-mohr commented 3 years ago

I found it here, I'm not sure whether I found them all:

Would it make sense to implement as(..., "Process") instead of as(..., "Graph") or is create_user_process more reasonable?

m-mohr commented 3 years ago

Actually, as(final,"Graph") was used again in the review meeting although it should be discouraged. I'd actually propose to completely remove as(final,"Graph") and replace it with something to just print the full process or let as(final,"Graph") print the full process. Just the process graph without metadata will not be useful in the openEO world and will confuse users. @MilutinMM

MilutinMM commented 3 years ago

I see the point, and as(graph_test1,"Graph") is now removed from the script. Instead, I add the following code:

# create a bach job from the graph:
job_id = create_job(con = eurac, graph = graph_test1, title = "job_final_review_demo", description = "job_final_review_demo") # batch call, works on udf!
# get the JSON text of the process graph:
graph_info = create_user_process( graph_test1,id=job_id$id, submit = FALSE, con = eurac )
graph_json = jsonlite::toJSON(graph_info$process_graph)
print(graph_json)

and this is the result:

{"load_collection_ZHZZM4626F":{"process_id":["load_collection"],"arguments":{"id":["openEO_WUR_Usecase"],"spatial_extent":{"west":[-54.815],"south":[-3.515],"east":[-54.81],"north":[-3.51]},"temporal_extent":[["2017-01-01T00:00:00Z"],["2019-12-29T00:00:00Z"]],"bands":[["VH"]]}},"run_udf_ZIAXO9027Y":{"process_id":["run_udf"],"arguments":{"data":{"from_node":["load_collection_ZHZZM4626F"]},"udf":["# @require x:stars\nlibrary(bfast)\nlibrary(abind)\n# set_fast_options()\n# this works because band is a empty dimension\nx = adrop(x)\n\n# Define the pixel-wise function\nSpatialBFM = function(pixels)\n{\n  lsts = ts(pixels, c(2017, 1), frequency=30.666667)\n  bfastmonitor(lsts, 2019, formula=response~trend)$breakpoint\n}\n# either use adrop() to drop the band dimension... or include here to reduce.\n#StarsResult = st_apply(x, c(\"x\", \"y\", \"band\"), SpatialBFM, PROGRESS=TRUE)\nStarsResult = st_apply(x, c(\"x\", \"y\"), SpatialBFM, PROGRESS=TRUE)\n# deal with NA-a:\nStarsResult[is.na(StarsResult)] = -9999\n#\nStarsResult"],"runtime":["R"],"context":{}}},"save_result_VWHJC2286A":{"process_id":["save_result"],"arguments":{"data":{"from_node":["run_udf_ZIAXO9027Y"]},"format":["GTiff"],"options":{}},"result":[true]}}

then I used additional step to visualize the json in a web editor to get the following:

{
  "load_collection_ZHZZM4626F": {
    "process_id": [
      "load_collection"
    ],
    "arguments": {
      "id": [
        "openEO_WUR_Usecase"
      ],
      "spatial_extent": {
        "west": [
          -54.815
        ],
        "south": [
          -3.515
        ],
        "east": [
          -54.81
        ],
        "north": [
          -3.51
        ]
      },
      "temporal_extent": [
        [
          "2017-01-01T00:00:00Z"
        ],
        [
          "2019-12-29T00:00:00Z"
        ]
      ],
      "bands": [
        [
          "VH"
        ]
      ]
    }
  },
  "run_udf_ZIAXO9027Y": {
    "process_id": [
      "run_udf"
    ],
    "arguments": {
      "data": {
        "from_node": [
          "load_collection_ZHZZM4626F"
        ]
      },
      "udf": [
        "# @require x:stars\nlibrary(bfast)\nlibrary(abind)\n# set_fast_options()\n# this works because band is a empty dimension\nx = adrop(x)\n\n# Define the pixel-wise function\nSpatialBFM = function(pixels)\n{\n  lsts = ts(pixels, c(2017, 1), frequency=30.666667)\n  bfastmonitor(lsts, 2019, formula=response~trend)$breakpoint\n}\n# either use adrop() to drop the band dimension... or include here to reduce.\n#StarsResult = st_apply(x, c(\"x\", \"y\", \"band\"), SpatialBFM, PROGRESS=TRUE)\nStarsResult = st_apply(x, c(\"x\", \"y\"), SpatialBFM, PROGRESS=TRUE)\n# deal with NA-a:\nStarsResult[is.na(StarsResult)] = -9999\n#\nStarsResult"
      ],
      "runtime": [
        "R"
      ],
      "context": {

      }
    }
  },
  "save_result_VWHJC2286A": {
    "process_id": [
      "save_result"
    ],
    "arguments": {
      "data": {
        "from_node": [
          "run_udf_ZIAXO9027Y"
        ]
      },
      "format": [
        "GTiff"
      ],
      "options": {

      }
    },
    "result": [
      true
    ]
  }
}
MilutinMM commented 3 years ago

As my main purpose was to present how a graph could look like, I find print(job_id) even better and with more relevant detail. The question is if the graph in the print(job_id) result below is ready to be copy-pasted to the web editor and QGIS:

Job ID:     949b1db4-5fad-4e8b-8ba1-aecb7b716bd2
Title:      job_final_review_demo
Description:    job_final_review_demo
Status:     created
Created:    2020-12-04T13:27:38.161Z
Updated:    2020-12-04T13:27:38.161Z
Progress:   0
Plan:       free
Costs:      
Budget:     ---
User defined process:
Process:    
Summary:    
Description:    
Returns:    

Stored process graph:
{
  "run_udf_ZIAXO9027Y": {
    "process_id": "run_udf",
    "arguments": {
      "data": {
        "from_node": "load_collection_ZHZZM4626F"
      },
      "udf": "# @require x:stars\nlibrary(bfast)\nlibrary(abind)\n# set_fast_options()\n# this works because band is a empty dimension\nx = adrop(x)\n\n# Define the pixel-wise function\nSpatialBFM = function(pixels)\n{\n  lsts = ts(pixels, c(2017, 1), frequency=30.666667)\n  bfastmonitor(lsts, 2019, formula=response~trend)$breakpoint\n}\n# either use adrop() to drop the band dimension... or include here to reduce.\n#StarsResult = st_apply(x, c(\"x\", \"y\", \"band\"), SpatialBFM, PROGRESS=TRUE)\nStarsResult = st_apply(x, c(\"x\", \"y\"), SpatialBFM, PROGRESS=TRUE)\n# deal with NA-a:\nStarsResult[is.na(StarsResult)] = -9999\n#\nStarsResult",
      "context": {},
      "runtime": "R"
    }
  },
  "save_result_VWHJC2286A": {
    "result": true,
    "process_id": "save_result",
    "arguments": {
      "data": {
        "from_node": "run_udf_ZIAXO9027Y"
      },
      "format": "GTiff",
      "options": {}
    }
  },
  "load_collection_ZHZZM4626F": {
    "process_id": "load_collection",
    "arguments": {
      "temporal_extent": [
        "2017-01-01T00:00:00Z",
        "2019-12-29T00:00:00Z"
      ],
      "spatial_extent": {
        "east": -54.81,
        "south": -3.515,
        "north": -3.51,
        "west": -54.815
      },
      "id": "openEO_WUR_Usecase",
      "bands": [
        "VH"
      ]
    }
  }
} 
m-mohr commented 3 years ago

No, print() is also giving the imcomplete process (can't be visualized in the Web Editor for example), so yet another place that should be fixed.

przell commented 3 years ago

@m-mohr, @flahn: I was just talking to MeterGroup and @tim-salabim. They ran into the same problem. They used as(xxx, "Graph") and tried to visualize in the webeditor.

m-mohr commented 3 years ago

I'm working towards having the process visualizations standalone as a component that it can be integrated like the Collections component for example, but it's not an easy task as it has quite a lot of dependencies so it will take a bit more time...

flahn commented 3 years ago

I'm coming a bit late to the discussion, sorry. "Graph" and "Process" (user defined or predefined) are different concepts. The graph as such is just the part of the process that tells the back-end what it has to do. Process is more, it contains also meta data about the graph like name, description, etc. To add the meta data to the graph you have to create a process, hence create_user_process. Usually that process is not really of interest for the user, because it should go straight to the back-end. But it seems that there is a bigger use case for that and I see what I can do. The best option would be indeed the webeditor components embedded in the RStudio viewer.

przell commented 3 years ago

How is the python client handling this at the moment? @m-mohr @clausmichele

clausmichele commented 3 years ago

I've opened an issue a couple of days ago exactly about this for the python client https://github.com/Open-EO/openeo-python-client/issues/209

m-mohr commented 3 years ago

Both R client and Python client still work a bit like in API v0.4 and it seems both never fully adopted the new process schema internally. In 0.4 there was only process graphs without metadata, but a process graph without any metadata and without the wrapper is pretty much useless in API 1.0, so the clients should adopt that structure.

przell commented 3 years ago

Just an update on this topic to pick up when we continue. On the create_user_process version there are square brackets added where there shouldn't be any.

Process graph definition:

data = p$load_collection(id = collection, 
                         spatial_extent = bbox,
                         temporal_extent = time_range, 
                         bands = bands)
result = p$save_result(data = data, format = "NetCDF")

as() ouptut:

process_graph = as(result, "Graph")
{
  "load_collection_GXORC7003Z": {
    "process_id": "load_collection",
    "arguments": {
      "id": "SENTINEL2_L1C_SENTINELHUB",
      "spatial_extent": {
        "west": 13.0974,
        "east": 13.1003,
        "south": 47.8242,
        "north": 47.8257
      },
      "temporal_extent": [
        "2018-01-01T00:00:00.000Z",
        "2018-12-31T00:00:00.000Z"
      ],
      "bands": [
        "B04"
      ]
    }
  },
  "save_result_SPVBN1400V": {
    "process_id": "save_result",
    "arguments": {
      "data": {
        "from_node": "load_collection_GXORC7003Z"
      },
      "format": "NetCDF",
      "options": {}
    },
    "result": true
  }
} 

create_user_process() output:

graph_info = create_user_process(result,
                                 id = result$id,
                                 submit = FALSE)

print(jsonlite::toJSON(graph_info, pretty = TRUE))
{
  "id": {},
  "process_graph": {
    "load_collection_GXORC7003Z": {
      "process_id": ["load_collection"],
      "arguments": {
        "id": ["SENTINEL2_L1C_SENTINELHUB"],
        "spatial_extent": {
          "west": [13.0974],
          "east": [13.1003],
          "south": [47.8242],
          "north": [47.8257]
        },
        "temporal_extent": [
          ["2018-01-01T00:00:00.000Z"],
          ["2018-12-31T00:00:00.000Z"]
        ],
        "bands": [
          ["B04"]
        ]
      }
    },
    "save_result_SPVBN1400V": {
      "process_id": ["save_result"],
      "arguments": {
        "data": {
          "from_node": ["load_collection_GXORC7003Z"]
        },
        "format": ["NetCDF"],
        "options": {}
      },
      "result": [true]
    }
  },
  "parameters": [],
  "returns": {
    "schema": {
      "type": ["boolean"],
      "subtype": [],
      "pattern": [],
      "parameters": [],
      "items": {
        "type": {}
      },
      "minItems": [],
      "maxItems": []
    }
  }
} 
flahn commented 3 years ago

@przell, you can use the parameter auto_unbox=TRUE to get rid of the brackets, when using toJSON

flahn commented 3 years ago

As for now, I have implemented the as(x,"Process") function to create a simple user defined process representation. The returned object is of type "Process", but the underlying structure of predefined and user defined process are the same. You can print that object now to print the JSON object.

As a deprecation note: now that we can obtained a simple UDP on the client by coercion, we don't necessarily need the parameter submit in create_user_process any longer. It will still work, but I have added a note, when using the function to obtain a Process object.

m-mohr commented 3 years ago

Great! Could we also deprecate (and print a warning?) for as(x,"Graph")?

flahn commented 3 years ago

I advise against it, because i use this internally to coerce into the process graph part of a UDP. I cannot differentiate if the user called the function explicitly or if it was called from an internal function, which would be annoying to read. However, I removed examples in the code documentation which used as(x,"Graph") and pointed out that this might be revoked from the function export of the package (087c9f3).

m-mohr commented 3 years ago

Okay, then a warning is not a good idea, but you could still deprecate it in the docs, right?

flahn commented 3 years ago

Kind of. I haven't found any option to denote deprecation in the documentation. That is, what I did https://github.com/Open-EO/openeo-r-client/blob/087c9f3a907c4d8714ee3594d219e52c72c1f94f/man/as.Graph.Rd#L29

flahn commented 2 years ago

I removed printing the process graph in UDPs and services. Instead I advertise / encourage the user to coerce it into a Process. Printing the process will then of course show a valid process graph. I hope that I have eliminated now all potential misuses. If not feel free to open this issue again or create a new one ;)