Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0
48 stars 15 forks source link

Allow `udf` in `run_udf` to be array of strings #374

Open soxofaan opened 2 years ago

soxofaan commented 2 years ago

https://github.com/Open-EO/openeo-processes/blob/d0ce91fcd347360b907ea2d9589d7564a2c1e1e3/run_udf.json#L30-L51

In all practical use cases of run_udf I've seen, the udf code is provided as an inline string (not as URL or file path). Putting the UDF code as a single one-line string with newlines enced as \n is however horrible for version control (when the process graph is stored in a repository, which we often do).

Feature request: allow the udf code also to be an array of strings (one string per line of code), which is a lot friendlier for version control tools. (Note that this is the JSON encoding approach used by jupyter notebooks)

m-mohr commented 2 years ago

Hmm, I'm not sure about this. We also experiment with use cases (in R) where we may need to pass multiple UDF files (i.e. multiple file paths or uris) and then we run into ambiguities as an array of strings could be anything and we don't have a good pattern anymore to distinguish them.

That the file path is rarely used is IMHO not a good argument yet as user workspaces are simply not supported by back-ends yet and as such file-path can't be used although peoply may want to use it.

soxofaan commented 2 years ago

we may need to pass multiple UDF files (i.e. multiple file paths or uris) and then we run into ambiguities as an array of strings could be anything and we don't have a good pattern anymore to distinguish them.

Interesting. But if you want to pass multiple "files", you probably want to encode this as a mapping (filename -> contents) instead of just an array, because these files probably want to refer to each other (e.g. based on a filename).

That the file path is rarely used is IMHO not a good argument yet as user workspaces are simply not supported by back-ends yet and as such file-path can't be used although peoply may want to use it.

The problem with file path or URL is that you add a dependency to an external resource that might change unannounced, disappear, cause permission issues ... . Having your UDF fully embedded within your process graph is a good thing if you want to make it completely self-contained.

m-mohr commented 2 years ago

Interesting. But if you want to pass multiple "files", you probably want to encode this as a mapping (filename -> contents) instead of just an array, because these files probably want to refer to each other (e.g. based on a filename).

For file paths this is not required, you have the filename/path directly available and can load the content from it. For your use case, you'd indeed want to encode it in an object or an array of objects rather than in an array of strings, I assume.

It just gives you a bit more structure for complex UDFs instead of squashing everything into one file (and some languages are relatively strict what they allow in a file, think Java with one class per File or C++ with h and cpp files, if you think about the future).

The problem with file path or URL is that you add a dependency to an external resource that might change unannounced, disappear, cause permission issues ... . Having your UDF fully embedded within your process graph is a good thing if you want to make it completely self-contained.

Agreed for URLs, but I'd sincerly hope that this doesn't occur for my user workspace at the back-end. And I personally, would prefer to not have the UDF available as a file as for me it is easier to just up- and download files. But that's a matter of taste, indeed.

soxofaan commented 2 years ago

I'd like to raise the importance of this feature request for version control friendly inline UDFs (e.g. as list of strings)

I just introduced a bug in an internal project because I could not properly review the diff of the UDF code. Luckily the unit tests caught some side effect in my case, but in general this is pretty bad for the developer experience.