sinergise-anze opened this issue 4 years ago
@m-mohr @christophfriedrich Any thoughts on this?
@sinergise-anze I've not forgotten your issue(s) and am grateful for having them. After my vacation I have a bit of a backlog of issues to work through. I'll get to these later for sure.
@m-mohr Thank you for your response, I understand completely... Later then. Enjoy your vacation!
Thanks for writing this down! First of all, I agree that it is useful to have some kind of process validation suite. There have already been several attempts at this.
Your solution looks useful to me. I don't know what other back-ends do; maybe they have other solutions we should also investigate.
Some remarks regarding your solution:

- You are using `create_cube`, which seems to allow creating a data cube from an array. The process is not specified yet, the core process is `create_raster_cube`, but doesn't allow to pass anything yet.
- I could see that your approach is leading towards quite a lot of duplicated "code" (JSON), e.g. you always need to create a data cube with `create_cube` etc, which could be bothersome to do for 1000+ tests. We should maybe try to just define various initial data cubes and somehow re-use them?
- What exactly do you need the `save_result` for in your result?
- While defining the processes and examples for unit tests in JSON, I had issues with some things that can't be easily expressed with JSON. For example testing pi, cos, sin etc. as the results have infinite decimals. This is why those processes have a very limited amount of examples and also not very useful examples overall.
So while I think we should work towards a test suite, it doesn't necessarily need to be in JSON and/or in the form of full process graphs. Ideas are very welcome.
I'll dig deeper into the topic and also try to come up with suggestions and maybe a full proposal. Any help is appreciated.
Thanks, inspiring ideas here!
At VITO we already have a bunch of unit/integration tests to check our process graph processing pipelines.
But it would indeed be great if this could be standardized better and reused more.
The idea of an `assert_equals` process is interesting, but as @m-mohr already mentioned, it is not enough: you also need tools to build the expectation you want to compare your result with. For data cubes there are currently not many official options except `load_collection`, which is not ideal for unit test contexts. Something like `create_cube` from your example would be nice, but defining data cubes in JSON doesn't scale that well.
Like @m-mohr, I'm not sure that JSON is the ideal authoring level to define a lot of tests with potentially large data cube arrays. In the VITO tests I mentioned before, we define the tests in Python. Sometimes we use the Python client library to build a process graph, sometimes we construct the JSON structure directly. Sometimes we use hard-coded simple small arrays as input, sometimes we load data straight from GeoTIFFs. But in any case, by working in Python we can more easily share/reuse/parameterize input data and output expectations without too much boilerplate and duplication. A possible middle ground is authoring test cases in Python (for example) and "compiling" these test cases into a JSON representation for distribution. That way you keep things practically maintainable and can still provide the test suite in a JSON-only, language-agnostic way.
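A minimal sketch of that middle ground, assuming a hypothetical `compile_case` helper (none of this is an existing API); each Python-defined case is dumped to a standalone JSON process graph that uses the proposed `assert_equals`:

```python
import json
from pathlib import Path

# Test cases are authored as plain Python data: process to run, arguments, expectation.
CASES = [
    {"process_id": "mean", "arguments": {"data": [1, 2, 3, 4]}, "expected": 2.5},
    {"process_id": "mean", "arguments": {"data": [2, 4]}, "expected": 3},
]


def compile_case(case: dict) -> dict:
    """Wrap one case into a minimal openEO process graph ending in an assertion."""
    return {
        "process_graph": {
            "run": {"process_id": case["process_id"], "arguments": case["arguments"]},
            "check": {
                "process_id": "assert_equals",  # proposed process, not standardized yet
                "arguments": {"a": {"from_node": "run"}, "b": case["expected"]},
                "result": True,
            },
        }
    }


out_dir = Path("compiled_tests")
out_dir.mkdir(exist_ok=True)
for i, case in enumerate(CASES):
    (out_dir / f"case_{i:03d}.json").write_text(json.dumps(compile_case(case), indent=2))
```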
This is really an interesting discussion and I would also be in favor of a shared solution for this.
At EODC we recently started to write tests for the processes we support, but we do it one level "deeper", in the package where we map an openEO PG to the syntax needed internally. There, we actually use a test file as input and manually define a minimal PG including the process we want to test, so it is not a very general solution. Also, most of our processes are imported from https://github.com/Open-EO/openeo-processes-python/ which has its own tests. However, testing each process at the "top" level by submitting a minimal PG would be the best approach, since it also tests the conversion from openEO syntax to the internal syntax of the back-end.
I think the main issue for a suite of tests (a list of PGs) that every backend can use to test processes is to have a common input, since it's hard to ensure every backend has the exact same test collection. I like @sinergise-anze's approach to have a `create_cube` (can we somewhat enforce that this process is used only for tests?) and then `assert_equals` to check the output.
Anyone could then compile a JSON PG with their client of choice (or manually) and submit a PR to a tests repository.
Good thoughts!
@m-mohr:
> - You are using `create_cube`, which seems to allow creating a data cube from an array. The process is not specified yet, the core process is `create_raster_cube`, but doesn't allow to pass anything yet.
Yes - we used `create_cube` because we needed a process which would create a cube with values, dimensions and labels easily. We could probably use `create_raster_cube`, but then it would be more work to assign values (probably with a combination of `add_dimension`, `apply`, ...?).
> - I could see that your approach is leading towards quite a lot of duplicated "code" (JSON), e.g. you always need to create a data cube with `create_cube` etc, which could be bothersome to do for 1000+ tests. We should maybe try to just define various initial data cubes and somehow re-use them?
Maybe also, yes - though for small cubes (such as are needed for "unit tests" of each of the processes) this is not such a problem.
> - What exactly do you need the `save_result` for in your result?
We don't actually need it per se, it is just a mandatory part of any process graph if we want to execute it through the normal jobs pipeline. This could probably be changed, we just didn't find it important. We just ignore it.
> - While defining the processes and examples for unit tests in JSON, I had issues with some things that can't be easily expressed with JSON. For example testing pi, cos, sin etc. as the results have infinite decimals. This is why those processes have a very limited amount of examples and also not very useful examples overall.
Maybe `assert_equals` could take a parameter which would define "closeness"? We are already using `xarray.testing.assert_allclose()` instead of `xarray.testing.assert_equal()` in our implementation.
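For reference, a minimal sketch of that tolerance-based comparison with xarray; the tolerances are illustrative and would correspond to a "closeness"/delta parameter on `assert_equals`:

```python
import xarray as xr
from xarray.testing import assert_allclose

actual = xr.DataArray([[0.1 + 0.2, 1.0]], dims=("y", "x"))
expected = xr.DataArray([[0.3, 1.0]], dims=("y", "x"))

# Strict equality would fail on floating point noise:
# xarray.testing.assert_equal(actual, expected)  # raises AssertionError

# A tolerance-based check passes; rtol/atol play the role of the "closeness" parameter.
assert_allclose(actual, expected, rtol=1e-9, atol=1e-12)
```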
@soxofaan: I like the idea of generating the JSON from code! I guess this is orthogonal to this issue though? If JSON can be used to describe tests, it can be generated either manually or from the code.
I agree that the tests can become overwhelming if the cubes are not small, but fortunately the cubes needed for per-process unit tests can stay small.
@lforesta: Exactly! It would make correct interpretation of process definitions much easier for the developers implementing the backends. Once the process is defined, we could start by taking existing unit test suites (VITO's, ours, ...) and translating them to a common format.
@sinergise-anze
> I guess this is orthogonal to this issue though? If JSON can be used to describe tests, it can be generated either manually or from the code.
Conceptually it is orthogonal indeed, but I think it is best to keep things together practically (e.g. in the same repository). Generating JSON from code will be more maintainable in the long term than "manually" created JSON files. Also note that it is trivial to put "manual" JSON in Python code, so it would be feasible to require that all cases be defined in code.
Test parameterization in code has proven to be very useful for us. For example: say you want to test `aggregate_spatial` with different kinds of geometries (Polygon, MultiPolygon, Points, LineStrings, ...) and with different reducers (mean, median, max, ...). In Python+pytest we can do that with a single parameterized test function that covers all combinations of the matrix; see the sketch below. With "manual" JSON you would have to maintain tens of almost identical files.
FYI: I'm using Python and pytest as an example. I'm not implying that Python+pytest should be used for the final solution (although it would be my preference).
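A sketch of that kind of combination matrix with pytest; the process graph below is deliberately stripped down (only the node under test, fed by the proposed `create_cube` helper) and only checks its own structure, since how the graph gets executed against a back-end is left open here:

```python
import pytest

GEOMETRIES = {
    "polygon": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
    "point": {"type": "Point", "coordinates": [0.5, 0.5]},
}
REDUCERS = ["mean", "median", "max"]


@pytest.mark.parametrize("geometry", list(GEOMETRIES.values()), ids=list(GEOMETRIES.keys()))
@pytest.mark.parametrize("reducer", REDUCERS)
def test_aggregate_spatial(geometry, reducer):
    graph = {
        # "create_cube" is the proposed synthetic-data process, not a standardized one
        "load": {"process_id": "create_cube", "arguments": {"data": [[1, 2], [3, 4]]}},
        "agg": {
            "process_id": "aggregate_spatial",
            "arguments": {
                "data": {"from_node": "load"},
                "geometries": geometry,
                "reducer": {
                    "process_graph": {
                        "r": {
                            "process_id": reducer,
                            "arguments": {"data": {"from_parameter": "data"}},
                            "result": True,
                        }
                    }
                },
            },
            "result": True,
        },
    }
    # A real test would submit this graph to a back-end and compare the response;
    # here we only check that each combination produces a well-formed graph.
    assert graph["agg"]["arguments"]["reducer"]["process_graph"]["r"]["process_id"] == reducer
```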
So I guess we have different approaches we can discuss. One of them is a `POST /test` endpoint where you can run processes on small chunks of data. That could even be useful for users to check that processes do what they want without specifying a full workflow, so that's an improvement for debugging, I guess. For me, 2B sounds most interesting and thus I'm elaborating on it a bit more.
`POST /test` would work in a way that you can specify a full data cube as input data (including dimensions, all dimension properties such as dimension labels and reference system, and of course values). We could send something similar to `cube:dimensions` from data discovery plus the actual data as a multi-dimensional array. The process graph would somehow access the data directly (`{"from_parameter": ...}`?) and the result node would be serialized automatically, so no `load_collection` or `save_result` is required. That would make the tests a bit less verbose. Please keep in mind that not all processes actually work on data cubes, so `cube:dimensions` is likely optional. For processes such as `sum` you only need a simple array, for example. For processes such as `gt` you may not even need an array, but only scalars.

Opinions?
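To make that a bit more concrete, here is a sketch (as a Python dict) of what such a request body could look like; all field names besides the process graph itself are illustrative, since nothing like this is specified yet:

```python
# Purely illustrative body for a hypothetical POST /test request: the input data is
# passed directly and reaches the process graph via from_parameter, so no
# load_collection or save_result is needed.
test_request = {
    # For a process like "sum" a simple array is enough; a data cube input could
    # additionally carry "cube:dimensions"-style metadata describing its dimensions.
    "data": [1, 2, 3, 4],
    "process": {
        "process_graph": {
            "sum": {
                "process_id": "sum",
                "arguments": {"data": {"from_parameter": "data"}},
                "result": True,  # the result node would be serialized automatically
            }
        }
    },
}
```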
@m-mohr Very good ideas!
I think it is becoming clear that there would need to be a way to declare any value that can be an input or output of any process (similar to our `create_cube`, but more generic - it should cover all possible data types and all possible values).

The issue with option 2 is that both server and client need to be able to parse serialized data, and the backend also needs to be able to output in this format, which means more implementation effort on both ends. Performing the comparison on the backend, on the other hand, is usually trivial, as any datatype that is used internally (e.g. `xarray.DataArray` in our case) of course comes with comparison methods.

But I very much like the dedicated endpoint idea, and especially your last point - it means that `save_result` hacks can be avoided. Also, I didn't mean to suggest that all pytest / jest / ... comparison methods should be added; I believe a single one (`assert_equal`) should be enough for this use case.
Maybe a combination of both ideas would work? A `POST /test` endpoint which returns `200` or `4xx` (indicating input that is not valid according to this backend implementation).

Note that these are not strong opinions, more preferences. Any solution is OK; the main end goal (for us) is clarifying the process definitions beyond just documentation.
@sinergise-anze That could also work if we can work around the floating point number issues. Comparing on the client side was mainly chosen so that we have a bit broader access to comparators, which include things like comparing floating point numbers based on an allowed difference specified by a delta (like the delta in the eq/neq openEO processes). If we can mitigate that somehow (e.g. by specifying a required precision for each result), server-side is also an option. What I also liked about client-side is that users could use it to actually debug a bit better. If they see the outcome of some operation they want to "test", it's often easier to understand than just getting an error or success. Also, it makes the tests more transparent if anybody can try them.
My biggest question yet is the file format for transmitting input and output. I just found that there's JSON5 at https://json5.org , but it seems that it has only a stable JS library so far and a very early Python library at https://pypi.org/project/json5/ . Other languages might be stuck. XML could be an option due to its widespread support. Then there are also things like YAML (expressing multi-dimensional arrays feels quite error-prone), Protobuf (not sure yet) and MessagePack (not so friendly for humans). I'm very happy if somebody has a good idea here or has good arguments for a format...
Why is it necessary to have a separate `/test` endpoint? If I understand correctly, `/test` would be like `/result` but with some additional features (specifying input arrays, no need for explicit `save_result`, ...). Is there anything of `/result` that `/test` would not support? If not: why not do everything with `/result`?
> What I also liked about client-side is that users could use it to actually debug a bit better.

Indeed, not to be underestimated.
> My biggest question yet is the file format for transmitting input and output.
I think we should at least support standard JSON for simple test cases (where there is no need to worry about decimals, null-vs-nan, ...) because it's easy and familiar to write. In that sense JSON5 looks interesting because it's a superset of standard JSON. But it's indeed not ideal that it's not widely supported yet.
I had a quick scan through https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats for some more inspiration, but could not find something that stands out as the ideal solution.
@soxofaan The reasons for the separate endpoint are the following:

- It avoids adding new processes and file formats for testing purposes only. Mixing processes and file formats for testing only with the "normal" stuff may confuse users.
- I'd think testing should not be billed, but that is the default for `/results`.
- I think you can rather likely restrict resources for an endpoint.
- It just separates things a bit more nicely.
> I think we should at least support standard JSON for simple test cases (where there is no need to worry about decimals, null-vs-nan, ...) because it's easy and familiar to write. In that sense JSON5 looks interesting because it's a superset of standard JSON. But it's indeed not ideal that it's not widely supported yet.
Unfortunately, you relatively easily run into use cases where JSON is not enough. ~~I guess with JSON5 (JS & Python) we would cover all back-ends except the WCPS part of EURAC, right?~~ @aljacob -> outdated, see ION below
> I had a quick scan through https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats for some more inspiration, but could not find something that stands out as the ideal solution.
Great resource, thanks! What about ION? http://amzn.github.io/ion-docs/ It looks very similar to JSON5, but has broader support across languages! It has its own media type (application/ion+json), so we could accept both JSON and ION through content negotiation.
> - It avoids adding new processes and file formats for testing purposes only. Mixing processes and file formats for testing only with the "normal" stuff may confuse users.
I'm not sure what you mean here. Processes and formats are defined orthogonally to endpoints, right? Moreover, why would you want to shield users from testing? Isn't it added value for normal users to have tools that help with verifying reproducibility?
> - I'd think testing should not be billed, but that is the default for `/results`.

Shouldn't billing be orthogonal to endpoints too? You don't want a backdoor to run real queries for free on `/test` :smile:
> - I think you can rather likely restrict resources for an endpoint.

That's true, it's easier to bootstrap resource usage estimations based on the endpoint. But estimating it from the input data source shouldn't be a lot harder (is it a real `load_collection` call or a small `create_cube` input array?).
> - It just separates things a bit more nicely.
I'm not sure that it should be separated that strongly (also see my comment on 1). Also: you can also separate processes through namespaces.
> What about ION?
I somehow missed that one, never heard of it. Looks interesting indeed.
In a way `/result` is also meant for testing, because no one will / should process large datasets with sync processing. We would need to add extra `Nullable` fields in `POST /result`. But I don't dislike the idea of a separate endpoint, mostly because it has a very different type of input (passing data directly).
In any case, I very much like the idea of having this functionality; this would be great for users too.
@m-mohr @soxofaan
Giving it some more thought, I have changed my mind a bit - I am more and more convinced that the original suggestion (with some improvements) is better because of its simplicity. I believe it is a) easier to implement on the backend side, b) more versatile, c) easier to use and d) results in more understandable tests. I will try to write more about why I think this is so, but the TL;DR version is: if I want to test my process graph, I can simply add a few processes and copy it into the Editor. It doesn't get any simpler than this... :)
a) easier to implement on backend side (and no client side at all)
The suggested solution means creating two relatively simple processes (one to create synthetic data, one to compare data for equality). This is all that is needed.

The first process somewhat matches the parsing of input data (ION?), but I would argue that a JSON solution would be easier to use. Instead of `create_cube`, a process which would create any data type could be introduced (`create_data`?). This is an implementation detail, so I didn't include it here.
The main simplification comes from the fact that the backend doesn't need to output serialized data or implement another endpoint. And no client side is needed at all.
I would also argue that from the assert family, only `assert_equal` is needed (with an optional delta parameter). If users need more advanced capabilities, they can download the TIFFs and inspect the result with jest / pytest / ... - but for the purpose of a process validation suite, only equality comparison is needed.
If parametrized tests are needed, it should be trivial to change them so that they execute process graphs against a backend.
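To illustrate, a rough sketch (written as a Python dict) of what such a test graph could look like; note that `create_data`, `assert_equal` and its `delta` parameter are proposals in this thread, not standardized processes:

```python
# Hypothetical test graph: feed a synthetic array into "mean" and compare the result
# against a hand-computed expectation, allowing a small numerical tolerance via "delta".
test_graph = {
    "process_graph": {
        "data": {
            "process_id": "create_data",   # proposed process to create any value/datatype
            "arguments": {"data": [1.0, 2.0, 4.0]},
        },
        "mean": {
            "process_id": "mean",
            "arguments": {"data": {"from_node": "data"}},
        },
        "check": {
            "process_id": "assert_equal",  # proposed process: raises an error on mismatch
            "arguments": {
                "a": {"from_node": "mean"},
                "b": 2.3333333,
                "delta": 1e-6,
            },
            "result": True,
        },
    }
}
```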
b) more versatile
The beauty of using process graphs is that if a user has some problem with a process graph (e.g. they are getting unexpected results), they can substitute `load_collection` with `create_data` and create an integration test out of it, making sure that their process graph (and backend implementation) is correct.
Bonus: such examples can then be sent to backends as complete bug reports.
c) easier to use
The whole process can be done within the openEO Web Editor (or using curl); no special tools need to be installed.
Of course, special debugging tools can be used on the client side if so inclined; they are just not mandatory. Users can still download the TIFFs and simply inspect the results (even partial ones) for equality.
d) results in more understandable tests
Not everyone is familiar with Python / ION / any other chosen technology. Using the existing framework would mean that anyone who can read process graphs can also understand the tests.
In other words, I would suggest going with the original proposal: a process to create synthetic data plus `assert_equal`, and no new endpoint.
EDIT: reading this again - I hope the tone is right. I don't want to impose on the vision, as I was not involved in this project as much as any of you. So please take this as the constructive suggestion that it is. :)
@soxofaan
> I'm not sure what you mean here. Processes and formats are defined orthogonally to endpoints, right? Moreover, why would you want to shield users from testing? Isn't it added value for normal users to have tools that help with verifying reproducibility?
I don't want to shield users from testing, I actually want the opposite. I meant that having, for example, a file format "JSON" or a process like `create_cube` that is meant for testing only may confuse users, as they may think it's something they can use in "normal" process chains. I'm not sure whether that "fear" is actually reasonable or whether it's not a problem at all.
> Shouldn't billing be orthogonal to endpoints too? You don't want a backdoor to run real queries for free on `/test` :smile:
Of course not, therefore it would be really limited to this use case. I must admit I wouldn't really want to pay for requests that are so small and just for debugging. It may need a separate billing plan for testing, which could be counter-intuitive.
> I'm not sure that it should be separated that strongly (also see my comment on 1). Also: you can also separate processes through namespaces.
Yes, in the PG, but not yet in process discovery. So that doesn't help a lot at the moment.
@lforesta
> In a way `/result` is also meant for testing, because no one will / should process large datasets with sync processing.
Yes, but there's still a difference between
> We would need to add extra `Nullable` fields in `POST /result`.
I don't understand that. What fields are you thinking about?
I'm splitting my answer in two parts; I will answer @sinergise-anze in the second post.
@sinergise-anze
When thinking about a process validation suite, I'd expect that I can test any process independently, so for example simply `min()` or `is_nan()`, which have no relation at all to data cubes. I don't want to wrap these processes into cube processes like `reduce_dimension` to test them. I hope we have the same understanding here. Some unordered remarks in response to your post:
In addition to `create_cube`, you'd also need a function to create normal arrays as proposed in #202 (`array_create`). Otherwise you can only test via data cubes, but that is bothersome. Maybe you simply want to test the `mean` function. Also, you need a function to create labeled arrays.
Regarding a JSON-only solution: How would you test the `is_nan` or `is_infinite` process, for example, if you can't pass +/-inf and nan through JSON? Of course you can check `is_nan(nan())`, but that is not what I would expect from a full test suite. There I'd also expect the "native" value to be checked. While it sounds like we don't need to use ION (for example) in the `/test` endpoint in your proposal, the problem still exists when passing JSON in `/result` and thus you'd also need to accept ION (or whatever else) in `/result`.
Generating a GeoTiff for tests to check data on the client side sounds bothersome to me. At least I have no tooling except QGIS directly at hand to check the content of the file. As a user, I think I would prefer to just get a "human-readable" representation of the result to see what is going on at the back-end.
How would a back-end handle the case where my test does not contain a `save_result` and `load_collection`? At the moment it seems no back-end is prepared for this, so is it really so much easier to implement compared to a new endpoint? I'm not saying one or the other is better, but I think in general it could be some work for others. If we think about a solution now, I'd not like to require workarounds like adding unnecessary processes such as `save_result` to the tests. Also, I can't pass the result of a non-cube operation to `save_result`, so the GeoTiff solution fails at the moment for non-cube operations.
Regarding parametrized tests I don't understand the solution yet. Is it using process parameters?
It is indeed correct that client support is currently better for your approach but...
> The whole process can be done within the openEO Web Editor (or using curl); no special tools need to be installed.
... I'd very likely integrate any new solution in the Web Editor, too. ;-)
> Bonus: such examples can then be sent to backends as complete bug reports.
Isn't that possible with all solutions discussed so far?
> Not everyone is familiar with Python / ION / any other chosen technology. Using the existing framework would mean that anyone who can read process graphs can also understand the tests.
We are not bound to Python with a new endpoint; it can be any language or the Web Editor. (Of course our core set of tests would likely be in a single language, but I'm pretty sure the tests in your proposal would also be generated from a language of choice, right?) Additionally, ION (and other implementation details) would/should likely be hidden anyway. Actually, reading process graphs is nothing we'd expect users to do. We actually try to hide them. We just expect them to use any of our tools (libraries, Web Editor, ...).
> EDIT: reading this again - I hope the tone is right. [...] So please take this as the constructive suggestion that it is. :)
For me the tone is right :), I hope mine is fine as well. It's all meant as a constructive discussion.
> How would a back-end handle the case where my test does not contain a `save_result` and `load_collection`? At the moment it seems no back-end is prepared for this, so is it really so much easier to implement compared to a new endpoint?
FYI: the VITO backend already supports process graphs without `load_collection` or `save_result`.
For example: add 2 numbers (calculator as a service FTW :smile: )
```
curl -X POST -H "Authorization: Bearer basic//..." -H "Content-Type: application/json" \
  -d '{"process": {"process_graph": {
        "add": {"process_id": "add", "arguments": {"x": 3, "y": 5}, "result": true}
      }}}' \
  https://openeo.vito.be/openeo/1.0/result
8
```
An example using `array_element` and an input array through the `constant` process:
```
curl -X POST -H "Authorization: Bearer basic//..." -H "Content-Type: application/json" \
  -d '{"process": {"process_graph": {
        "c": {"process_id": "constant", "arguments": {"x": [2, 3, 5, 8]}},
        "ae": {"process_id": "array_element", "arguments": {"data": {"from_node": "c"}, "index": 2}, "result": true}
      }}}' \
  https://openeo.vito.be/openeo/1.0/result
5
```
(not clear from these examples, but result is returned as JSON)
@m-mohr Thank you for your in-depth response! The idea I had was to use a `create_data` process to create any variable of any possible type... However, when writing my response, I figured out that I was not doing a good job of explaining what I had in mind. To fix this, I have created a small repository as a proof of concept: https://github.com/sinergise-anze/openeo-process-validation-suite-poc . For two chosen processes I have created some of the unit tests. I hope the tests are easy to understand - if they are not, this approach fails anyway. :)
While writing the tests, I noticed that my idea was actually not good enough. Having a process output the data is cumbersome and it is near impossible to use for e.g. `band` variables within the definition of another variable (like `raster-datacube`). This is the reason that in this PoC I took a different approach - all variables that do not have native support in JSON are defined using an "object notation" (similar to `{ "from_node": ... }`, `{ "from_parameter": ... }`, ...). Infinity and similar values can also be defined.
An additional added value is that the tests are now even shorter (and thus easier to read) than they were before. Also, this means that data can be fed directly into any process, regardless of the datatype, so this is quite a powerful addition. But the downside is that another "object notation" needs to be defined. And of course, defining all possible datatypes is a whole mini project by itself...
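Purely as an illustration of the idea (the actual notation used in the PoC repository may well differ), such "object notation" values could look something like this:

```python
# Illustrative only: hypothetical object notation for values that JSON cannot
# express natively; the exact keys and type names are not taken from the PoC.
positive_infinity = {"type": "number", "value": "+inf"}
not_a_number = {"type": "number", "value": "nan"}
band_b04 = {"type": "band", "name": "B04"}
```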
Note that I understand this is just one of possible solutions and that it is a trade-off.
Thank you, @sinergise-anze. That is quite a bit to look through, I'll get back to it later, maybe even with an alternative solution based on your PoC. I would like to avoid inventing a new "language" and defining too much ourselves. That sounds like a lot of implementation effort. Either we should aim for a simple extension of the process graphs or for something that can be easily generated with existing tools. Your "type" objects look similar to something I've seen when looking for a file format (and found ION). I need to find out what it was, maybe we can just adopt that. Also, I think we should separate data "creation" from the process so that we can re-use the data and don't need to put the whole data creation process into each of the files. We could pass them differently via some kind of references (e.g. `from_parameter` references or file paths).
> the VITO backend already supports process graphs without `load_collection` or `save_result`.
@soxofaan Great! I'm sure I heard others can't handle that yet (e.g. GEE and at least one more).
> FYI: the VITO backend already supports process graphs without `load_collection` or `save_result`.
I think the issue is that we never specified what the outcome of a sync call without `save_result` is. So you are using JSON, which makes sense, but that is likely not implemented broadly yet. Also, what would the format of the JSON be like? For non-datacubes it seems pretty straightforward, though.
We created Pydantic classes from the openEO API JSON process graph specification and tested them against the standard processes (from https://raw.githubusercontent.com/Open-EO/openeo-processes/1.x.0/).
We found that `array_apply` and `array_filter` use `"schema": [{"type": "number"}, ..., {"type": "null"}]` instead of `"schema": {"type": ["number", "string", "null"]}`.
Running the tests against other lists of standard processes reveals other inconsistencies. Not sure how to proceed here in a way that benefits all. PS: the reason we used Pydantic is that FastAPI gives us automatic checking of the API parameters.
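For illustration, a heavily simplified sketch of such schema validation with Pydantic; the models below are reduced to the bare minimum and do not mirror the actual openEO specification or the models used here:

```python
from typing import List, Optional, Union

from pydantic import BaseModel, Field


class JsonSchema(BaseModel):
    # JSON Schema allows "type" to be a single type name or a list of type names
    type: Optional[Union[str, List[str]]] = None


class Parameter(BaseModel):
    name: str
    # "schema" may clash with BaseModel internals, so map it via an alias;
    # the Union covers both spellings seen in the process definitions.
    schema_: Union[JsonSchema, List[JsonSchema]] = Field(alias="schema")


# Both variants from the thread only parse because the model explicitly accepts
# a list of schema objects as well as a single object with a list-valued "type":
a = Parameter.parse_obj({"name": "x", "schema": [{"type": "number"}, {"type": "null"}]})
b = Parameter.parse_obj({"name": "x", "schema": {"type": ["number", "string", "null"]}})
print(type(a.schema_).__name__, type(b.schema_).__name__)
```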
Hey @danielFlemstrom,
I've moved your issue over to #318 so that we can clearly separate the issues of having a process validation suite (this issue) vs. specific issues that we find (most of your issue). Thank you for your contribution.
See #468 and https://github.com/Open-EO/openeo-test-suite
Note: I have added this issue here instead of creating an `assert_equals` process proposal because I think a more general agreement should be reached before adding process definitions. I would be happy to add PRs for the processes later if this is what we agree on.

Problem statement:
While implementing processes in our backend and especially when converting them from v0.4 to v1.0, we have discovered that it is in many cases difficult to understand what needs to be developed because of:
Because of this, we are not 100% sure that the processes we have written (even the ones we wrote from scratch lately and covered with unit tests) really conform to the standard, or even that other backends' implementations match ours.
We see this as a problem because:
Proposed solution:
Our goal in this issue is primarily to open a discussion on this; however, we do also have a proposal for a mechanism that would help solve this problem:

- add an `assert_equals` process definition to the core standard, and
- provide test process graphs that use `assert_*` processes to validate a backend's implementation of a process (such test process graphs could be (logically) included into the backend-validator efforts, since they try to address the same issue at a different level of granularity).

Short description of `assert_equals`: the process takes inputs `a` and `b`. If they are the same (type, attributes, values, ...), it does nothing. If they differ, it throws an exception describing the difference.

We already use a similar approach for our integration tests. For example, this is a process graph which tests the `linear_scale_range` process.
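A minimal sketch (as a Python dict) of what such a test graph for `linear_scale_range` could look like; the concrete values and the `create_cube` inputs here are illustrative rather than the exact graph we use:

```python
# Illustrative test graph: build a tiny synthetic cube, scale it from [0, 100] to
# [0, 1] with linear_scale_range, and compare against a hand-computed expectation.
test_linear_scale_range = {
    "process_graph": {
        "input": {
            "process_id": "create_cube",    # proposed helper process for synthetic data
            "arguments": {"data": [[0, 50], [100, 25]]},
        },
        "scaled": {
            "process_id": "apply",
            "arguments": {
                "data": {"from_node": "input"},
                "process": {
                    "process_graph": {
                        "scale": {
                            "process_id": "linear_scale_range",
                            "arguments": {
                                "x": {"from_parameter": "x"},
                                "inputMin": 0,
                                "inputMax": 100,
                                "outputMin": 0,
                                "outputMax": 1,
                            },
                            "result": True,
                        }
                    }
                },
            },
        },
        "expected": {
            "process_id": "create_cube",
            "arguments": {"data": [[0.0, 0.5], [1.0, 0.25]]},
        },
        "check": {
            "process_id": "assert_equals",  # proposed process: throws an exception on mismatch
            "arguments": {"a": {"from_node": "scaled"}, "b": {"from_node": "expected"}},
            "result": True,
        },
    }
}
```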
Of course, this is not a comprehensive example - many more test process graphs should be added. In our case we didn't need them because we cover processes with separate unit tests in Python, but those can't be used with other backends and as such are not helpful for ensuring all backends' conformance to the standard.
Benefits for the users:
We are aware of the limited utility this gives to the end user, though there is still some benefit in adding this: for example, users could turn a problematic process graph into a self-contained test (they would need to remove `load_collection` and add `assert_equals` for this, though).

Note that this is just a suggestion. There might be other, better ways of solving the same problem - we just think that this is a problem worth solving. Also, `assert_equals` is already implemented in our backend, as we found it useful even for our own purposes, though it could probably be improved (currently it only works with raster datacubes).

We would appreciate some thoughts on this.