aiidateam / aiida-testing

A pytest plugin to simplify testing of AiiDA plugins.
MIT License

Add support for testing fail conditions of code (inside complex workflows) #42

Open DanielMarchand opened 3 years ago

DanielMarchand commented 3 years ago

It is a bit painful right now to test code that fails in unusual ways. The current caching assumes a one-to-one mapping between inputs and outputs. What we would like is to test multiple fail conditions (e.g. due to a node failure during computation) that all share the same input parameters. The only way this can be done now is to manually copy cached files and modify neutral parameters, e.g. the title in pw.x. @greschd @ltalirz

greschd commented 3 years ago

Some additional thoughts from our discussion:

There are essentially two use cases in which we would like to test "failure modes" of calculations: The first is in calculation plugins, to check that the parser handles / detects the errors correctly. For this, the approach currently taken in aiida-quantumespresso seems reasonable: the "outputs" (of the code) are explicitly created by the developer and passed to the parser.
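
For concreteness, a parser-level failure test in that style might look roughly like the sketch below. `generate_calc_job_node` and `generate_parser` are conftest fixtures from the aiida-quantumespresso test suite (not part of aiida-testing), and the test-file name is an illustrative assumption.

```python
# Sketch of a parser-level failure test, following the aiida-quantumespresso
# pattern: the "retrieved" outputs are files written by hand to reproduce the
# failure, and the parser is run on them directly.
# `generate_calc_job_node` and `generate_parser` are conftest fixtures from the
# aiida-quantumespresso test suite; the test-file name is an assumption.
def test_pw_failure_is_detected(generate_calc_job_node, generate_parser):
    node = generate_calc_job_node('quantumespresso.pw', test_name='failed_interrupted')
    parser = generate_parser('quantumespresso.pw')

    # Run the parser on the hand-crafted outputs, without storing provenance.
    results, calcfunction = parser.parse_from_node(node, store_provenance=False)

    # The parser should detect the failure and return a non-zero exit status.
    assert calcfunction.is_finished
    assert calcfunction.exit_status != 0
```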

The more difficult use case that we are trying to solve here is how to induce a failure deep inside a workflow. One way we thought of doing this is to allow a "hook" of some kind that runs after the calculation node (and its outputs) are created and is allowed to modify the calculation's attributes or its outputs.

Because this hook would have access to the provenance of the calculation to be modified, it could implement somewhat complex logic to decide if the output should be "broken".
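
A very rough sketch of what such a hook could look like; nothing like this exists in aiida-testing yet, and the hook registration fixture, workflow fixture and exit status below are all hypothetical.

```python
# Purely hypothetical sketch of the proposed "post-calculation hook"; neither the
# hook mechanism nor the fixtures below exist in aiida-testing today.
def break_nscf_step(node):
    """Called after the calculation node and its outputs are created.

    Because the node's provenance is available, the hook can decide whether this
    particular calculation (e.g. the NSCF step deep inside the workflow) should
    be broken, and how.
    """
    if node.process_label == 'PwCalculation' and node.label == 'nscf':
        node.set_exit_status(400)  # illustrative: mark the calculation as failed


# Imagined usage in a test; `inject_calculation_hook` and `run_workflow` are
# placeholders for whatever the real interface would be.
def test_workflow_handles_failed_nscf(inject_calculation_hook, run_workflow):
    inject_calculation_hook(break_nscf_step)
    result_node = run_workflow()
    assert not result_node.is_finished_ok
```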

Implementing this behavior would almost surely need to be tied to some AiiDA internals, but I think that is an acceptable risk.

An alternative we discussed is intentionally creating a "broken" calculation ahead of time, and then using the export_cache mechanism to force the workflow to use that. The problem is that the workflow could accidentally not use the cached version, and it may be cumbersome to regenerate the "failed" calculation if something about its inputs changes.

ltalirz commented 3 years ago

A list of example failure modes you want to simulate, and what exactly you would like to test for each, would be helpful to make the suggested solutions more concrete.

E.g. depending on what you need, you could simply set up an additional "fake mock executable" for your failure tests that is a bash script and fails in the "unusual way" of your choice.
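
A minimal example of such a fake executable (written in Python here rather than bash; the output filename and failure mode are illustrative, and registering it as the test's Code is left to the existing mock-code setup):

```python
#!/usr/bin/env python
# Sketch of a "fake mock executable" that fails in a controlled way instead of
# running the real code.
import sys


def main():
    # Emit a deliberately truncated output file for the parser to choke on.
    with open('aiida.out', 'w') as handle:  # 'aiida.out' is an illustrative filename
        handle.write('Program PWSCF starting ...\n')  # no final results block
    # Exit non-zero to simulate the executable crashing mid-run.
    sys.exit(1)


if __name__ == '__main__':
    main()
```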

If it is too much of an effort to conjure up an unusual failure from scratch, and you prefer to instead modify the correct result from the executable, then Dominik's suggestion of a post-calculation hook sounds reasonable. Since provenance isn't really a concern in this case, perhaps one could monkey-patch the parser to modify the calculation node via some low-level internal AiiDA API (or inject a first parser that does that before the second, regular parser kicks in).
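
As a rough illustration of the monkey-patching route (the parser entry point and exit-code name are taken from aiida-quantumespresso purely as an example; whether patching the class is sufficient depends on how the workflow is run in the test):

```python
# Rough sketch of the monkey-patching idea: wrap the real parser so it overrides
# the result before the workflow sees it.
from aiida.plugins import ParserFactory


def test_workflow_handles_parser_failure(monkeypatch):
    PwParser = ParserFactory('quantumespresso.pw')
    original_parse = PwParser.parse

    def failing_parse(self, **kwargs):
        # Run the regular parser first, then override its verdict to simulate a
        # failed calculation deep inside the workflow.
        original_parse(self, **kwargs)
        return self.exit_codes.ERROR_OUTPUT_STDOUT_INCOMPLETE  # illustrative exit code

    monkeypatch.setattr(PwParser, 'parse', failing_parse)
    # ... now run the workflow; the patched parser makes every pw.x step "fail".
```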

In any case, having a clear list of use cases would help narrow down the possible solutions.

greschd commented 3 years ago

@ltalirz just to clarify (sorry this wasn't clearer from the start): @DanielMarchand and myself have been discussing this directly - our current use case is that we add sanity-checks on the aiida-quantumespresso calculations done within aiida-tbextraction (see https://github.com/greschd/aiida-tbextraction/issues/16), and want to add tests for that. This is the "deep inside a workflow" scenario described above.

greschd commented 3 years ago

> E.g. depending on what you need, you could simply set up an additional "fake mock executable" for your failure tests that is a bash script and fails in the "unusual way" of your choice.

Yeah that's definitely also an option - although I feel it could become quite complex; the main drawback I see is that it'd be hard to distinguish exactly where the error should be injected (again, in the complex workflow example), since presumably the same code may be called multiple times.

I'm thinking this could eventually be another "top-level" feature of aiida-testing, but bringing the ones we already have to a releasable state is higher priority. Just noting it down here so that we don't forget about it, and to get the discussion going.