ExaWorks / job-api-spec

https://exaworks.org/job-api-spec/

Discuss handling of multiple errors #162

Open hategan opened 1 year ago

hategan commented 1 year ago

Multiple errors can occur during a job's lifetime. For example, a job can fail due to an LRM issue. That failure could trigger a stage-out, during which file transfers can also fail. We need to clarify what implementations are supposed to do in such situations. Some possibilities are:

andre-merzky commented 1 year ago

Good topic. I want to add my biased opinion here (I have used or implemented all of these schemes at one point or another).

Composite errors put additional load on the programmer: they likely have to deal with custom, composite exception types that require additional, non-standard handling code, and they need to decide which error is more significant, more actionable, etc.
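To make that burden concrete, here is a minimal Python sketch of what a caller might face. `CompositeJobError` and `pick_actionable` are hypothetical names invented for illustration; they are not part of the spec or of any existing implementation, and the "prefer `RuntimeError`" rule is just one arbitrary policy a caller could end up writing.

```python
class CompositeJobError(Exception):
    """Hypothetical composite error carrying every error seen for a job."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} errors occurred")
        self.errors = list(errors)


def pick_actionable(err):
    """Caller-side policy: decide which of the wrapped errors matters.

    This is the non-standard code every user of a composite scheme
    would have to write in some form. Here we assume (arbitrarily)
    that a RuntimeError stands for the LRM failure and takes priority.
    """
    for e in err.errors:
        if isinstance(e, RuntimeError):
            return e
    # Fall back to the first recorded error if nothing matches.
    return err.errors[0]


composite = CompositeJobError(
    [OSError("stage-out transfer failed"), RuntimeError("LRM rejected job")]
)
chosen = pick_actionable(composite)
```

The point of the sketch is only that the decision logic moves from the library into every caller.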

Last error is mostly useless IMHO, and your example shows that perfectly: the staging error is not something the user can meaningfully respond to.

First error is best IMHO, and is also what most libraries I know do. For everything else (follow-up errors) we have logfiles.
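A first-error policy with follow-up errors going to the log could be sketched roughly like this. `FirstErrorRecorder` is a hypothetical helper made up for this comment, not anything defined by the spec; it assumes the implementation funnels every error for a job through a single `record` call.

```python
import logging

logger = logging.getLogger("job")


class FirstErrorRecorder:
    """Hypothetical helper: keep the first error, log all later ones."""

    def __init__(self):
        self.first = None

    def record(self, exc):
        if self.first is None:
            # The first error is what the user will eventually see.
            self.first = exc
        else:
            # Follow-up errors are only written to the log.
            logger.warning("follow-up error after %r: %r", self.first, exc)

    def raise_if_failed(self):
        # Surface only the first error to the caller.
        if self.first is not None:
            raise self.first


rec = FirstErrorRecorder()
rec.record(RuntimeError("LRM rejected job"))    # reported to the user
rec.record(OSError("stage-out transfer failed"))  # logged only
```

With this shape the caller handles one ordinary exception and needs no custom exception types.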