Open-EO / openeo-api

The openEO API specification
http://api.openeo.org
Apache License 2.0

Clarification of batch job error #436

Open soxofaan opened 2 years ago

soxofaan commented 2 years ago

In the aggregator's large area processing feature, I'm currently assuming a batch job error is not recoverable (so once the status is "error", it will never change again).

Or am I misinterpreting? For example: can a batch job be restarted (e.g. go back to status "queued"/"running") after it reached status "error"? Or can/must it be canceled first to do that?

soxofaan commented 2 years ago

(I guess we should have a batch job status transition diagram somewhere in the docs)

m-mohr commented 2 years ago

This is mostly up to the back-end to decide, but in principle, a batch job with status error, finished or canceled can be started/queued again. What happens exactly is up to the implementing back-end, e.g. for an error, it could continue at the point of the error or start from scratch again. For a finished job, it could discard all results and start from scratch again, or it could just add another result and then serve both for download. This is intentionally left open as we don't know how back-ends work internally. If I had to come up with a best practice, I would say that restarting from scratch should be possible from all three statuses, as continuing after an error seems pretty much undefined and/or doesn't make sense with a change in the process graph.

There is some kind of a written "status transition diagram" at https://api.openeo.org/draft/index.html#operation/start-job (with additions on other endpoints, e.g. delete or create), but due to the reasons listed above it's not complete.

A job in the error state doesn't need to be canceled first:

This endpoint [for queueing] has no effect if the job status is already queued or running.

Which means it (queueing) has an effect in any other state.
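
To make that rule concrete, here is a minimal sketch in Python. It is not part of the spec or of any client library: the helper names `start_has_effect` and `status_after_start` are hypothetical, and the assumption that a successful start request leaves the job in "queued" is a simplification. Whether the back-end then resumes or restarts from scratch is not modelled.

```python
# Job statuses as listed in the openEO API specification.
JOB_STATUSES = {"created", "queued", "running", "canceled", "finished", "error"}

# Statuses in which the start-job endpoint is documented to have no effect.
NO_EFFECT_ON_START = {"queued", "running"}


def start_has_effect(status: str) -> bool:
    """Return True if queueing a job in this status would change anything."""
    if status not in JOB_STATUSES:
        raise ValueError(f"unknown job status: {status!r}")
    return status not in NO_EFFECT_ON_START


def status_after_start(status: str) -> str:
    """Hypothetical helper: status right after a successful start request.

    Assumes the back-end puts the job in "queued"; what it does with
    earlier results or partial progress is back-end specific.
    """
    return "queued" if start_has_effect(status) else status
```

For the aggregator case, this means "error" can't be treated as strictly final: `status_after_start("error")` yields "queued" under the assumption above, without canceling first.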

I'll try to come up with a diagram though.