Distinguish permanent API errors from transient ones

We do not currently distinguish "not found" errors (permanent) from, e.g., "the Kubernetes API server temporarily cannot be reached" (transient). Because of this, a Stage's verification process may fail prematurely even though the controller could recover automatically if given the time.

Manually recovering from this is cumbersome for the user and potentially wastes the computing power already spent on the AnalysisRun. I think we can do a better job of distinguishing these types of errors and avoid giving up on transient ones, e.g. by requeueing and not erasing AnalysisRun references.

xref: https://github.com/akuity/kargo/pull/1611#discussion_r1525229572

Note: While I have only observed this in a Stage's verification process, it may apply to other areas of Kargo as well.

Distinguish permanent API errors from transient ones #1640