Closed stuartleeks closed 4 years ago
Can you please cherry-pick your changes related to #105 and send PR separately?
/azp run
I've rebased this on master and re-tested following the steps outlined in the PR
$ kubectl get run
NAME AGE RUNID STATE
run-sample 2m23s 15 PENDING
After a while, the job is in the error state:
export JOB_ID=15 # taken from kubectl get run output above
curl -H "Authorization: Bearer $DATABRICKS_TOKEN" $DATABRICKS_HOST/api/2.0/jobs/runs/get?run_id=$JOB_ID
{
"job_id": 24,
"run_id": 15,
"number_in_job": 1,
"state": {
"life_cycle_state": "INTERNAL_ERROR",
"result_state": "FAILED",
"state_message": "Library installation failed for library jar: \"dbfs:/my-jar.jar\"\n. Error messages:\njava.lang.Throwable: java.io.FileNotFoundException: dbfs:/my-jar.jar"
},
// rest omitted for brevity
Once the operator has next refreshed the INTERNAL_ERROR
status is reflected in the run status:
$ kubectl get run
NAME AGE RUNID STATE
run-sample 10m 15 INTERNAL_ERROR
Observing the operator logs shows that the status is still being reconciled every 30 seconds.
Fixes #105
To address the issue with not being able to reconcile runs(#105) this PR removes the check that propagated an error that was reported as part of the run state from the API to be a reconciler error. WIth this PR the reconciler now completes successfully and the run status is reported in the status for the run CRD instance (and availble via
kubectl describe run myrun