ansible / awx-resource-operator


end the play if the job is already running/ran + status update after creating k8s job #14

Closed mikeshng closed 4 years ago

mikeshng commented 4 years ago

Signed-off-by: Mike Ng ming@redhat.com

This PR addresses two issues:

1) When the operator restarts, all the AnsibleJobs that previously ran trigger again. With this PR, the play ends early and the status is stamped with the following message:

...
    message: This job instance is already running or has reached its end state.
...
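A guard like this could be sketched in the operator's reconcile playbook roughly as follows. This is illustrative only, not the PR's actual code: the task names, the `ansible_job_info` variable, and the use of `ansible_operator_meta` (older operator-sdk versions exposed this as `meta`) are assumptions.

```yaml
# Sketch only -- task/variable names are assumptions, not the PR's code.
- name: Read the current AnsibleJob status
  kubernetes.core.k8s_info:
    api_version: tower.ansible.com/v1alpha1
    kind: AnsibleJob
    name: "{{ ansible_operator_meta.name }}"
    namespace: "{{ ansible_operator_meta.namespace }}"
  register: ansible_job_info

- name: End the play if the job is already running or reached its end state
  ansible.builtin.meta: end_play
  when:
    - ansible_job_info.resources | length > 0
    - ansible_job_info.resources[0].status.ansibleJobResult is defined
```

The status-update task that stamps the message above would run just before the `end_play`.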

2) Currently, it's not clear from the status that awx.awx.tower_job_launch failed to launch (e.g. due to a bad auth token). When the tower job fails to launch, the status shows:

  status:
    conditions:
    - ansibleResult:
        changed: 1
        completion: 2020-09-04T15:56:13.223679
        failures: 0
        ok: 6
        skipped: 0
      lastTransitionTime: "2020-09-04T15:56:04Z"
      message: Awaiting next reconciliation
      reason: Successful
      status: "True"
      type: Running

which indicates the k8s job for the job runner was created successfully, but carries no information about the tower_job launch failure. This is very confusing for users; you have to look at the log of the k8s job to see that the tower_job failed:

oc logs job/demo-job-jd2rn
...
TASK [job_runner : Launch a job] ***********************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Invalid Tower authentication credentials for /api/v2/job_templates/ (HTTP 401)."}
...

The new status when the tower job fails to launch:

  status:
    ansibleJobResult:
      status: error
    conditions:
    - ansibleResult:
        changed: 2
        completion: 2020-09-04T16:31:44.004288
        failures: 0
        ok: 6
        skipped: 1
      lastTransitionTime: "2020-09-04T16:31:33Z"
      message: Awaiting next reconciliation
      reason: Successful
      status: "True"
      type: Running
    k8sJob:
      created: true
      env:
        secretNamespacedName: default/toweraccess
        templateName: Demo Job Template
        verifySSL: false
      message: Monitor the K8s job status and log for more details
      namespacedName: default/demo-job-wdq2b
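Stamping the k8sJob block onto the status after creating the k8s Job could look roughly like the task below, using the operator_sdk.util.k8s_status module. This is a sketch, not the PR's exact task: the k8s_job_name variable is hypothetical, and the field values simply mirror the example status above.

```yaml
# Sketch only -- k8s_job_name is a hypothetical variable.
- name: Record the created k8s Job in the AnsibleJob status
  operator_sdk.util.k8s_status:
    api_version: tower.ansible.com/v1alpha1
    kind: AnsibleJob
    name: "{{ ansible_operator_meta.name }}"
    namespace: "{{ ansible_operator_meta.namespace }}"
    status:
      k8sJob:
        created: true
        message: Monitor the K8s job status and log for more details
        namespacedName: "{{ ansible_operator_meta.namespace }}/{{ k8s_job_name }}"
```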

A tower job that ran successfully will have the following status:

  status:
    ansibleJobResult:
      changed: true
      elapsed: "6.235"
      failed: false
      finished: "2020-09-04T18:49:19.029458Z"
      started: "2020-09-04T18:49:12.794464Z"
      status: successful
      url: https://ansible-tower-web-svc-tower._redacted_.com/#/jobs/playbook/98
    conditions:
    - ansibleResult:
        changed: 2
        completion: 2020-09-04T18:48:24.829147
        failures: 0
        ok: 6
        skipped: 1
      lastTransitionTime: "2020-09-04T18:48:09Z"
      message: Awaiting next reconciliation
      reason: Successful
      status: "True"
      type: Running
    k8sJob:
      created: true
      env:
        secretNamespacedName: default/toweraccess
        templateName: Demo Job Template
        verifySSL: false
      message: Monitor the K8s job status and log for more details
      namespacedName: default/demo-job-m6dt8
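With these fields in place, a user can check the launch outcome directly from the custom resource instead of digging through the runner pod log. For example (the resource name here is taken from the examples above):

```
# Prints the value of status.ansibleJobResult.status
# (per the examples above: "successful", or "error" on a failed launch)
oc get ansiblejob demo-job -n default \
  -o jsonpath='{.status.ansibleJobResult.status}'
```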

This PR also adds changed and failed to status.ansibleJobResult.

mikeshng commented 4 years ago

Resolved the conflict on the example file.

mikeshng commented 4 years ago

Merged after resolving the conflict.

mikeshng commented 4 years ago

for https://github.com/open-cluster-management/backlog/issues/4893