CheckWhenObserve mode broken b/c ansible logs are not valid json

d-honeybadger commented 8 months ago

What happened?

In CheckWhenObserve mode, reconciliation always fails with observe failed: Error decoding results\n\tinvalid character error. This error comes from provider-ansible attempting to parse ansible logs: https://github.com/crossplane-contrib/provider-ansible/blob/5e0d99269289a6c47b232d829eff424cfa374087/internal/controller/ansibleRun/ansibleRun.go#L363 The function results.ParseJSONResultsStream expects logs in json format (e.g. something you'd get using the json callback), so when it tries to parse "classic" non-json ansible logs it, naturally, fails.

How can we reproduce it?

Create any ansiblerun with CheckWhenObserve policy. Here's my example:

apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  annotations:
    ansible.crossplane.io/runPolicy: CheckWhenObserve
  name: example-add-line
spec:
  forProvider:
    inventoryInline: |2
      cluster:
        hosts:
          default-0:
            ansible_host: <IP>
    playbookInline: |2
      - name: add-line-to-file
        hosts: cluster
        tasks:
        - name: Add a line to file
          ansible.builtin.lineinfile:
            path: /root/testfile
            line: added-line
            state: present
    vars:
      ansible_ssh_private_key_file: ./ssh_id
  providerConfigRef:
    name: example

Observe the following status:

status:
  atProvider: {}
  conditions:
  - lastTransitionTime: "2024-01-17T21:57:05Z"
    message: "observe failed: Error decoding results\n\tinvalid character 'P' looking
      for beginning of value"
    reason: ReconcileError
    status: "False"
    type: Synced

Note that the same playbook succeeds in ObserveAndDelete mode.
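
For illustration, here is a minimal Go sketch of why the decode step produces exactly this message. It does not call provider-ansible or go-ansible; it only feeds a line of classic ansible output to the standard encoding/json decoder. The sample line `PLAY [cluster] ***` is an assumption about what the log starts with (the 'P' in the status above suggests a PLAY banner), and decodeFirstValue is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeFirstValue is a hypothetical helper: it tries to decode raw as a
// single JSON value and returns the decoder's error, if any.
func decodeFirstValue(raw string) error {
	var v interface{}
	return json.Unmarshal([]byte(raw), &v)
}

func main() {
	// Assumed sample of "classic" (non-json) ansible output.
	classic := "PLAY [cluster] ***"
	fmt.Println(decodeFirstValue(classic))
	// prints: invalid character 'P' looking for beginning of value

	// A JSON document decodes without error.
	fmt.Println(decodeFirstValue(`{"stats": {}}`))
	// prints: <nil>
}
```

Any JSON parser will reject the plain-text output at its first non-whitespace character, so the failure does not depend on the playbook contents.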

What environment did it happen in?

Crossplane version: 1.14.0

d-honeybadger commented 7 months ago

Was trying to figure out how to get those ansible logs parsed for another issue (extracting failure message for the Ready condition) and not seeing any quick and elegant solutions tbh. Here's what I explored:

1. Use ansible's json callback to have it output logs in json format that can then be parsed by go-ansible package. Doesn't work cause ansible-runner always sets its own custom awx_display callback, and only one callback can be enabled.
2. Same as above, but use go-ansible instead of ansible-runner, so that we can have json callback and integrate it nicely with go. Doesn't work cause go-ansible lacks some features that provider-ansible depends on, like executable inventory.
3. Try to use awx_display output. It's also in json, but a completely different json structure than vanilla ansible json, so there's no package for parsing it, so would require defining all the custom types for parsing json events.

Basically, No.3 is the only option. It's a lot of work for a seemingly small thing, but there's nothing fundamentally wrong with it.
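
To give a rough idea of what "defining custom types" for option 3 could look like, here is a minimal Go sketch. It is not the provider's code. The type names (JobEvent, EventData, TaskResult) and the parseEvent helper are hypothetical, and the field set (event, event_data, host, task, res.changed) is an assumption about the ansible-runner event layout that should be checked against real runner output before relying on it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JobEvent is a hypothetical, partial model of one ansible-runner event.
// Only the fields this sketch needs are declared; unknown fields are ignored.
type JobEvent struct {
	Event     string    `json:"event"`      // e.g. "runner_on_ok" (assumed name)
	EventData EventData `json:"event_data"` // per-event payload
}

// EventData holds the assumed per-task payload of an event.
type EventData struct {
	Host string      `json:"host"`
	Task string      `json:"task"`
	Res  *TaskResult `json:"res"` // nil for events that carry no task result
}

// TaskResult is the assumed subset of a module result that matters here.
type TaskResult struct {
	Changed bool `json:"changed"`
}

// parseEvent decodes a single JSON-encoded event.
func parseEvent(raw []byte) (JobEvent, error) {
	var ev JobEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return JobEvent{}, fmt.Errorf("decoding event: %w", err)
	}
	return ev, nil
}

func main() {
	// Hand-written sample event, not captured from a real run.
	raw := []byte(`{"event":"runner_on_ok","event_data":{"host":"default-0","task":"Add a line to file","res":{"changed":true}}}`)
	ev, err := parseEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Event, ev.EventData.Task, ev.EventData.Res.Changed)
}
```

Declaring only the needed fields keeps the amount of custom typing small, at the cost of silently ignoring everything else in the event.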

Lmk if I have a green light to work on it :)

d-honeybadger commented 7 months ago

@morningspace Since you were looking at my other issues and PRs (thank you!), maybe you could check out this one too?

dfry commented 2 months ago

Hi @d-honeybadger , is there any progress on being able to use CheckWhenObserve policy? Cheers

d-honeybadger commented 1 month ago

> Hi @d-honeybadger , is there any progress on being able to use CheckWhenObserve policy? Cheers

There's some progress in parsing logs in the awx_display format (that's what ansible-runner produces by default), but it isn't comprehensive: it just extracts a few structs that are relevant for determining success/failure. The solution for this bug should probably build on that parsing to also extract the "changed" field from tasks run in check mode.

I can take a look in a bit now that there's interest in this ticket :) but also if you're at all interested in tackling it yourself I'll be happy to guide you through it

dfry commented 1 month ago

Thanks for the update. I am pretty swamped right now. I am interested in figuring out how to construct my role so that it plays nicely with CheckWhenObserve mode. There is only one task in my role that returns different results that I would want to use to trigger another run. I was hoping that there is some way to achieve that end goal with the current release.

cheers
