ansible / awx

AWX provides a web-based user interface, REST API, and task engine built on top of Ansible. It is one of the upstream projects for Red Hat Ansible Automation Platform.

Failed to JSON parse a line from worker stream due to partial ansible-runner event #14859

Closed: pwalters04 closed this issue 7 months ago

pwalters04 commented 7 months ago

Bug Summary

For OCI Dynamic Inventory sourcing, I created a custom EE based on awx-ee:latest (as of 2/7). When I run the job that syncs the inventory source, I receive the following error after ~5800 lines of stdout:


Failed to JSON parse a line from worker stream. Error: Extra data: line 1 column 5828 (char 5827) Line with invalid JSON data: b'{"event": "verbose", "uuid": "******************", "counter": 1032, "stdout": "Final inventory for instance ocid1.instance.********* is {\'IP-ADDRESS\': {\'groups\': {\'Prab_US-ASHBURN-AD-1\': {\'children\': []}, \'region_us-ashburn-1\': {\'children\': []}, \'all_hosts\': {\'children\': []}, \'Platform\': {\'children\': []}}, \'vars\': {\'availability_domain\': \'Prab:US-ASHBURN-AD-1\', \'capacity_reservation_id\': None, \'compartment_id\': \'ocid1.compartment.********************', \'dedicated_vm_host_id\': None, \'defined_tags\': {}, \'display_name\': None, \'extended_metadata\': None, \'fault_domain\': None, \'freeform_tags\': {}, \'id\': \'ocid1.instance.oc1.iad.***********************', \'image_id\': None, \'ipxe_script\': None, \'launch_mode\': \'PARAVIRTUALIZED\', \'launch_options\': {\'boot_volume_type\': \'PARAVIRTUALIZED\', \'firmware\': \'BIOS\', \'network_type\': \'PARAVIRTUALIZED'

My values file is

  spec:
    no_log: false
    image_pull_policy: always
    task_extra_env: |
      - name: GIT_SSL_NO_VERIFY
        value: "True"

    task_replicas: 1
    task_resource_requirements:
      requests:
        cpu: 500m
        memory: 5Gi
      limits:
        cpu: 2000m
        memory: 10Gi
    # AWX Projects Directory
    projects_persistence: true
    projects_storage_class: awx-storage
    projects_storage_size: 50Gi

    # Postgres Secret
    postgres_configuration_secret: awx-postgres-secret

    # AWX Default User
    admin_user: admin

    # Application Images
    image: harbor-private/platform-infra/awx
    image_version: "23.5.1"
    ee_images:
      - name: awx
        image: harbor-private/platform-infra/awx-ee:latest
    ee_extra_env: |
      - name: RECEPTOR_RELEASE_WORK
        value: "False"
      - name: AWX_CLEANUP_PATHS
        value: "False"
      - name: RECEPTOR_KUBE_SUPPORT_RECONNECT
        value: "Enabled"
    control_plane_ee_image: harbor-private/platform-infra/awx-ee:latest
    redis_image: harbor-private/platform-infra/redis
    redis_image_version: "7"

AWX version

23.5.1

Installation method

kubernetes

Modifications

no

Ansible version

No response

Operating system

No response

Web browser

Chrome

Steps to reproduce

Add source project for inventory source, sync

Expected results

inventory imported

Actual results

json parse error

Additional information

No response

TheRealHaoLiu commented 7 months ago

This case is different from https://github.com/ansible/awx/issues/14693, and the error message shows a partially transmitted JSON line.

Going through the code I have no idea how this could happen.

Receptor reads output from the job pod one line at a time: https://github.com/ansible/receptor/blob/4264e3c0911d0cda5cf0532e5cdb031bcbc7ced2/pkg/workceptor/kubernetes.go#L241

It then writes that output to a file on disk: https://github.com/ansible/receptor/blob/devel/pkg/workceptor/kubernetes.go#L287

The ansible-runner process then reads from receptorctl work results (which reads from the stdout file).

We shouldn't be writing partial lines to the stdout file, so receptor shouldn't be transmitting partial lines to the ansible-runner process.
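To make the expectation above concrete, here is a minimal Python sketch of a reader that only forwards complete, newline-terminated lines and then tries to JSON-parse each one. It is not receptor's or ansible-runner's actual code (receptor is written in Go); the function names `iter_complete_lines` and `parse_events` are hypothetical and exist only for illustration.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def iter_complete_lines(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield only newline-terminated lines from a stream of byte chunks.

    Chunk boundaries may fall anywhere, so data is buffered until a
    newline arrives. Trailing data without a newline is treated as an
    incomplete line and is never yielded.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line:  # skip empty lines
                yield line


def parse_events(lines: Iterable[bytes]) -> Tuple[List[dict], List[Tuple[bytes, str]]]:
    """Parse each line as JSON, collecting failures instead of raising."""
    events: List[dict] = []
    failures: List[Tuple[bytes, str]] = []
    for line in lines:
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError as exc:
            failures.append((line, str(exc)))
    return events, failures


if __name__ == "__main__":
    # Two events split across arbitrary chunk boundaries, plus a
    # truncated third event that never receives its newline.
    chunks = [
        b'{"event": "verbose", "coun',
        b'ter": 1}\n{"event"',
        b': "verbose", "counter": 2}\n',
        b'{"event": "verb',
    ]
    events, failures = parse_events(iter_complete_lines(chunks))
    print(len(events), len(failures))
```

With buffering like this, a truncated line never reaches the JSON parser, which is why a partial event in the worker stream is surprising if every stage really is line-based.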

@pwalters04 can you try the debug step I wrote here https://github.com/ansible/receptor/blob/devel/pkg/workceptor/kubernetes.go#L287

and provide me with:

- the /api/v2/jobs/ entry of the failed job (or of the inventory update, or whatever "job-ish" object resulted in the error)
- the receptor log during the failure

pwalters04 commented 7 months ago

Hi @TheRealHaoLiu, thank you for your help. I no longer get this error after updating the spec in my values file so that all awx-ee images use the same version.
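
For anyone hitting the same error, a quick way to check a spec for mismatched execution environment images might look like the Python sketch below. It assumes the fix amounts to giving `ee_images` and `control_plane_ee_image` (the field names from the values file above) the same tag; the thread does not say which tag was used or exactly which fields were changed. Flagging the floating `latest` tag is an additional assumption, and the helper names are hypothetical.

```python
from typing import Dict, List


def image_tag(ref: str) -> str:
    """Return the tag (or digest) of an image reference.

    Only the last path component is inspected, so a registry port such
    as "registry:5000/awx-ee" is not mistaken for a tag. A reference
    with no tag is treated as "latest".
    """
    name = ref.rsplit("/", 1)[-1]
    if "@" in name:
        return name.split("@", 1)[1]
    if ":" in name:
        return name.rsplit(":", 1)[1]
    return "latest"


def ee_image_tags(spec: Dict) -> Dict[str, str]:
    """Map each execution environment field in the spec to its image tag."""
    refs = {
        f"ee_images[{entry['name']}]": entry["image"]
        for entry in spec.get("ee_images", [])
    }
    if "control_plane_ee_image" in spec:
        refs["control_plane_ee_image"] = spec["control_plane_ee_image"]
    return {field: image_tag(ref) for field, ref in refs.items()}


def check_ee_tags(spec: Dict) -> List[str]:
    """Return human-readable problems; an empty list means no findings."""
    tags = ee_image_tags(spec)
    problems = [
        f"{field} uses the floating tag 'latest'"
        for field, tag in tags.items()
        if tag == "latest"
    ]
    distinct = sorted(set(tags.values()))
    if len(distinct) > 1:
        problems.append(f"execution environment tags differ: {distinct}")
    return problems


if __name__ == "__main__":
    # Image references copied from the values file in this issue.
    spec = {
        "ee_images": [
            {"name": "awx", "image": "harbor-private/platform-infra/awx-ee:latest"}
        ],
        "control_plane_ee_image": "harbor-private/platform-infra/awx-ee:latest",
    }
    for problem in check_ee_tags(spec):
        print(problem)
```

Run against the spec from this issue, the check reports both fields as using `latest`; pinning both to one explicit tag clears the findings.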