ansible / ansible-rulebook

ansible_rulebook.rule_set_runner - ERROR - Error calling action run_playbook #614

Open charlespick opened 11 months ago

charlespick commented 11 months ago

Bug Summary

I'm using the webhook event source to trigger ansible-rulebook from the command line, using another locally installed service. The first trigger almost always works, but on subsequent triggers I sometimes get the error below.

Environment

1.0.3
  Executable location = /usr/local/bin/ansible-rulebook
  Drools_jpy version = 0.3.7
  Java home = /usr/lib/jvm/java-17-openjdk-amd64
  Java version = 17.0.8.1
  Python version = 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]

Ubuntu 22.04.3 LTS on ESXi AMD64

Steps to reproduce

Using this rulebook:

---
- name: Run playbook
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 127.0.0.1
        port: 6000
  rules:
    - name: Webhook called
      condition: event.payload.cmd == 'start'
      action:
        run_playbook:
          name: /home/charlespick/playbook.yml

Actual results

2023-10-29 18:46:29,916 - ansible_rulebook.rule_set_runner - ERROR - Error calling action run_playbook, err [('/home/charlespick/.ansible/cp/47aa5da64f', '/tmp/edach48wb7s/project/.ansible/cp/47aa5da64f', "[Errno 6] No such device or address: '/home/charlespick/.ansible/cp/47aa5da64f'")]

Expected results

Playbook should execute reliably

Additional information

357c703fdf9d05295589f06bdd5da2ac3d25478f 1

mkanoor commented 11 months ago

@charlespick Since you are using a local playbook file, it has to be copied into the project directory for ansible-runner. Can you add copy_files: True as an option, as shown here: https://github.com/ansible/ansible-rulebook/blob/7e34383f1e9ba404134de011d2d88af8f768cbd5/tests/rules/test_set_facts.yml#L30
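
Applied to the rulebook from the report, the suggestion might look like the sketch below. Only the copy_files line is new; everything else is copied from the original rule, and whether this resolves the error is not established in this thread.

```yaml
rules:
  - name: Webhook called
    condition: event.payload.cmd == 'start'
    action:
      run_playbook:
        name: /home/charlespick/playbook.yml
        # Suggested addition: copy the local playbook into the
        # ansible-runner project directory before running it.
        copy_files: true
```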

charlespick commented 11 months ago

Hi @mkanoor It doesn't look like that worked

image

mkanoor commented 10 months ago

Is it possible that there is a playbook in your collection with the same name?

nageshredhat commented 7 months ago

- name: Listen for events on a webhook
  hosts: all
  ## Define our source for events
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  ## Define the conditions we are looking for
  rules:
    - name: Say Hello
      condition: event.payload.message == "Ansible is super cool"
      ## Define the action we should take should the condition be met
      action:
        run_playbook:
          name: say-what.yml

say-what.yml

- name: say thanks
  hosts: localhost
  gather_facts: false
  tasks:
    - debug:
        msg: "Thank you, {{ event.sender | default('my friend') }}!"

Try to execute this rulebook.

To trigger the rulebook, use the following commands:

curl -H 'Content-Type: application/json' -d "{\"message\": \"Ansible is alright\"}" 127.0.0.1:5000/endpoint
curl -H 'Content-Type: application/json' -d "{\"message\": \"Ansible is super cool\"}" 127.0.0.1:5000/endpoint

Alex-Izquierdo commented 7 months ago

Hi @nageshredhat We need the output of your ansible-rulebook command as well as the output of ansible-rulebook --version. You can also try the -vv flag for more debug information.

muhammad-rafi commented 5 months ago

I have the same issue, so rather than raising a new issue I thought I would discuss it here. Here is my rulebook:

- name: Listen Kafka Events for BGP Neighbors
  hosts: all
  sources:
  - ansible.eda.kafka:
      host: "{{ hostname }}"
      port: "{{ port }}"
      topic: "{{ topic }}"
      group_id: "{{ group_id }}"
      offset: latest
      verify_mode: CERT_NONE
      security_protocol: SASL_PLAINTEXT
      sasl_mechanism: SCRAM-SHA-512
      sasl_plain_username: "{{ sasl_plain_username }}"
      sasl_plain_password: "{{ sasl_plain_password }}"

  rules:
    - name: Reason for BGP State Down
      condition: events.body.fields.open_check_error_code == "neighbor-down"
      action:
        debug:
          msg: |
            **Device: {{ event.body.tags.source }} 
            **BGP Neighbor: {{ event.body.tags.neighbor_address }}
            **Description: {{ event.body.fields.description | default('N/A') }}
            **Remote ASN: {{ event.body.fields.remote_as_number }}
            **Address Family: {{ event.body.fields['af_data/af_name']}}
            **Prefix Limit: {{ event.body.fields['af_data/max_prefix_limit'] }}
            **Prefix Limit Threshold: {{ event.body.fields['af_data/max_prefix_threshold_percent'] }}
            **Reason: {{ event.body.fields.reset_reason }}

Command to run this rulebook:

ansible-rulebook --rulebook rulebooks/bgp-max-pfx-rulebook.yml -i cml_hosts.yml --verbose --vars .kafka_vars.yml

Here is the error that keeps repeating:

2024-04-25 11:39:33,415 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:34,094 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:34,792 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:35,443 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:36,120 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:36,793 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:37,488 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:38,140 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:38,853 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:39,632 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:40,355 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:41,067 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:41,756 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:42,422 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:43,107 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:43,799 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 11:39:44,482 - ansible_rulebook.rule_set_runner - ERROR - 

It starts OK, and after a couple of minutes I begin getting this error, which keeps repeating.

Please advise.

mkanoor commented 5 months ago

Are all those attributes defined in the event payload? You can just use the bare debug action (action: debug:) to see what it prints; that way we will know if fields are missing in the substitution. In Jinja you can set a default value like this:

{{ event.body.tags.source | default("missing") }}

muhammad-rafi commented 5 months ago

Thanks for the response and the advice, @mkanoor. I will try that out, but is this related to the issue I am having?

mkanoor commented 5 months ago

@muhammad-rafi If you change it to this:

 rules:
    - name: Reason for BGP State Down
      condition: events.body.fields.open_check_error_code == "neighbor-down"
      action:
        debug:

We will at least know whether the issue is related to an attribute missing in the Jinja substitution. The default debug action prints the entire payload.

muhammad-rafi commented 5 months ago

    - name: Reason for BGP State Down
      condition: events.body.fields.open_check_error_code == "neighbor-down"
      action:
        debug:
          msg: |
            **Device: {{ event.body.tags.source | default("missing") }} 
            **BGP Neighbor: {{ event.body.tags.neighbor_address | default("missing") }}
            **Description: {{ event.body.fields.description | default("missing") }}
            **Remote ASN: {{ event.body.fields.remote_as_number | default("missing") }}
            **Address Family: {{ event.body.fields['af_data/af_name'] | default("missing") }}
            **Prefix Limit: {{ event.body.fields['af_data/max_prefix_limit'] | default("missing") }}
            **Prefix Limit Threshold: {{ event.body.fields['af_data/max_prefix_threshold_percent'] | default("missing") }}
            **Reason: {{ event.body.fields.reset_reason | default("missing") }}

@mkanoor I have changed it to this as you suggested, but I was not getting any missing values. The issue is that it starts OK, and then after a couple of minutes I start getting the following "Memory threshold reached" error along with the one I mentioned earlier.

2024-04-25 22:53:24 543 [Thread-0] WARN org.drools.ansible.rulebook.integration.api.rulesengine.AutomaticPseudoClock - Pseudo clock is diverged, the difference is 207 ms. Going to sync with the real clock.
2024-04-25 22:54:57,237 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 386 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 503722296
2024-04-25 22:54:57,387 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 546 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 503981880
2024-04-25 22:54:57,547 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 719 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 504231584
2024-04-25 22:54:57,719 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:57 885 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 504512136
2024-04-25 22:54:57,886 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:58 049 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 504788864
2024-04-25 22:54:58,050 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:58 203 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 93% > 90%. MaxMemory = 536870912, UsedMemory = 505036304
2024-04-25 22:54:58,203 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 93% > 90%
2024-04-25 22:54:58 367 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 505286392
2024-04-25 22:54:58,367 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:58 544 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 505618384
2024-04-25 22:54:58,545 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:58 706 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 505848056
2024-04-25 22:54:58,706 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:58 875 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 506078912
2024-04-25 22:54:58,875 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
2024-04-25 22:54:59 056 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 506334488
2024-04-25 22:54:59,057 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%
^C2024-04-25 22:54:59 416 [main] ERROR org.drools.ansible.rulebook.integration.api.rulesengine.MemoryMonitorUtil - Memory occupation is above the threshold: 94% > 90%. MaxMemory = 536870912, UsedMemory = 506578240
2024-04-25 22:54:59,417 - ansible_rulebook.rule_set_runner - ERROR - org.drools.ansible.rulebook.integration.api.rulesengine.MemoryThresholdReachedException: Memory threshold reached: 94% > 90%

(some output omitted)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-0"
2024-04-25 23:15:29,696 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:30,424 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:31,198 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:31,910 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:32,649 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:33,339 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:34,034 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:34,716 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:35,408 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:36,071 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:36,767 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:37,510 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:38,174 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:38,849 - ansible_rulebook.rule_set_runner - ERROR - 
2024-04-25 23:15:39,529 - ansible_rulebook.rule_set_runner - ERROR - 

Besides this, I have another rule that runs a playbook, and I get the same issue if I enable that rule too.

    - name: Display Kafka Logs and Run Action Playbook
      # for ioxr BGP neighbor down due to prefix limit exceeded
      condition: events.body.fields.is_neighbor_max_prefix_shutdown == "true" and events.body.fields.reset_reason == "max-prefix-exceeded"
      action:
        run_playbook:
          name: playbooks/bgp-max-pfx-fix.yml
          extra_vars:
            target_host: "{{ event.body.tags.source }}"
            bgp_neighbor: "{{ event.body.tags.neighbor_address }}"
            bgp_remote_asn: "{{ event.body.fields.remote_as_number }}"
            bgp_local_asn: "{{ event.body.fields.local_as }}"
          verbosity: 1
          copy_files: True

Please advise.

muhammad-rafi commented 5 months ago

@mkanoor I have tried this suggestion as well. It does print the entire payload, but the same issue happens again after a couple of minutes.

mkanoor commented 5 months ago

@muhammad-rafi It looks like some sort of memory leak in aiokafka or the kafka source plugin. How many events do you think are being sent across? Are a lot of events coming in, and since they don't get ack'ed, is the same event repeating? If you run ansible-rulebook with the -vv option you will see the events coming in. Also monitor the memory of the process: the JVM is dying because it is running out of memory, which means something is not releasing memory.

muhammad-rafi commented 5 months ago

Thanks @mkanoor, I have the same suspicion that this may be happening. I may be dealing with 1000+ interesting events. I was looking for a workaround to slow it down, or an option to control memory in ansible-rulebook. Please let me know if you have any suggestions.
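
One rule-level setting sometimes used to damp event storms is throttle. The sketch below assumes an ansible-rulebook version that supports throttle on rules. The 5 minute window and the grouping attribute are illustrative choices, not values from this thread, and the condition is copied unchanged from the rule above. It limits how often the action fires per device; it is not confirmed here to address the memory growth.

```yaml
rules:
  - name: Reason for BGP State Down
    condition: events.body.fields.open_check_error_code == "neighbor-down"
    # Assumption: fire the action at most once per device in each window.
    throttle:
      once_within: 5 minutes
      group_by_attributes:
        - event.body.tags.source
    action:
      debug:
```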

muhammad-rafi commented 5 months ago

Just to add: on the Kafka side we don't see any issue, so it must be at my end.

muhammad-rafi commented 4 months ago

@mkanoor Any other thoughts on this one, please? This is still an issue.

mkanoor commented 4 months ago

@muhammad-rafi Do you see the memory increasing for the Python process? How many events are being processed?