auroraresearchlab / netbox-dns

NetBox DNS is a NetBox plugin for managing zone, nameserver and record inventory.
MIT License

Netbox webhook does not work on zone update event #296

kemeris2000 closed this issue 1 year ago

kemeris2000 commented 1 year ago

netbox-dns itself works fine in the NetBox GUI. I have set up a webhook on the DNS zone update event to trigger a custom script, and the script fails with the following error:

Traceback (most recent call last):
  File "/opt/netbox/venv/lib/python3.9/site-packages/django/apps/registry.py", line 158, in get_app_config
    return self.app_configs[app_label]
KeyError: 'netbox_dns'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/netbox/venv/lib/python3.9/site-packages/rq/job.py", line 417, in _deserialize_data
    self._func_name, self._instance, self._args, self._kwargs = self.serializer.loads(self.data)
  File "/opt/netbox/venv/lib/python3.9/site-packages/django/db/models/base.py", line 2477, in model_unpickle
    model = apps.get_model(*model_id)
  File "/opt/netbox/venv/lib/python3.9/site-packages/django/apps/registry.py", line 208, in get_model
    app_config = self.get_app_config(app_label)
  File "/opt/netbox/venv/lib/python3.9/site-packages/django/apps/registry.py", line 165, in get_app_config
    raise LookupError(message)
LookupError: No installed app with label 'netbox_dns'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/netbox/venv/lib/python3.9/site-packages/rq/worker.py", line 1353, in perform_job
    self.prepare_job_execution(job)
  File "/opt/netbox/venv/lib/python3.9/site-packages/rq/worker.py", line 1203, in prepare_job_execution
    self.procline(msg.format(job.func_name, job.origin, time.time()))
  File "/opt/netbox/venv/lib/python3.9/site-packages/rq/job.py", line 451, in func_name
    self._deserialize_data()
  File "/opt/netbox/venv/lib/python3.9/site-packages/rq/job.py", line 419, in _deserialize_data
    raise DeserializationError() from e
rq.exceptions.DeserializationError

I am not sure if this is a netbox-dns problem or a problem with NetBox itself?

peteeckel commented 1 year ago

> I am not sure if this is a netbox-dns problem or a problem with NetBox itself?

Neither am I ... can you provide some minimal sample code that triggers the problem?

It would also be very helpful to know the versions of the software involved, particularly NetBox and NetBox DNS, and the configuration.

kemeris2000 commented 1 year ago

Thank you for your reply peteeckel.

NetBox version: 3.4.7, netbox-dns version: 0.17.0

I use a NetBox custom script to execute Ansible playbooks. This script works when triggered by a dcim devices update event or a virtualization virtual-machines update event. However, on a netbox_dns zone update event the script fails. In all cases the webhook is almost identical, apart from some variable values in the webhook body template.

Webhook body template:

{
   "data": {
       "hosts_limit": "localhost",
       "playbook": "netbox-rndc.yaml",
       "pg_timeout": 300,
       "verbosity": 0,
       "quiet": "true"
   },
   "commit": true
}

Netbox custom script:

from utilities.exceptions import AbortScript
import os
import json
import tempfile
import subprocess
import re
import sys
import ansible_runner
import socket

class universal(Script):
    class Meta:
       name = "ansible-runner"
       description = "Executes ansible playbooks"
       job_timeout = 900

    hosts_limit = StringVar(max_length=20, label="Hosts limit", required=True)
    playbook = StringVar(max_length=200, label="Playbook", required=True)
    pg_timeout = IntegerVar(default=300, label="Postgres timeout in sec", required=True)
    verbosity = IntegerVar(default=1, label="Ansible-playbook verbosity", required=True)
    quiet = BooleanVar(default='true', label="Playbook quiet mode", required=True)
    netbox_event = StringVar(max_length=200, label="Netbox event", required=False)
    ansible_extra_vars = StringVar(max_length=200, label="Ansible extra vars", required=False)

    def run(self, data, commit):

        os.environ['PGOPTIONS'] = '-c statement_timeout='+ str(data['pg_timeout'])
        playbook_path = "/var/lib/ansible/linux_state"
        envvars = {
            'ANSIBLE_CONFIG': playbook_path + '/ansible.cfg',
            'ANSIBLE_HOST_KEY_CHECKING': 'False',
            'ANSIBLE_DISPLAY_SKIPPED_HOSTS': 'False'
        }
        playbook = data['playbook']
        hosts_limit = data['hosts_limit']
        verbosity = data['verbosity']
        quiet = data['quiet']
        if 'netbox_event' in data:
            netbox_event = data['netbox_event']
        else:
            netbox_event = ''
        if 'ansible_extra_vars' in data:
            ansible_extra_vars = data['ansible_extra_vars']
        else:
            ansible_extra_vars = ''
        kwargs = {
            'playbook': playbook_path +'/'+ playbook,
            'limit': hosts_limit,
            'envvars': envvars,
            'private_data_dir': '/tmp',
            'verbosity': verbosity,
            'cmdline': '--user awx --extra-vars '+ansible_extra_vars,
            'quiet': 'true'
        }

        hostname = socket.gethostname()
        self.log_info(f"Node {hostname}: Executing playbook {playbook} for target {hosts_limit}")
        result = ansible_runner.run(**kwargs)
        stdout = result.stdout.read()
        stats = result.stats
        stats = json.dumps(stats, indent=4)
        events = list(result.events)
        if result.rc != 0:
          self.log_failure(f"Node {hostname}: Playbook {playbook} failed for target {hosts_limit}")
          self.log_failure(f"{stdout}")
          sys.tracebacklimit = 0
          raise AbortScript("Script failed because of failed playbook")
        else:
          self.log_success(f"Node {hostname}: Playbook {playbook} finished successfully for target {hosts_limit}")
        return (stdout)

peteeckel commented 1 year ago

Well, this is not exactly what I meant when I wrote 'minimal' (neither is it correct or complete, by the way) ... never mind. I actually wanted to reproduce your problem, but since I'm a bit short on time lately I'll resort to guessing.

Did you restart the rqworker after installing NetBox DNS and modifying configuration.py, or just netbox? The webhook script is run by that process, and the error message seems to indicate that it does not know about NetBox DNS.

If the answer is 'no' it's probably a NetBox issue.
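To illustrate why a worker that was not restarted would produce this kind of error, here is a toy sketch in plain Python. It is not Django or RQ code: `load_registry`, `enqueue` and `deserialize` are hypothetical stand-ins, and the only assumption taken from the traceback is that the worker looks up the model by app label while unpickling the job. A process that loaded its configuration before the plugin was added has no entry for `netbox_dns`, so the lookup fails until the process is started again.

```python
import pickle


class Zone:
    """Stand-in for a plugin model; it only carries a name."""

    def __init__(self, name):
        self.name = name


def load_registry(plugins):
    """Build the registry a process holds after reading its config at startup."""
    available = {"netbox_dns": {"zone": Zone}}  # hypothetical plugin catalogue
    return {label: available[label] for label in plugins}


def enqueue(app_label, model_name, fields):
    """Serialize a reference to a model instance, as a job payload would."""
    return pickle.dumps((app_label, model_name, fields))


def deserialize(registry, payload):
    """Rebuild the instance; fails if the app label is unknown to this process."""
    app_label, model_name, fields = pickle.loads(payload)
    if app_label not in registry:
        raise LookupError(f"No installed app with label '{app_label}'.")
    return registry[app_label][model_name](**fields)


config_plugins = []                               # config before the plugin was added
worker_registry = load_registry(config_plugins)   # worker started at this point

config_plugins = ["netbox_dns"]                   # plugin added to the config later
payload = enqueue("netbox_dns", "zone", {"name": "example.com"})

try:
    deserialize(worker_registry, payload)         # stale worker: lookup fails
    stale_worker_error = None
except LookupError as exc:
    stale_worker_error = str(exc)

worker_registry = load_registry(config_plugins)   # "restart": config is re-read
zone = deserialize(worker_registry, payload)      # now the lookup succeeds

print(stale_worker_error)
print(zone.name)
```

The point of the sketch is only that the registry is built once at process start, so a configuration change is invisible to any process that has not been restarted since.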

peteeckel commented 1 year ago

OK, I've tested it with your versions. Maybe I'm too curious. This is minimal:

Create a test script

Since your script doesn't actually use any data or models related to NetBox DNS or NetBox, it's safe to replace it with a stripped-down version:

from extras.scripts import Script

class test(Script):
    class Meta:
        name = "Test"
        description = "Simple test script"

    def run(self, data, commit):
        self.log_success('Script ran successfully')

Create a webhook

The definition of the webhook is slightly more involved:

    {
      "display": "Update Zone",
      "content_types": [
        "netbox_dns.zone"
      ],
      "name": "Update Zone",
      "type_update": true,
      "payload_url": "https://192.168.106.105/api/extras/scripts/webhook.test/",
      "enabled": true,
      "http_method": "POST",
      "http_content_type": "application/json",
      "additional_headers": "Authorization: Token d4e691e868a3ac253b27efe33f05c807c5761ce3",
      "body_template": "{\r\n    \"data\": {},\r\n    \"commit\": true\r\n}",
      "secret": "",
      "conditions": null,
      "ssl_verification": false,
      "ca_file_path": null
    }

Update the zone (creating a record does the trick)

In the log tab for the custom script I can see that the script was executed successfully. So neither NetBox nor NetBox DNS is the problem, but probably the missing restart of the rqworker (or you left out some other piece of information).

peteeckel commented 1 year ago

By the way, running an Ansible playbook on every zone update might get you into surprisingly big performance trouble. Zone updates are rather frequent, especially for bulk operations on records, so I'd resort to a two-stage approach: each zone update sets a flag and resets a timer to a certain interval, say 10 seconds, and the playbook is only triggered when the timer expires.

While rndc is quite fast, Ansible certainly is not.
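The two-stage idea can be sketched as a small debouncer in Python. This is a minimal illustration under stated assumptions, not a drop-in solution: `Debouncer` is a hypothetical helper, it assumes a single long-running process receives every update notification, and the action here just records a timestamp where a real setup would start the playbook. Since webhook jobs may run in separate worker processes, a real deployment would probably need the flag and timer to live somewhere shared.

```python
import threading
import time


class Debouncer:
    """Run `action` once, `interval` seconds after the most recent trigger."""

    def __init__(self, interval, action):
        self.interval = interval
        self.action = action
        self.pending = False          # the "flag": an update is waiting
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        """Record an update and restart the countdown."""
        with self._lock:
            self.pending = True
            if self._timer is not None:
                self._timer.cancel()  # discard the previous countdown
            self._timer = threading.Timer(self.interval, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self):
        """Called when the timer expires without a newer trigger."""
        with self._lock:
            if not self.pending:
                return
            self.pending = False
        self.action()


runs = []
debouncer = Debouncer(0.2, lambda: runs.append(time.monotonic()))

# Simulate a burst of five zone updates arriving 50 ms apart.
for _ in range(5):
    debouncer.trigger()
    time.sleep(0.05)

time.sleep(0.5)   # wait for the countdown to expire
print(len(runs))  # number of times the action actually ran
```

With an interval of 10 seconds instead of 0.2, a bulk operation that touches many records would collapse into a single playbook run after the last update.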

kemeris2000 commented 1 year ago

Your assumption was correct: I forgot to restart rqworker. And you are correct about my NetBox custom script: it should not contain the substring --extra-vars '+ansible_extra_vars. Ansible is slow, but it covers for my lack of Python knowledge. Your idea with the time delay is good, I will try it.

I really appreciate your time, peteeckel, and sorry for my mistake.

peteeckel commented 1 year ago

Hi @kemeris2000, glad I could help!