Icinga / icingaweb2-module-director

The Director aims to be your new favourite Icinga config deployment tool. Director is designed for those who want to automate their configuration deployment and those who want to grant their “point & click” users easy access to the configuration.
https://icinga.com/docs/director/latest
GNU General Public License v2.0

Service Automation works on first run then fails some of the time #2230

Open sol1-matt opened 3 years ago

sol1-matt commented 3 years ago

Expected Behavior

The sync rule for services should run without error.

Current Behavior

Director web UI

The first run of the service sync rule works. The second run of the service sync rule throws an error.

/icingaweb2/director/syncrule?id=3079

This Sync Rule failed when last checked at 2020-11-26 01:16:50: Exception while syncing Icinga\Module\Director\Objects\IcingaService paneda_02fe3771-f33c-4753-8f30-8caa1ca5584d: Trying to recreate icinga_service ("{"host_id":"1455","object_name":"paneda_02fe3771-f33c-4753-8f30-8caa1ca5584d"}")
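For anyone collecting these failures across many sync rules, a small helper can pull the table name and the object key out of the message. This is only a sketch: the pattern is based on the wording quoted in this report, and `parse_recreate_error` is a hypothetical name, not part of Director.

```python
import json
import re

# Pattern derived from the message text quoted in this issue; other Director
# versions may word the exception differently (assumption).
RECREATE_RE = re.compile(r'Trying to recreate (\w+) \("(.*)"\)')


def parse_recreate_error(message):
    """Return (table, key) from a 'Trying to recreate' message, or None.

    key is a dict when the message carries a JSON composite key
    (for example host_id + object_name), otherwise the plain object name.
    """
    match = RECREATE_RE.search(message)
    if match is None:
        return None
    table, raw_key = match.group(1), match.group(2)
    try:
        parsed = json.loads(raw_key)
    except json.JSONDecodeError:
        return table, raw_key
    # Only treat the key as structured when it decodes to a JSON object.
    return table, parsed if isinstance(parsed, dict) else raw_key
```

The extracted `host_id` and `object_name` are the values to look for when checking which existing object the sync run collides with.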

Running a check for changes shows:

Array
(
    [create] => 4
    [modify] => 0
    [delete] => 0
)

Icingacli director

Running the sync from the CLI returns a different result:


# icingacli director syncrule check --id 3079
There are pending changes for this Sync Rule. You should  trigger a new Sync Run.
Expected modifications: 0x create, 4x modify, 0x delete

# icingacli director syncrule run --id 3079
Nothing has been changed, imported data is still up to date
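To make the mismatch between the web UI and the CLI easier to track over repeated runs, the CLI summary line can be parsed and compared with the counts the web UI reports. A minimal sketch, assuming the "Expected modifications" line always has the shape shown above; `parse_cli_summary` and `diff_counts` are hypothetical helper names.

```python
import re

# Matches fragments such as "4x modify" in the "Expected modifications" line.
SUMMARY_RE = re.compile(r'(\d+)x (create|modify|delete)')


def parse_cli_summary(output):
    """Extract create/modify/delete counts from 'syncrule check' output."""
    counts = {"create": 0, "modify": 0, "delete": 0}
    for number, action in SUMMARY_RE.findall(output):
        counts[action] = int(number)
    return counts


def diff_counts(web, cli):
    """Return {action: (web_count, cli_count)} for every action that differs."""
    return {k: (web[k], cli.get(k)) for k in web if web[k] != cli.get(k)}


cli_output = "Expected modifications: 0x create, 4x modify, 0x delete"
web_counts = {"create": 4, "modify": 0, "delete": 0}  # values from the web UI above
print(diff_counts(web_counts, parse_cli_summary(cli_output)))
```

For the values in this report, the difference is that the web UI expects four creates where the CLI expects four modifies.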

Additional info

This only happens on some hosts: a sizable number, but not all of them. If the sync rules are pointed at a newly created host, or at a different host created from automation, they function correctly.

The error message has been insufficient to determine why this may be occurring.

This has been tested using the built-in Director import source REST API and a separate import source module for importing generic web/JSON APIs, based on the netbox import module. Both import types have the same problem.

Possible Solution

Any suggestions on how to better debug this would be appreciated.

Steps to Reproduce (for bugs)

  1. set up an import source that gets deeply nested JSON data which changes rapidly
  2. set up a sync rule for services using the new source
  3. become frustrated when only some things break

Your Environment

sol1-matt commented 3 years ago

We can delete the services, sync rules and import sources and recreate them and the problem reoccurs.

It could be a straight-up database problem, but I couldn't find docs on how the tables link together for import_ or service.

SQL that shows what occurs when trying to recreate the Icinga service, the checks involved, etc., would likely help identify the root cause of the problem and allow us to create a reproducible test case.
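As a starting point for that kind of inspection, the lookup below searches for an existing row with the same key that the error message reports. It is a sketch under stated assumptions: only the table name `icinga_service` and the columns `host_id` and `object_name` come from the error message; the `id` column, the stand-in schema and the `find_existing` helper are assumptions, and SQLite is used here only so the example runs on its own (a real Director database runs on MySQL or PostgreSQL, whose drivers use a different placeholder style).

```python
import sqlite3


def find_existing(conn, host_id, object_name):
    """Return rows in icinga_service that match the key from the error message."""
    cur = conn.execute(
        "SELECT id, host_id, object_name FROM icinga_service "
        "WHERE host_id = ? AND object_name = ?",
        (host_id, object_name),
    )
    return cur.fetchall()


# Stand-in schema for illustration only; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE icinga_service ("
    "id INTEGER PRIMARY KEY, host_id INTEGER, object_name TEXT)"
)
conn.execute(
    "INSERT INTO icinga_service (host_id, object_name) VALUES (?, ?)",
    (1455, "paneda_02fe3771-f33c-4753-8f30-8caa1ca5584d"),
)

rows = find_existing(conn, 1455, "paneda_02fe3771-f33c-4753-8f30-8caa1ca5584d")
print(rows)
```

If a row comes back for a key that the sync run is trying to create, that would at least confirm the object already exists in the database and point the investigation at why the sync treats it as new.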

majales commented 3 years ago

Hi, I'm using Director Automation with a simple CMDB table to check certain websites (and their certs and an eventual regex). Isn't your problem in the Sync rule?

I have Sync rule Test with:

Object Type: Host
Update Policy: Replace
Purge: Yes

You cannot create hosts that already exist; that's why I use "Update Policy: Replace".

sol1-matt commented 3 years ago

@majales I don't think it is a problem in the sync rule I set up.

I have multiple import sources and service sync rules set up, and the problem only occurs on some of them.

Some recent testing has shown that the same "add single service" sync rule behaves differently depending on the host the rule adds the services to, i.e. if I change just the host, the rule works.

I've also found that an "add single service" sync rule that was previously working stopped working without any changes being made to the rules for the parent host or the service itself.

Once a host 'breaks', I haven't found a way to make it work again. I haven't tried deleting and re-adding the host, though.

I haven't figured out what triggers the problem yet which makes it hard to provide a reproducible test case.

jlownie commented 1 year ago

When I ran into this problem today I was able to work around it by deleting all the services that were generated by the sync rule. Then it would run OK once, failing if you ran it a second time.

MAngel666 commented 1 month ago

I have a similar problem: once an object has been created, it doesn't matter which "Update Policy" I choose (Merge, Replace or Ignore), I always get:

`Oops, an error occurred!

Exception while syncing Icinga\Module\Director\Objects\IcingaHostGroup Atlassian - JIRA-Confluence-Service Desk: Trying to recreate icinga_hostgroup ("Atlassian - JIRA-Confluence-Service Desk") (Sync.php:946)

#0 /usr/share/icingaweb2/modules/director/application/forms/SyncRunForm.php(56): Icinga\Module\Director\Import\Sync->apply()
#1 /usr/share/icinga-php/ipl/vendor/ipl/html/src/Form.php(238): Icinga\Module\Director\Forms\SyncRunForm->onSuccess()
#2 /usr/share/icingaweb2/modules/director/application/controllers/SyncruleController.php(69): ipl\Html\Form->handleRequest(Object(GuzzleHttp\Psr7\ServerRequest))
#3 /usr/share/icinga-php/vendor/vendor/shardj/zf1-future/library/Zend/Controller/Action.php(516): Icinga\Module\Director\Controllers\SyncruleController->indexAction()
#4 /usr/share/php/Icinga/Web/Controller/Dispatcher.php(76): Zend_Controller_Action->dispatch('indexAction')
#5 /usr/share/icinga-php/vendor/vendor/shardj/zf1-future/library/Zend/Controller/Front.php(954): Icinga\Web\Controller\Dispatcher->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#6 /usr/share/php/Icinga/Application/Web.php(294): Zend_Controller_Front->dispatch(Object(Icinga\Web\Request), Object(Icinga\Web\Response))
#7 /usr/share/php/Icinga/Application/webrouter.php(105): Icinga\Application\Web->dispatch()
#8 /usr/share/icingaweb2/public/index.php(4): require_once('/usr/share/php/...')
#9 {main}`

Only with "Update only" does it do nothing. So if I want to run a new import from the REST API of Assets (Atlassian), I ALWAYS need to DELETE all imported objects first, and only then can I import them again.

I use: Director: V1.11 Icinga2: V2.14.2 Icingaweb2: 2.12 on RHEL 8.9