Wrong generated files: vars are all wrong (overwritten or not set) with Git Master version

linuxmail commented 5 years ago

hi,

I've also posted this strange behavior in the forum: https://monitoring-portal.org/t/strange-problem-with-generated-files-nearly-all-vars-overwritten/6285

It may have started with the latest Director upgrade (git pull), or with adding a second master. The preview of the command / host / service still looks OK, but when I generate the files, all vars of all services / hosts / ... get overwritten, maybe by the first check command (?)

Expected Behavior

hosts.conf

...
vars.hostname = "srv-antiv2"
vars.notification_interval = "0"
vars.os = "windows"
vars.virtual_machine = true

Current Behavior

hosts.conf
```
vars.domain = "atemp"
vars.hostname = "atemp"
vars.virtual_machine = "atemp"
```
This happens in nearly all files, e.g. services, host_templates ... It seems all vars get overwritten and end up as vars.XXXX = "FOO".

Your Environment

Director version (System - About): Master: baa3ae248f4a58a368e441a59ddb50a6ebea2df0
Icinga Web 2 version and modules (System - About): 2.6.3
Icinga 2 version (icinga2 --version): 2.10.4-1.stretch
Operating System and version: Debian Stretch
Webserver, PHP versions: Apache2 / 7.0+49

linuxmail commented 5 years ago

hi,

I removed the configs in Director and re-generated the configs. Now all variables are replaced with $address$:

    vars.domain = "$address$"
    vars.hostname = "$address$"
    vars.virtual_machine = "$address"

linuxmail commented 5 years ago

Unfortunately, I can't go back to the previous version, because the MariaDB backups were not configured by the DBAs. Is there a way to switch to 1.6.2?

lazyfrosch commented 5 years ago

I had a quick read through your explanations here and on m-p.

This sounds like nothing Director could have caused, since every var value is stored individually for every object.

The only thing that connects variables are fields, but they are more like a GUI feature. Datalists are similar, they restrict user input, not the storage.

You should be able to switch to 1.6.2 by adding a single line:

diff --git a/library/Director/Objects/IcingaCommand.php b/library/Director/Objects/IcingaCommand.php
index 4486cb8d..8e1776ad 100644
--- a/library/Director/Objects/IcingaCommand.php
+++ b/library/Director/Objects/IcingaCommand.php
@@ -27,6 +27,7 @@ class IcingaCommand extends IcingaObject implements ObjectWithArguments, ExportI
         'command'               => null,
         'timeout'               => null,
         'zone_id'               => null,
+        'is_string'             => null,
     ];

     protected $supportsCustomVars = true;

Have you tried dumping the DB? Maybe re-import it to a different database to see if the contents of the DB are somehow corrupted?

lazyfrosch commented 5 years ago

Apart from that, have a look at:

select * from icinga_host_var;

linuxmail commented 5 years ago

hi,

Thanks for the hint about is_string. I also noticed that every time I force a config generation and look at the result, it changes (!)

First time:

    address = "192.168.xx.xx"
    groups = [ "IBM Storage" ]
    vars.domain = "0"
    vars.hostname = "0"
    vars.virtual_machine = "0"

Second time:

    address = "192.168.XX.XX"
    groups = [ "IBM Storage" ]
    vars.domain = "$address$"
    vars.hostname = "$address$"
    vars.virtual_machine = "$address$"

So every time I create a new config, it switches.

    -> ;
+---------+-----------------+------------------+--------+----------+
| host_id | varname         | varvalue         | format | checksum |
+---------+-----------------+------------------+--------+----------+
|       8 | domain          | netzinatec.local | string | NULL     |
|       8 | hostname        | ibm-hba-02       | string | NULL     |
|       8 | virtual_machine | false            | json   | NULL     |
+---------+-----------------+------------------+--------+----------+
3 rows in set (0.00 sec)

linuxmail commented 5 years ago

Hmm, if I test / try to generate the config again, the checksum is now always the same: "director/config/files?checksum=24023978ac5d3912cc3d8b6977217bfa49383690"

If I open a new browser (private mode) and then click generate config, I get a new checksum, for example "bb622a7a5d37c73b57ef02e594a6db217b5ef099", but the content is still invalid (all vars filled with "$address$").

I tried this several times, but it seems it does not "work" every time. I also switched from the MariaDB cluster address (ProxySQL + HAProxy) to a single node, but nothing changed. I also created a dump of the Director DB and restored it into a different DB without any problems.

Could it be that the IDO DB is broken? The question for me is: why does the preview look OK, while the finally generated files look different ...

lazyfrosch commented 5 years ago

IDO is completely unrelated to Director.

What MariaDB versions are we talking about? Any special configuration?

lazyfrosch commented 5 years ago

What I found out so far:

We explicitly set PIPES_AS_CONCAT in icingaweb2 for every MySQL connection
We are building checksums for every CustomVar to speed up rendering: $columns['checksum'] = "UNHEX(SHA1(v.varvalue || ';' || v.format))";
If PIPES_AS_CONCAT is not set on the connection, || is parsed as a logical OR instead of string concatenation, and this will cause massive problems.

You can try commenting out this line:
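
To illustrate why a missing PIPES_AS_CONCAT is so destructive for this checksum, here is a minimal Python sketch. It is not Director's code. It assumes a simplified model of MySQL/MariaDB semantics: without PIPES_AS_CONCAT, || is logical OR, strings are cast to numbers by their leading numeric prefix, and the resulting 0 or 1 is hashed as the string "0" or "1". The function names are hypothetical; the sample rows are the ones from the icinga_host_var output earlier in this thread.

```python
import hashlib
import re

# Sample rows (varvalue, format) from the icinga_host_var output in this thread.
ROWS = [
    ("netzinatec.local", "string"),
    ("ibm-hba-02", "string"),
    ("false", "json"),
]


def checksum_concat(varvalue: str, fmt: str) -> str:
    """SHA1(varvalue || ';' || format) with || acting as string concatenation."""
    return hashlib.sha1(f"{varvalue};{fmt}".encode()).hexdigest()


def _as_number(text: str) -> float:
    """Simplified model of the string-to-number cast: leading numeric prefix, else 0."""
    match = re.match(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)", text)
    return float(match.group(0)) if match else 0.0


def checksum_or(varvalue: str, fmt: str) -> str:
    """Same expression with || acting as logical OR, so the hashed value is 0 or 1."""
    truth = any(_as_number(part) != 0 for part in (varvalue, ";", fmt))
    return hashlib.sha1(str(int(truth)).encode()).hexdigest()


concat_sums = {checksum_concat(v, f) for v, f in ROWS}
or_sums = {checksum_or(v, f) for v, f in ROWS}

print(len(concat_sums))  # number of distinct checksums with concatenation
print(len(or_sums))      # number of distinct checksums with logical OR
```

With concatenation each row gets its own checksum; with logical OR the three rows share one. If rendering looks variables up by checksum, every lookup could then resolve to the same entry, which would fit the symptom of all vars showing one value. That last link is an inference, not something confirmed in this thread.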

https://github.com/Icinga/icingaweb2-module-director/blob/baa3ae248f4a58a368e441a59ddb50a6ebea2df0/library/Director/Db/Cache/CustomVariableCache.php#L37

This should disable the checksum for rendering.

But we still need to find out why the sql_mode is not what it is supposed to be.

Opinions @Thomas-Gelf ?

lazyfrosch commented 5 years ago

Corresponding code in icingaweb2:

https://github.com/Icinga/icingaweb2/blob/403c2d34954687a9baa13be91d5c2a2d735bd568/library/Icinga/Data/Db/DbConnection.php#L201-L203

lazyfrosch commented 5 years ago

In other news: https://github.com/sysown/proxysql/issues/1279#issuecomment-350835820

linuxmail commented 5 years ago

We use 10.2.22+maria~stretch in a three-node Galera Cluster with

[mysqld]
...
binlog_format = ROW
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_bin
datadir = /opt/mariadb/mysql
default_storage_engine = InnoDB
expire_logs_days = 10
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_size = 5G
innodb_doublewrite = 1
innodb_lock_schedule_algorithm = FCFS
innodb_log_file_size = 512M
innodb_write_io_threads = 32
key_buffer_size = 16M
...

Nothing special. ProxySQL is: ProxySQL rev. 1.4.14-percona-1.1 -- Wed Feb 6 11:36:16 2019

It seems that since I switched back to the single-node IP I no longer see any problems, after rolling out a few times a config version from before everything got messed up (a few days earlier).
I also made some changes ... and regenerated the config ... all is fine again.

So you mean I should switch back to the cluster IP, test the generation, then comment out the checksum line and see what happens? But I would say it makes sense, because I did not use the Director to make changes; I only imported new data from PuppetDB and pushed the changes to Icinga2.

The first problems started after adding our mon-02: the IDO immediately switched to the new mon-02 and Icingaweb2 showed nothing (maybe because of missing API settings in Icingaweb2). So it's hard to tell which thing triggered the problems.

In the ProxySQL log I found:

O_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,ANSI_QUOTES,PIPES_AS_CONCAT,NO_ENGINE_SUBSTITUTION', time_zone='+2:00';
2019-05-03 15:25:46 MySQL_Session.cpp:3940:handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY_qpo(): [ERROR] Unable to parse query. If correct, report it as a bug: SET SESSION SQL_MODE='STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,ANSI_QUOTES,PIPES_AS_CONCAT,NO_ENGINE_SUBSTITUTION', time_zone='+2:00';
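
The statement that ProxySQL failed to parse in the log above lists the modes the connection is supposed to run with. As a rough aid for checking a session, the Python sketch below compares a mode string (for example the result of SELECT @@SESSION.sql_mode, queried through the same connection path Director uses) against that list. The function name missing_sql_modes is hypothetical, and treating every mode from the log line as required is an assumption.

```python
from typing import List

# Modes copied from the SET SESSION SQL_MODE statement in the ProxySQL log above.
EXPECTED_MODES = frozenset({
    "STRICT_ALL_TABLES",
    "NO_ZERO_IN_DATE",
    "NO_ZERO_DATE",
    "ERROR_FOR_DIVISION_BY_ZERO",
    "ANSI_QUOTES",
    "PIPES_AS_CONCAT",
    "NO_ENGINE_SUBSTITUTION",
})


def missing_sql_modes(session_sql_mode: str) -> List[str]:
    """Return the expected modes that are absent from a comma-separated sql_mode string."""
    active = {m.strip().upper() for m in session_sql_mode.split(",") if m.strip()}
    return sorted(EXPECTED_MODES - active)


# Hypothetical session value, for illustration only (not taken from this setup).
example_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"
print(missing_sql_modes(example_mode))
```

If PIPES_AS_CONCAT shows up in the returned list for a session that went through the proxy, the SET SESSION statement most likely never reached that backend connection.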

lazyfrosch commented 5 years ago

I talked to @dnsmichi on Friday, we had a similar issue with ProxySQL where sql_mode wasn't set correctly between connections.

Here is an official statement: We don't support ProxySQL or similar software.

As for clustering, feel free to use load balancers, cluster IPs or whatever, as long as it does not try to outsmart SQL on a protocol level.

You can try commenting out the line to find out whether ProxySQL is the problem, but the code is written this way to improve performance when rendering a huge config.

lazyfrosch commented 5 years ago

I'm going to close this issue, feel free to open a new one when you spot another error, but only when ProxySQL is not in the connection :wink:

baurmatt commented 5 years ago

@lazyfrosch Thanks for clarifying! :) Could you please add this to the documentation of Icinga(Web)?2?
