mrdsam opened this issue 2 days ago (status: Open)
Thanks for posting this issue.
One related change in 1.2.0 was enabling strict mode for MySQL in #699. With strict mode, MySQL now fails when data is too big instead of silently truncating it. A similar issue affecting other columns was addressed in #792.
However, based on your log, the data for service_state.check_commandline is too big. This column is of type text, which allows up to 64K characters.
Could you please check your configuration for whether some command line produces such huge values?
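For a quick check against the database itself, a query along these lines should do. This is just a sketch; the service_state table and check_commandline column are the ones named in your log:

-- Longest command line currently stored in the Icinga DB database
SELECT MAX(CHAR_LENGTH(check_commandline)) FROM service_state;

-- The ten longest entries, to see how close they get to the 64K limit
SELECT CHAR_LENGTH(check_commandline) AS len
FROM service_state
ORDER BY len DESC
LIMIT 10;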
Thank you for your valuable input. Indeed, "check_interfaces", which uses the performance data of the last check result to build its new command line, exceeds this limit on hosts with many interfaces. I removed those checks and now I get...
+-------------------------------------+
| max(char_length(check_commandline)) |
+-------------------------------------+
| 28213 |
+-------------------------------------+
...which should be safe for now. I'll try 1.2.0 again ASAP and will report the outcome here.
Nevertheless, wouldn't it be nicer to check / truncate the string before trying to write it to the database and issue a warning instead?
Indeed "check_interfaces", which uses the performance data of the last check result to build it's new command line, exceeds this limit on hosts with many interfaces.
That's quite interesting, as a total argument length exceeding 64K characters is a bit unexpected. Speaking of expectations, that's most likely where the limit came from, as it has been there since the beginning and Icinga 2's IDO used the same type.
However, longer arguments are technically possible, as nicely outlined on this webpage, which I have referred to before. For example, on Linux I get an ARG_MAX of 2,097,152 and on OpenBSD it is 524,288.
Now the question is how likely it is to have such huge values, exceeding the TEXT type's capacity. As seen in this very issue, it may happen.
However, to fit both ARG_MAX values listed above, the type must be at least MEDIUMTEXT. Based on this table from the MySQL documentation, the required storage is the actual length of the string plus 2 bytes for TEXT or 3 bytes for MEDIUMTEXT.
Raising the type should not be too expensive, but nonetheless I would like to have another opinion on that.
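Just for illustration, such a change would roughly look like the statement below. This is only a sketch, not an official schema migration; the exact column definition (NULL handling, character set) would have to be preserved, and any other columns storing command lines would need the same treatment.

-- Sketch only: widen the column from TEXT (64K) to MEDIUMTEXT (16M)
ALTER TABLE service_state
  MODIFY COLUMN check_commandline MEDIUMTEXT;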
Nevertheless, wouldn't it be nicer to check / truncate the string before trying to write it to the database and issue a warning instead?
Definitely something should be done. Unless the type is increased to MEDIUMTEXT, I would vote for raising a warning, as truncating data is dangerous, imo.
First: icingadb 1.2.0 is running without problems now. Thanks @oxzi for the information about strict mode and the hint that my commands could be too long!
I would not take my case as an argument to increase the size of check_commandline
, because monitoring all ports of a 48 port switch without filters or splitting them up is not good practice in the first place.
My most important concern is that the icingadb daemon should keep running in such cases. IMO the best solution would be for the icingadb check to raise a warning or critical state, so things keep running but you get notified that something is wrong.
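In the meantime, a crude ad-hoc check could be a query like the one below, run periodically against the Icinga DB database. This is just a sketch, and the 60,000 threshold is an arbitrary value somewhat below the 64K TEXT limit:

-- Count command lines approaching the TEXT limit (threshold is arbitrary)
SELECT COUNT(*) AS near_limit
FROM service_state
WHERE CHAR_LENGTH(check_commandline) > 60000;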
Describe the bug
After upgrading to icingadb-1.2.0_2 (from packages, as usual), icingadb frequently crashes (every 10 to 30 minutes). The error message is always the same on both nodes (except for "<n>", which is always a different number):
Error 1406 (22001): Data too long for column 'check_commandline' at row <n>
See attached log (from start until crash) for details! start-end-1.txt
To Reproduce
icingadb restart on both systems. After that, the problems started. Anyway, I proceeded with icingadb stop (both systems) and icingadb start (both systems). The problems didn't go away after the optional schema upgrade, as expected.
I then restarted everything: icinga2, redis, icingadb, the MariaDB cluster (rolling method, node by node). Problem persists.
Expected behavior
icingadb should not crash with a fatal error.
Your Environment
Icinga2 HA-Cluster (with many satellites) on FreeBSD 13.3, with redis and icingadb on each cluster node.
Database is a 3-Node MariaDB Galera-Cluster.
Additional context
Workaround
Here is what I did as a workaround, so I didn't have to roll back the database to its old state:
Since then, everything looks fine.
No problem with postgresql?!
Interesting fact: I also have icingadb 1.2.0 running on some of the satellites; the problem does NOT occur there! The only difference is that I use postgres as the database backend there, and each of them is a single instance.