Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2.03k stars 578 forks source link

Unpredictable behavior when parsing configuration #7441

Closed acid-eater closed 5 years ago

acid-eater commented 5 years ago

Describe the bug

When parsing the Icinga2 configuration files the parser sometimes reports "critical/config: Error: Error while evaluating expression: String index is out of range" errors although the code is valid.

It happens when accessing a character in a string that is part of an array while parsing the string using a "while" statement in a function:

myarray=[ „ABC“, „XTS“, „TMV“ ]
mysearchchar="B"
myindex=0
while (myindex < len(myarray[0])) {
  if (mysearchchar == myarray[0].substr(myindex,1)) {
    mychar = myarray[2].substr(myindex,1)
    break
  }           
  myindex += 1
}

Please try to refrain from criticizing the code or offering "helpful" improvements, that's NOT the issue here! The code is "as is" for a reason.

The issue is that icinga2 reports an "index is out of range" error only sometimes. Other times, only seconds later and without changing anything, the code is deemed okay!

The problem may (OR MAY NOT!) go away if one adds an arbitrary, non-functional statement below the "while":

myarray=[ „ABC“, „XTS“, „TMV“ ]
mysearchchar="B"
myindex=0
while (myindex < len(myarray[0])) {
  log("Hello World, this tiny change may make things work")
  if (mysearchchar == myarray[0].substr(myindex,1)) {
    mychar = myarray[17].substr(myindex,1)
              break
  }           
  myindex += 1
}

The problem may (OR MAY NOT!) come back if you edit/change/add anything, anywhere, in your Icinga2 configuration (inside the function, outside the function, in the same file or some other file).

The function uses the same construct four times. Using "icinga2 daemon -C" the problem is reported more or less often, and in different places.

So if the "while" statements are at line 100, 150, 200 and 250 subsequent runs (without changing anything!) of "icinga2 daemon -C" may give you a pattern like:

Error at 100 and 200 Error at 150 Error at 200 Error at 150 and 200

Then you insert the log("Hello World...") at 101 and suddenly there is no error anymore! Until you edit something else somewhere, maybe, or maybe not. It’s pretty nuts…

The problem is persistent between servers (freshly set up Icinga2) as long as you copy the whole configuration. I noticed it first with version 2.10.5, but migrating to 2.11-RC1 did not help.

I was unable to establish an overlaying pattern for this, it seems totally random. I have 35 years of IT experience and never seen anything quite like this.

For now I'm not looking for a "quick fix", partly because I CAN NOT give you my config-files, they are company confidential, partly because the issue is probably very complex and hard to find. I just would like to ask a few questions:

For now I have disabled the function in question, but I fear that I (and others) may run into the same issue when using substr() on a string that is part of an array, this is why I've opened this bug-report.

To Reproduce

Sorry, no way to reproduce the issue without my config, and I can't give you that.

Expected behavior

Either an error exists or it doesn't, it can't depend on the time of day.

Screenshots

---start--- [2019-08-06 10:40:44 +0200] critical/config: Error: Error while evaluating expression: String index is out of range Location: in /etc/icinga2/zones.d/global-templates/functions.conf: 265:23-265:54 [2019-08-06 10:40:44 +0200] critical/config: Error: Error while evaluating expression: String index is out of range Location: in /etc/icinga2/zones.d/global-templates/definitionen.conf: 259:16-259:46 [2019-08-06 10:40:44 +0200] critical/config: 2 errors ---rerun of "icinga2 daemon -C"--- [2019-08-06 10:40:52 +0200] critical/config: Error: Error while evaluating expression: String index is out of range Location: in /etc/icinga2/zones.d/global-templates/functions.conf: 275:23-275:54 [2019-08-06 10:40:52 +0200] critical/config: Error: Error while evaluating expression: String index is out of range Location: in /etc/icinga2/zones.d/global-templates/functions.conf: 265:23-265:54 [2019-08-06 10:40:52 +0200] critical/config: 2 errors ---rerun of "icinga2 daemon -C"--- [2019-08-06 10:40:54 +0200] critical/config: Error: Error while evaluating expression: String index is out of range Location: in /etc/icinga2/zones.d/global-templates/functions.conf: 275:23-275:54 [2019-08-06 10:40:54 +0200] critical/config: 1 error ---end---

Your Environment

Include as many relevant details about the environment you experienced the problem in

Al2Klimov commented 5 years ago

Hello @acid-eater and thank you for reporting!

are the strings in your first line of config really enclosed with unicode quotes instead of ASCII ones?

[2019-08-23 11:34:47 +0000] critical/config: Error: syntax error, unexpected $end, expecting ']'
Location: in /etc/icinga2/conf.d/7441.conf: 1:10-1:10
/etc/icinga2/conf.d/7441.conf(1): myarray=[ „ABC“, „XTS“, „TMV“ ]
                                           ^

Not an improvement suggestion, just a question. 😉

Best, AK

acid-eater commented 5 years ago

Hello Alexander,

thank you for pointing this out!

The code I quoted is not from the actual system, it has been sanitized. I did so using a Windows-based text editor that - at some point - changed the quotes to UTF8. In the real configuration files it's plain ASCII all around.

My apologies for the confusion!

Al2Klimov commented 5 years ago

And yet another question: Is your code really myarray = [ ] and not var myarray = [ ]?

If yes, then you're setting this.myarray and not a local variable called myarray. This might be on might be not the cause of your problem depending on what is this and what shares it.

What's ~you~ your this (hint: log(this))? Are you sure you want to set this.myarray?

acid-eater commented 5 years ago

myarray is actually "globals.myarray" and log(this) at one of the points of possible error gives:

[2019-08-23 13:56:16 +0200] information/config: Object of type 'Namespace'

Additional info: if the function is not called anywhere the problem does not seem to occur (I had the whole function disabled, enabled it right now, didn't see the "log(this)" output and fixed that by actually calling the function).

Right now everything seems to be fine (no errors), but I'm working on another issue in parallel and have unconfigured most of my hosts for that reason (we are not in production yet), so I'm not really surprised.

Al2Klimov commented 5 years ago
acid-eater commented 5 years ago
Al2Klimov commented 5 years ago

Are the other variables also global?

acid-eater commented 5 years ago

Uhh... good question. I did not declare them with "var", so they probably live in the "this" context, right?

Currently I'm re-adding my hosts one by one and expect to see the problem arise any moment now, maybe we should wait until this happens (Tuesday the latest), then we can see if changing variable types/scopes etc. makes a difference.

But: as I wrote in the report the problem initially happened when I was adding some code outside the actual functions that throw the errors. I was thinking about maximum file-length of config files, maximum number (or size) of objects - including functions, that kind of stuff. I eliminated the file-size thing at least, the problem also happened after I split my "functions.conf" into seven parts.

It's really weird, and I've been scratching my head since July before I opened this issue here.

Al2Klimov commented 5 years ago

Once your problem re-occurs, please give local variables a try where you really want to have ones.

acid-eater commented 5 years ago

Will do, thanks for the help so far!

acid-eater commented 5 years ago

New developments and SOLUTION:

I didn't have to re-enable more hosts, I just needed to call the function more often. It is now called by two hosts (i.e. host objects) and the result is this:

Run #1: icinga2 daemon -C [2019-08-26 08:54:39 +0200] critical/config: Error: Error while evaluating expression: String index is out of range Run #2: icinga2 daemon -C [2019-08-26 08:54:41 +0200] information/cli: Finished validating the configuration file(s).

Two seconds later, no changes, different result!

Changing of the string array from global and outside of functions to local and inside of functions: 5 times the error, then suddenly it's okay again.

Changing the string array to be a constant gives this warning:

[2019-08-26 09:03:49 +0200] warning/Value for constant 'myarray' was modified. This behaviour is deprecated.

But checking and re-checking my code I can't find any instance where I access the array or the strings inside except for reading from them. Anyway, the basic behavior didn't change: 4 times the error, then suddenly it's okay again.

But changing the other variables totally fixed the issue!!!

I explicitly declared all indexes, counters, characters and other variables to be "var". I even moved the array back where I wanted it (non-const, to avoid the warning, and global outside of the functions). No more errors!

I would be a very happy Icinga2 user now if it weren't for my curiosity: what may have caused this? I would therefor declare this problem as "solved", but would leave it to the developers to finally close it.

Al2Klimov commented 5 years ago

Please could you provide a config file which I can add to a fresh single-node Icinga 2 into conf.d to reproduce the problem?

acid-eater commented 5 years ago

I am sorry, but as mentioned in the initial report I can't give you "live" config files, and my attempts to replicate the problem using "sanitized" code didn't work. The issue is so flaky it will probably never happen again, or in some totally unrelated context.

Something seems to be odd with how the config-checker handles variables that are not explicitly defined but left in the "this" space (as far as I understood you). Icinga2 itself doesn't have a problem with it (once past the check it will work normally, the functions in question perform well and produce the results expected).

Thinking about this a bit more I can't come up with a way for anybody to find the underlying cause without my code, which I can't reveal. Therefor there is no need to keep this issue open.

Thank you VERY MUCH for your help, Alexander, you pointed the finger at the problem right away, and I will from now on declare my variables explicitly in the hope of avoiding more problems.

acid-eater commented 5 years ago

Hi again,

I believe I found the root-cause of the problem. My variable "myarray" above contained a string that contained the UTF-8 character "§" (which is part of every German keyboard). It seems that Icinga2 misinterprets this character when determining the length of a string, but only after accessing it, and not every time.

Example: Let's say the string is "According to §123 every German must own a garden gnome" (don't worry, there is no such law :-) ). Now you search through the string for "o". You hit it before you hit "§", and everything is fine. But then you search for "G" and run across the "§". From that point on the string may or may not be considered one character longer than before (if I counted this right).

If you run across the "§" several times the chance of the error becomes greater (I have no clue as to why). During my tests I found patterns like (R=right, W=wrong): "RWWRRWRWWWWW".

Furthermore a "log" of the above string comes out correct, but a loop that takes the string apart and prints every single character produces an unprintable character ("▒") when logging the character that was extracted with substr().

This is not big problem for us now that we know the cause, we will simply avoid UTF-8, but somebody might want to look into it. Maybe something is broken with strings entirely, or with the substr() function. Hope this helps!

Al2Klimov commented 5 years ago

Hello again!

Please could you test either the snapshot packages or v2.11 (once done)?

I've added an implicit-global-variable detector for you: #7449

Best, AK

acid-eater commented 5 years ago

Hello,

I am currently running 2.11RC1. According to my "apt" there are no never versions in your repository "icinga-bionic-testing" than v2.11.0-rc1-56-gfcca9643b. Could you please point me to the snapshot-packages you were referring to?

acid-eater commented 5 years ago

PS: sorry for the delay, the notification mails from github were eaten by our spamfilter :-)

Al2Klimov commented 5 years ago

https://packages.icinga.com/ubuntu/

dnsmichi commented 5 years ago

Here's the instructions for adding the snapshot repository: https://icinga.com/docs/icinga2/snapshot/doc/21-development/#ubuntu

acid-eater commented 5 years ago

Thanks a lot, but now I got an organizational problem. I'm currently developing/deploying/installing an Icinga2-environment for a customer, and my boss would rather have me finished this first before we can do any more testing. This might take several months.

Using snapshots during that time could cause us a lot of trouble since we do pretty "kinky" stuff with Icinga2 (but we did migrate to 2.11RC1 because we assume that the new network-code will be the future and need to be aware of limitations and possibilities). Since we found a workaround for the original issue by simply avoiding UTF-8 characters there is no need for us (specifically) to continue testing, sorry.

Maybe one day someone will go over the substr() function and/or the string handling in general with respect to UTF-8, and then the issue might get fixed for good. For now, and for us, it's sufficient to know how to avoid it, and re-opening the issue five days ago was just to let people know what I've found out.

I am therefor now closing the issue again.

dnsmichi commented 5 years ago

Ok, fair enough. There's more important things to implement at the moment, for both sides.