Icinga / icingadb

Icinga configuration and state database supporting multiple environments
https://icinga.com
GNU General Public License v2.0
57 stars 21 forks source link

When using a very large argument key icingadb will see a fatal error and quit #791

Open adamparker opened 4 weeks ago

adamparker commented 4 weeks ago

Describe the bug

I am uncertain as to whether this is a bug or something that needs more warnings in Icinga.

I was running a check command with a java library that was loaded like this:

java -cp "lib/*" java.class.that.was.used.as.an.argument.that.was.eighty.seven.characters.long.and.broke ....omitted

The class parameter has no flag/option/argument prior, so this may be part of the issue.

In the command template I put this as an argument which all worked fine from the check perspective.

However icingadb didn't like it:

icinga2_icingadb  | 2024-08-15T02:23:41.983Z    WARN    database    Can't execute query. Retrying   {"error": "can't perform \"INSERT INTO \\\"checkcommand_argument\\\" (\\\"argument_key\\\", \\\"description\\\", \\\"set_if\\\", \\\"checkcommand_id\\\", \\\"argument_value\\\", \\\"argument_order\\\", \\\"environment_id\\\", \\\"id\\\", \\\"repeat_key\\\", \\\"required\\\", \\\"skip_key\\\", \\\"properties_checksum\\\", \\\"argument_key_override\\\", \\\"separator\\\") VALUES (:argument_key, :description, :set_if, :checkcommand_id, :argument_value, :argument_order, :environment_id, :id, :repeat_key, :required, :skip_key, :properties_checksum, :argument_key_override, :separator)\": Error 1406 (22001): Data too long for column 'argument_key' at row 10", "errorVerbose": "Error 1406 (22001): Data too long for column 'argument_key' at row 10\ncan't perform \"INSERT INTO \\\"checkcommand_argument\\\" (\\\"argument_key\\\", \\\"description\\\", \\\"set_if\\\", \\\"checkcommand_id\\\", \\\"argument_value\\\", \\\"argument_order\\\", \\\"environment_id\\\", \\\"id\\\", \\\"repeat_key\\\", \\\"required\\\", \\\"skip_key\\\", \\\"properties_checksum\\\", \\\"argument_key_override\\\", \\\"separator\\\") VALUES (:argument_key, :description, :set_if, :checkcommand_id, :argument_value, :argument_order, :environment_id, :id, :repeat_key, :required, :skip_key, :properties_checksum, :argument_key_override, :separator)\"\ngithub.com/icinga/icingadb/internal.CantPerformQuery\n\t/icingadb-src/internal/internal.go:30\ngithub.com/icinga/icingadb/pkg/icingadb.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2.1\n\t/icingadb-src/pkg/icingadb/db.go:412\ngithub.com/icinga/icingadb/pkg/retry.WithBackoff\n\t/icingadb-src/pkg/retry/retry.go:60\ngithub.com/icinga/icingadb/pkg/icingadb.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2\n\t/icingadb-src/pkg/icingadb/db.go:407\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"}
icinga2_icingadb  | 2024-08-15T02:23:41.988Z    INFO    icingadb    Starting state runtime updates sync
icinga2_icingadb  | 2024-08-15T02:23:41.988Z    INFO    config-sync Finished initial state sync in 40.420509ms
icinga2_icingadb  | 2024-08-15T02:24:21.990Z    INFO    runtime-updates Upserted 1 ServiceState items
icinga2_icingadb  | 2024-08-15T02:24:31.832Z    INFO    history-sync    Synced 1 state history items
icinga2_icingadb  | 2024-08-15T02:28:41.983Z    WARN    config-sync Aborted config sync after 5m0.035728203s
icinga2_icingadb  | 2024-08-15T02:28:42.737Z    FATAL   icingadb    Error 1406 (22001): Data too long for column 'argument_key' at row 10
icinga2_icingadb  | can't perform "INSERT INTO \"checkcommand_argument\" (\"argument_key\", \"description\", \"set_if\", \"checkcommand_id\", \"argument_value\", \"argument_order\", \"environment_id\", \"id\", \"repeat_key\", \"required\", \"skip_key\", \"properties_checksum\", \"argument_key_override\", \"separator\") VALUES (:argument_key, :description, :set_if, :checkcommand_id, :argument_value, :argument_order, :environment_id, :id, :repeat_key, :required, :skip_key, :properties_checksum, :argument_key_override, :separator)"
icinga2_icingadb  | github.com/icinga/icingadb/internal.CantPerformQuery
icinga2_icingadb  |     /icingadb-src/internal/internal.go:30
icinga2_icingadb  | github.com/icinga/icingadb/pkg/icingadb.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2.1
icinga2_icingadb  |     /icingadb-src/pkg/icingadb/db.go:412
icinga2_icingadb  | github.com/icinga/icingadb/pkg/retry.WithBackoff
icinga2_icingadb  |     /icingadb-src/pkg/retry/retry.go:60
icinga2_icingadb  | github.com/icinga/icingadb/pkg/icingadb.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
icinga2_icingadb  |     /icingadb-src/pkg/icingadb/db.go:407
icinga2_icingadb  | golang.org/x/sync/errgroup.(*Group).Go.func1
icinga2_icingadb  |     /go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78
icinga2_icingadb  | runtime.goexit
icinga2_icingadb  |     /usr/local/go/src/runtime/asm_amd64.s:1695
icinga2_icingadb  | retry deadline exceeded
icinga2_icingadb  | github.com/icinga/icingadb/pkg/retry.WithBackoff
icinga2_icingadb  |     /icingadb-src/pkg/retry/retry.go:95
icinga2_icingadb  | github.com/icinga/icingadb/pkg/icingadb.(*DB).NamedBulkExec.func1.(*DB).NamedBulkExec.func1.1.2
icinga2_icingadb  |     /icingadb-src/pkg/icingadb/db.go:407
icinga2_icingadb  | golang.org/x/sync/errgroup.(*Group).Go.func1
icinga2_icingadb  |     /go/pkg/mod/golang.org/x/sync@v0.7.0/errgroup/errgroup.go:78
icinga2_icingadb  | runtime.goexit
icinga2_icingadb  |     /usr/local/go/src/runtime/asm_amd64.s:1695
 arguments += {
    "-cp" = {
      value = CustomPluginDir + "/lib/*"
      order = "-4"
    }
    "java.class.that.was.used.as.an.argument.that.was.eighty.seven.characters.long.and.broke" = {
      order = "-3"
    }
    .... omitted
}

Expected behavior

A warning that argument keys have a limit, or this is a bug.

I have resolved this by using a bash script wrapper. What advice would you give for anyone else who has a similar issue?

Your Environment

Include as many relevant details about the environment you experienced the problem in

Using the following docker images: "icinga/icingaweb2:2.12.1" "icinga/icinga2:2.14.2" "icinga/icingadb:1.2.0" "redis:7.2.5"

Additional context

I ran this in a docker environment with MySQL.

oxzi commented 4 weeks ago

Thanks for using Icinga DB and reporting your crash.

It seems like the issue roots in your long argument key. The current schema defines: https://github.com/Icinga/icingadb/blob/27d27d45396a249cfb75fe1993d6360540e2cc0d/schema/pgsql/schema.sql#L838

However, your key extends 64 chars:

>>> len("java.class.that.was.used.as.an.argument.that.was.eighty.seven.characters.long.and.broke")
87

Edit: My brain shut down after reading the first parts of your java class path.. My char counting was not necessary. Sorry :)

adamparker commented 4 weeks ago

Thanks for responding and highlighting the issue. Our brains all shut down reading java classes because we're so conditioned to its verbose-ness.

What would be the suggested workaround for argumentless parameters that are over 64 chars long?

I feel my bash script approach is a bit flakey and was curious if there was a better solution.

oxzi commented 4 weeks ago

First, I will compare how the "old" Icinga 2 IDO handled this. Then I would try to find out why it was limited to 64 chars and look for any platform specific limitations. If there are none, I would increase the column size. Please allow me some time to look at this.

In the meantime, I would suggest that you continue with your mitigation of the wrapper script. I would advise against manually altering the schema on your end, as this could have unintended side effects and break future schema upgrades.

oxzi commented 4 weeks ago

Icinga 2

I have integrated your configuration in my test setup and was directly able to reproduce the Icinga DB crash. However, Icinga 2's IDO continued to work.

object CheckCommand "icingadb-i791" {
  import "plugin-check-command"
  command = [ "/bin/true" ]
  arguments = {
    "java.class.that.was.used.as.an.argument.that.was.eighty.seven.characters.long.and.broke" = {
      value = "F"
    }
  }
}

apply Service "crash icingadb i791" {
  import "generic-service"
  check_command = "icingadb-i791"
  assign where host.address
}

As I was unable to find a direct match for Icinga DB's eventcommand_argument table in the IDO schema, I just dumped the IDO database. And, as it turns out, the classpath-like argument appeared nowhere. It is not stored in the IDO database at all.

Then there was Icinga/icinga2#9887, which has some similarity to this issue. There, the argument length was made to fit the length of the platform, which I will look at next.

Technical Limitations

As always, the limits are platform- and architecture-specific. tl;dr: The reviewed operating systems have limits far beyond 64 characters.

Based on this very detailed Stack Exchange post, on a non-ancient Linux system, both ARG_MAX and MAX_ARG_STRLEN limit the parameter length for strings being passed to execve. It appears that there has been no change in the ten years between that post and today, resulting in a length of 131072 chars.

$ uname -srvm
Linux 6.10.4 #1-NixOS SMP PREEMPT_DYNAMIC Sun Aug 11 10:58:04 UTC 2024 x86_64
$ getconf ARG_MAX
2097152
$ echo $(($(getconf PAGE_SIZE) * 32))
131072

Similar limits are being enforced for other Unix-like operating systems, as this document explains. For example, that's the limit I get on a current OpenBSD:

$ uname -srvm
OpenBSD 7.6 GENERIC#237 amd64
$ sysctl kern.argmax
kern.argmax=524288

While I am not very familiar with Windows, its documentation notes a limit of 8191 characters.

Icinga DB

As written above, the Icinga DB schema limits eventcommand_argument.argument_key as varchar(64). In the Go code, the struct field is a string.

https://github.com/Icinga/icingadb/blob/27d27d45396a249cfb75fe1993d6360540e2cc0d/schema/pgsql/schema.sql#L944 https://github.com/Icinga/icingadb/blob/7c068d4adf169181b4d68dcd6222a18b6990b583/pkg/icingadb/v1/command.go#L21

It seems like there is no usage from the Go end, but its available there mostly for Icinga DB Web.

However, it should be noted that there are multiple tables with an argument_key:

Then, there are is also the argument_key_override column and even the {check,event,notification}command_envvar tables, having potential identical issues.

Going back to the first schema, being introduced in 05d5e97dd5b73d0792e3416fd66b9257b752fd0c, this field was always from the type varchar(64).

As a first test, I have just altered the type from varchar(64) to text just for eventcommand_argument.argument_key, which resolved this specific error. I am going to create a PR addressing this and the similar issues outlined above.

oxzi commented 4 weeks ago

What would be the suggested workaround for argumentless parameters that are over 64 chars long?

I just had another idea in this regard. As being argument-less, you can revert the logic and make it key-less. To be precise, set skip_key for your CheckCommand Argument and set the value accordingly. Something like:

object CheckCommand "icingadb-i791" {
  import "plugin-check-command"
  command = [ "/bin/true" ]
  arguments = {
    // "java.class.that.was.used.as.an.argument.that.was.eighty.seven.characters.long.and.broke" = {
    //   value = "oops."
    // }
    "(icingadb-i791)" = {
      skip_key = true
      value = "java.class.that.was.used.as.an.argument.that.was.eighty.seven.characters.long.and.broke"
    }
  }
}
adamparker commented 3 weeks ago

Hi,

I can confirm that the skip_key workaround works fine.

Thanks for your help.