bottlerocket-os / bottlerocket

An operating system designed for hosting containers
https://bottlerocket.dev

Issue with allow-unsafe-sysctls kubernetes setting #3671

Closed s-marinkovic closed 7 months ago

s-marinkovic commented 9 months ago

**Image I'm using:** ami-0e17e88504aa25087

**What I expected to happen:** We want to allow the unsafe sysctl net.core.somaxconn using the following setting:

    [settings.kubernetes]
    allowed-unsafe-sysctls = ["net.core.somaxconn"]

as described here: https://github.com/bottlerocket-os/bottlerocket/pull/1388#:~:text=%5Bsettings.kubernetes%5D%0Aallowed%2Dunsafe%2Dsysctls%20%3D%20%5B%22net.core.somaxconn%22%2C%20...%5D

**What actually happened:** Bottlerocket failed to start with the following error:

    Starting Bottlerocket userdata configuration system...
    [ 2.376063] early-boot-config[1038]: Provider error: Unable to serialize settings from instance user data: Error parsing TOML user data: redefinition of table settings.kubernetes for key settings.kubernetes at line 10 column 1
    [FAILED] Failed to start Bottlerocket userdata configuration system.

**How to reproduce the problem:** We are using the terraform-aws-modules/eks/aws module with eks_managed_node_group, and all settings are defined in bootstrap_extra_args. For example, we tried adding this:

    [settings.kubernetes.eviction-hard]
    "memory.available" = "15%"

and it works as expected, but allowed-unsafe-sysctls doesn't work, even though it was added in the same PR.

foersleo commented 9 months ago

Thank you for reaching out, @s-marinkovic.

The error hints that your userdata TOML violates the TOML format by defining the table [settings.kubernetes] multiple times. If early-boot-config cannot parse the user data, it cannot successfully set up the host.

Could you share the full userdata you used when running into this issue or check for multiple definitions of that table?
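One quick way to check for repeated table headers is a small script run against the rendered user data. The following is only a minimal sketch in Python and is not part of Bottlerocket or early-boot-config: the function name find_duplicate_tables and the sample values are hypothetical, and it only looks at literal [table] header lines (it ignores dotted keys, arrays of tables, and multi-line strings), so a real TOML parser remains the authority.

```python
import re
from collections import Counter

# Matches a standard table header such as [settings.kubernetes];
# array-of-tables headers ([[...]]) are deliberately not matched.
HEADER = re.compile(r"^\s*\[(?!\[)\s*([^\]]+?)\s*\]\s*(?:#.*)?$")


def find_duplicate_tables(toml_text):
    """Return the table names whose header appears more than once."""
    names = []
    for line in toml_text.splitlines():
        match = HEADER.match(line)
        if match:
            names.append(match.group(1))
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical rendered user data; the cluster-name value is a placeholder.
user_data = """
[settings.kubernetes]
cluster-name = "example"

[settings.kubernetes.node-labels]
type = "app-node"

[settings.kubernetes]
allowed-unsafe-sysctls = ["net.core.somaxconn"]
"""

print(find_duplicate_tables(user_data))  # ['settings.kubernetes']
```

Note that [settings.kubernetes.node-labels] is not reported: it is a different table from [settings.kubernetes], so only the header that is repeated verbatim shows up.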

s-marinkovic commented 9 months ago

Hi @foersleo, thanks for the quick response. Here is the full userdata we had before adding the setting:

    bootstrap_extra_args = <<-EOT
      [settings.kernel]
      lockdown = "integrity"
      [settings.host-containers.admin]
      enabled = true
      superpowered = true
      [settings.kubernetes.node-labels]
      type = "app-node"
      datadog = "enabled"
    EOT

and here it is with allowed-unsafe-sysctls, which fails with the error mentioned above:

    bootstrap_extra_args = <<-EOT
      [settings.kernel]
      lockdown = "integrity"
      [settings.host-containers.admin]
      enabled = true
      superpowered = true
      [settings.kubernetes.node-labels]
      type = "app-node"
      datadog = "enabled"
      [settings.kubernetes]
      allowed-unsafe-sysctls = ["net.core.somaxconn"]
    EOT

and this one, for example, went through without errors:

    bootstrap_extra_args = <<-EOT
      [settings.kernel]
      lockdown = "integrity"
      [settings.host-containers.admin]
      enabled = true
      superpowered = true
      [settings.kubernetes.node-labels]
      type = "app-node"
      datadog = "enabled"
      [settings.kubernetes.eviction-hard]
      "memory.available" = "15%"
    EOT

foersleo commented 9 months ago

Thanks for the data. I will have a look at how terraform-aws-modules/eks interacts with this. My hypothesis is that bootstrap_extra_args conflicts with the template: the module simply appends bootstrap_extra_args to a template that already defines a table named [settings.kubernetes], and the TOML parser rejects the second definition.

Looking at the terraform-aws-modules template for Bottlerocket, my hypothesis is that you are also setting enable_bootstrap_user_data, which puts a [settings.kubernetes] table into the bootstrap data.

If that is the case, I am not quite sure what the right way is to achieve what you want. Options I see from a technical perspective (I have not tried any of these):

  1. Add the line allowed-unsafe-sysctls = ["net.core.somaxconn"] as the first line in your bootstrap_extra_args, without the table header [settings.kubernetes], assuming it will just be added to the existing table from the template.
  2. Do not set enable_bootstrap_user_data, and add the relevant data from the template in your bootstrap_extra_args.

I would say option 2 is the cleaner of the two. But I am not sure whether there is a third, better solution, so maybe someone else can speak up and explain how to do this properly.

s-marinkovic commented 9 months ago

@foersleo thanks for the update. Yes, we are using enable_bootstrap_user_data, and I see what you mean. I will check this, see which option works best for us, and post an update here if it works.

foersleo commented 8 months ago

Hey @s-marinkovic, were you able to confirm or disprove my theory on this?

Can this be closed or is there still something that we can help with?

foersleo commented 7 months ago

Issue seems to be stale and there is no new information. Closing for now. Please feel free to reopen or open a new issue if there is more left to be done around this.