flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

GROUP is overwritten after upgrade and reboot on update.conf #413

Open JesusRo opened 3 years ago

JesusRo commented 3 years ago

Description

A custom GROUP configured in /etc/flatcar/update.conf is reset to stable (the channel of the group) after an upgrade.

Impact

Whenever VMs are updated, the group has to be reconfigured in /etc/flatcar/update.conf.

Environment and steps to reproduce

  1. Set-up: VMs deployed on OpenStack using cloud-init; update.conf manually configured afterwards; Nebraska instance deployed via Helm chart

  2. Task: Updating node

  3. Action(s): Create group on Nebraska:

          "name":"My-dev.stable",
          "track":"My-dev.stable",
          "description":"My dev stable Cluster vms",
          "policy_updates_enabled":true,
          "policy_safe_mode":true,
          "policy_max_updates_per_period":999999,
          "policy_period_interval":"1 hours",
          "policy_update_timeout":"60 minutes",
          "channel_id":"e06064ad-4414-4904-9a6e-fd465593d1b2",
          "policy_timezone":"Europe/Berlin",
          "application_id":"e96281a6-d1af-4bde-9a0a-97b76e56dc57"

    Log in to the node and update /etc/flatcar/update.conf:

        GROUP=My-dev.stable
        SERVER=https://mylocal-nebraska.local/v1/update/
        MACHINE_ALIAS=flatcar-test-1.local

    Run:

    • systemctl restart update-engine
    • /usr/bin/update_engine_client -reset_status
    • /usr/bin/update_engine_client -check_for_update

    When /usr/bin/update_engine_client -status reports that the node is ready for reboot, reboot the node.

  4. Error: After logging back in to the node, /etc/flatcar/update.conf has GROUP changed to stable and REBOOT_STRATEGY added, as shown below:

        GROUP=stable
        SERVER=https://mylocal-nebraska.local/v1/update/
        MACHINE_ALIAS=flatcar-test-1.local
        REBOOT_STRATEGY=off

    Expected behavior

    After logging back in to the node, /etc/flatcar/update.conf should not have been modified:

        GROUP=My-dev.stable
        SERVER=https://mylocal-nebraska.local/v1/update/
        MACHINE_ALIAS=flatcar-test-1.local
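
The comparison between the error and the expected behavior above can be scripted so that the change is caught right after the reboot. This is only a minimal sketch, not something shipped with Flatcar: the helper names conf_value and check_group are hypothetical, and it assumes update.conf contains plain KEY=VALUE lines as in the listings above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Flatcar.

# Print the value of a key from a KEY=VALUE file.
# If the key appears more than once, the value of the last line is printed.
conf_value() {
    # $1 = file, $2 = key
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Return 0 if GROUP in the file equals the expected group, 1 otherwise.
check_group() {
    # $1 = file, $2 = expected group
    actual=$(conf_value "$1" GROUP)
    if [ "$actual" = "$2" ]; then
        echo "OK: GROUP=$actual"
    else
        echo "CHANGED: expected GROUP=$2, found GROUP=$actual"
        return 1
    fi
}

# Possible usage on a node, before and after the update reboot:
#   check_group /etc/flatcar/update.conf My-dev.stable
```

Running the check once before the update and once after the reboot would show whether the file changed during the update rather than during an ordinary reboot.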

Additional information

I followed https://kinvolk.io/docs/nebraska/latest/managing-updates/#existing-machines to set up the update.conf file.

Actually, I can always reproduce it by resetting the release, as shown below:

flatcar-test-2 ~ # cat /etc/flatcar/update.conf 
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
flatcar-test-2 ~ # systemctl restart update-engine
flatcar-test-2 ~ # update_engine_client -reset_status
I0610 11:16:50.386335 17409 update_engine_client.cc:223] Setting Update Engine status to idle ...
I0610 11:16:50.388559 17409 update_engine_client.cc:229] ResetStatus succeeded; to undo partition table changes run:
(D=$(rootdev -d) P=$(rootdev -s); cgpt p -i$(($(echo ${P#$D} | sed 's/^[^0-9]*//')-1)) $D;)
flatcar-test-2 ~ # update_engine_client -update
I0610 11:16:53.010391 17454 update_engine_client.cc:247] Initiating update check and install.
I0610 11:16:53.015081 17454 update_engine_client.cc:252] Waiting for update to complete.
LAST_CHECKED_TIME=1623323813
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_UPDATE_AVAILABLE
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.030048
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.090120
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.160172
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.220256
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.320365
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.440405
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.510476
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.640641
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.810836
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.910956
CURRENT_OP=UPDATE_STATUS_DOWNLOADING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_FINALIZING
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
LAST_CHECKED_TIME=1623323813
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_UPDATED_NEED_REBOOT
NEW_VERSION=2765.2.5
NEW_SIZE=541498811
I0610 11:17:58.329700 17454 update_engine_client.cc:194] Update succeeded -- reboot needed.
flatcar-test-2 ~ # reboot
Connection to flatcar-test-2.local closed by remote host.
Connection to flatcar-test-2.local closed.

ssh to flatcar-test-2.local again

[...]
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Last login: Thu Jun 10 11:10:06 UTC 2021 from 10.123.44.53 on pts/0
Flatcar Container Linux by Kinvolk stable (2765.2.5)
Update Strategy: No Reboots
core@flatcar-test-2 ~ $ cat /etc/flatcar/update.conf 
GROUP=stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
REBOOT_STRATEGY=off

pothos commented 3 years ago

Hi, can you share your cloud-config userdata?

pothos commented 3 years ago

As I understand it, REBOOT_STRATEGY is still part of the cloud-config userdata. In that case it is expected to be written to the file again, because the cloud-config data gets processed on every boot.
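
One way to test this theory is to search the userdata for anything that could touch the update settings. The following is a rough heuristic sketch under assumptions: the function name userdata_mentions_update is hypothetical, the userdata is assumed to be available as a local file, and the key names reboot-strategy, group and server are assumed to be the ones coreos-cloudinit accepts under coreos.update.

```shell
#!/bin/sh
# Hypothetical helper for illustration; it is not part of coreos-cloudinit.

# Return 0 if a cloud-config file appears to configure update settings,
# either through update-related keys (assumed names: reboot-strategy, group,
# server) or by referring to update.conf directly, for example in write_files.
userdata_mentions_update() {
    # $1 = path to a local copy of the userdata
    grep -Eq '(^[[:space:]]*(reboot-strategy|group|server)[[:space:]]*:|/etc/(flatcar|coreos)/update\.conf)' "$1"
}

# Possible usage:
#   if userdata_mentions_update ./user-data; then
#       echo "userdata mentions update settings"
#   else
#       echo "no update settings found in userdata"
#   fi
```

A match only means the userdata deserves a closer look; a miss suggests the file is being rewritten for another reason.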

JesusRo commented 3 years ago

Hi,

I had nothing in user_data related to this (my goal is actually to provision the update config afterwards), and I doubt it comes from there, as this only happens after an update. If I just reboot the VM, there is no issue and the update.conf file is unchanged:

core@flatcar-test-2 ~ $ cat /etc/flatcar/update.conf 
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
core@flatcar-test-2 ~ $ cat /usr/share/flatcar/release 
FLATCAR_RELEASE_VERSION=2765.2.3
FLATCAR_RELEASE_BOARD=amd64-usr
FLATCAR_RELEASE_APPID={e96281a6-d1af-4bde-9a0a-97b76e56dc57}
core@flatcar-test-2 ~ $ sudo reboot
Connection to flatcar-test-2.local closed by remote host.
Connection to flatcar-test-2.local closed.
 ✘  ~  ssh flatcar-test-2.local
Last login: Thu Jun 17 06:43:03 UTC 2021 from 10.123.44.53 on pts/0
Flatcar Container Linux by Kinvolk flatcar.lttwdev (2765.2.3)
core@flatcar-test-2 ~ $ cat /etc/flatcar/update.conf 
GROUP=My-dev.stable
SERVER=https://mylocal-nebraska.local/v1/update/
MACHINE_ALIAS=flatcar-test-2.local
core@flatcar-test-2 ~ $ cat /usr/share/flatcar/release 
FLATCAR_RELEASE_VERSION=2765.2.3
FLATCAR_RELEASE_BOARD=amd64-usr
FLATCAR_RELEASE_APPID={e96281a6-d1af-4bde-9a0a-97b76e56dc57}
core@flatcar-test-2 ~ $ logout

user-data:

#cloud-config
write_files:
  - path: /etc/systemd/network/80-app.network
    owner: "root:root"
    permissions: "0644"
    content: |
      [Network]
      DHCP=yes

      [DHCP]
      UseMTU=true
      UseDomains=false

      [Match]
      Name=eth0
  - path: /etc/systemd/network/90-storage.network
    owner: "root:root"
    permissions: "0644"
    content: |
      [Network]
      DHCP=yes

      [DHCP]
      UseMTU=true
      UseDomains=false
      UseRoutes=false

      [Match]
      Name=eth*
  - path: /etc/systemd/network/zz-default.network
    owner: "root:root"
    permissions: "0644"
    content: |
      [Network]
      DHCP=yes

      [DHCP]
      UseMTU=true
      UseDomains=false

      [Match]
      Name=*

Thanks for taking a look

jepio commented 2 years ago

Hi, is this still happening with a recent release?

pothos commented 2 years ago

I suggest modifying oem-cloudinit.service from ExecStart=/usr/bin/coreos-cloudinit to ExecStart=/usr/bin/strace -f /usr/bin/coreos-cloudinit and sharing the unit log. I still think there is a logic bug around https://github.com/flatcar-linux/coreos-cloudinit/blob/cfcc44197d11f44441e5aa2c9db34bcd0bf16015/system/update.go#L58 but if coreos-cloudinit is not writing the file, we can search elsewhere.
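
The ExecStart change could be applied with a small filter like the one below. It is a sketch under assumptions, not a tested procedure for Flatcar: the function name wrap_with_strace is hypothetical, and the override location /etc/systemd/system and the temporary path in the usage comments are assumptions to review on the node.

```shell
#!/bin/sh
# Hypothetical helper for illustration.

# Read unit file text on stdin and write it to stdout with the
# coreos-cloudinit ExecStart line prefixed by "strace -f".
# Any arguments that follow the binary are left unchanged.
wrap_with_strace() {
    sed 's|^ExecStart=/usr/bin/coreos-cloudinit|ExecStart=/usr/bin/strace -f /usr/bin/coreos-cloudinit|'
}

# Possible usage on a node (as root; paths are assumptions):
#   systemctl cat oem-cloudinit.service | wrap_with_strace > /tmp/oem-cloudinit.service
#   # review /tmp/oem-cloudinit.service, then install it as an override:
#   cp /tmp/oem-cloudinit.service /etc/systemd/system/oem-cloudinit.service
#   systemctl daemon-reload
#   # after the next boot, collect the unit log:
#   journalctl -b -u oem-cloudinit.service
```

Writing to a temporary file first avoids truncating the unit while it is still being read, in case it already lives in the target location.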