coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

Add Information about update_engine_scheduler #1544

Closed: pizzarabe closed this issue 5 years ago

pizzarabe commented 7 years ago

My CoreOS 1068.9.0 machine does not find a newer version of CoreOS when update_engine checks on its own schedule (update_check_scheduler).

If I start the check manually with update_engine_client -check_for_update, the updater finds the newer version and updates as expected.

$ journalctl -u update-engine -u locksmithd

Aug 29 14:22:58 localhost locksmithd[1130]: LastCheckedTime=0 Progress=0 CurrentOperation="UPDATE_STATUS_IDLE" NewVersion=0.0.0.0 NewSize=0
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:prefs.cc(51)] certificate-report-to-send-update not present in /var/lib/update_engine/prefs
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:prefs.cc(51)] certificate-report-to-send-download not present in /var/lib/update_engine/prefs
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:omaha_request_params.cc(60)] Current group set to stable
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:update_attempter.cc(475)] Already updated boot flags. Skipping.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:update_attempter.cc(658)] Scheduling an action processor start.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:action_processor.cc(36)] ActionProcessor::StartProcessing: OmahaRequestAction
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:omaha_request_action.cc(257)] Posting an Omaha request to https://public.update.core-os.net/v1/update/
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:omaha_request_action.cc(258)] Request: <?xml version="1.0" encoding="UTF-8"?>
Aug 29 15:08:04 localhost update_engine[838]: <request protocol="3.0" version="CoreOSUpdateEngine-0.1.0.0" updaterversion="CoreOSUpdateEngine-0.1.0.0" installsource="scheduler" ismachine="1">
Aug 29 15:08:04 localhost update_engine[838]:     <os version="Chateau" platform="CoreOS" sp="1068.9.0_x86_64"></os>
Aug 29 15:08:04 localhost update_engine[838]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="1068.9.0" track="stable" bootid="{65ca6cd1-8b98-4b90-9b5d-63a4d5035e6f}" oem="" oemversion="" alephversion="1068.9.0" machineid="26875563fc4949518c4c39c6ee801d9b" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
Aug 29 15:08:04 localhost update_engine[838]:         <ping active="1"></ping>
Aug 29 15:08:04 localhost update_engine[838]:         <updatecheck></updatecheck>
Aug 29 15:08:04 localhost update_engine[838]:         <event eventtype="3" eventresult="2" previousversion=""></event>
Aug 29 15:08:04 localhost update_engine[838]:     </app>
Aug 29 15:08:04 localhost update_engine[838]: </request>
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:libcurl_http_fetcher.cc(48)] Starting/Resuming transfer
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:libcurl_http_fetcher.cc(164)] Setting up curl options for HTTPS
Aug 29 15:08:04 localhost locksmithd[1130]: LastCheckedTime=0 Progress=0 CurrentOperation="UPDATE_STATUS_CHECKING_FOR_UPDATE" NewVersion=0.0.0.0 NewSize=0
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:libcurl_http_fetcher.cc(427)] Setting up timeout source: 1 seconds.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:libcurl_http_fetcher.cc(240)] HTTP response code: 200
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:libcurl_http_fetcher.cc(297)] Transfer completed (200), 267 bytes downloaded
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:omaha_request_action.cc(574)] Omaha request response: <?xml version="1.0" encoding="UTF-8"?>
Aug 29 15:08:04 localhost update_engine[838]: <response protocol="3.0" server="update.core-os.net">
Aug 29 15:08:04 localhost update_engine[838]:  <daystart elapsed_seconds="0"></daystart>
Aug 29 15:08:04 localhost update_engine[838]:  <app appid="e96281a6-d1af-4bde-9a0a-97b76e56dc57" status="ok">
Aug 29 15:08:04 localhost update_engine[838]:   <updatecheck status="noupdate"></updatecheck>
Aug 29 15:08:04 localhost update_engine[838]:  </app>
Aug 29 15:08:04 localhost update_engine[838]: </response>
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:omaha_request_action.cc(394)] No update.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:action_processor.cc(82)] ActionProcessor::ActionComplete: finished OmahaRequestAction, starting OmahaResponseHandlerAction
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:omaha_response_handler_action.cc(36)] There are no updates. Aborting.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:action_processor.cc(68)] ActionProcessor::ActionComplete: OmahaResponseHandlerAction action failed. Aborting processing.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:action_processor.cc(73)] ActionProcessor::ActionComplete: finished last action of type OmahaResponseHandlerAction
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:update_attempter.cc(283)] Processing Done.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:update_attempter.cc(319)] No update.
Aug 29 15:08:04 localhost update_engine[838]: [0829/130804:INFO:update_check_scheduler.cc(82)] Next update check in 40m18s
Aug 29 15:08:04 localhost locksmithd[1130]: LastCheckedTime=0 Progress=0 CurrentOperation="UPDATE_STATUS_IDLE" NewVersion=0.0.0.0 NewSize=0
$ cat /usr/share/coreos/update.conf 
SERVER=https://public.update.core-os.net/v1/update/
GROUP=stable
$ cat /etc/os-release
NAME=CoreOS
ID=coreos
VERSION=1068.9.0
VERSION_ID=1068.9.0
BUILD_ID=2016-08-05-0455
PRETTY_NAME="CoreOS 1068.9.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
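The journal excerpt above contains two facts that matter when reading a scheduled check: the status of the updatecheck element in the Omaha response, and the "Next update check in ..." interval. The following minimal Python sketch extracts both. It assumes the response body has been copied out without the journal prefixes, and that update_engine prints durations in the XdYhZmWs shape seen in the log (40m18s). The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Omaha response copied from the journal excerpt above, journal prefixes removed.
RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<response protocol="3.0" server="update.core-os.net">
 <daystart elapsed_seconds="0"></daystart>
 <app appid="e96281a6-d1af-4bde-9a0a-97b76e56dc57" status="ok">
  <updatecheck status="noupdate"></updatecheck>
 </app>
</response>"""

# Assumed duration shape: optional days, hours, minutes, seconds, e.g. "40m18s".
NEXT_CHECK_RE = re.compile(r"Next update check in ((?:\d+d)?(?:\d+h)?(?:\d+m)?(?:\d+s)?)")
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def update_check_status(response_xml):
    """Return the status attribute of <updatecheck>, e.g. "noupdate"."""
    # Parse from bytes so the XML encoding declaration is accepted.
    root = ET.fromstring(response_xml.encode("utf-8"))
    check = root.find("./app/updatecheck")
    if check is None:
        raise ValueError("response has no <updatecheck> element")
    return check.get("status", "")


def next_check_seconds(log_line):
    """Convert the interval in "Next update check in 40m18s" into seconds."""
    match = NEXT_CHECK_RE.search(log_line)
    if match is None or not match.group(1):
        raise ValueError("no 'Next update check in ...' message in line")
    parts = re.findall(r"(\d+)([dhms])", match.group(1))
    return sum(int(n) * UNIT_SECONDS[unit] for n, unit in parts)


if __name__ == "__main__":
    line = "update_check_scheduler.cc(82)] Next update check in 40m18s"
    print(update_check_status(RESPONSE), next_check_seconds(line))  # prints: noupdate 2418
```

A noupdate status followed by a scheduled next check, as in this log, only says that the server did not offer an update at that moment. The later comments in this thread explain why that can happen even when a newer release exists.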

The cloud-config used was:

#cloud-config

coreos:
  units:
    - name: coreos-bootstrap.service
      runtime: no
      command: start
      content: |
        [Unit]
        Description=Install coreos to disk
        [Service]
        ExecStart=/bin/bash -c "/usr/bin/coreos-install -C  stable -d /dev/sda -c /home/core/cloud-config.yml  && wget -q -O /dev/null --no-check-certificate http://10.0.0.100 && reboot"
        [X-Fleet]
        X-Conflicts=coreos-bootstrap.service
users:
  - name: core
    passwd: PASSWORD
write_files:
  - content: |
      #cloud-config
      ssh_authorized_keys:
        - "KEY"
        - "KEY"
      coreos:
        update:
          reboot-strategy: "etcd-lock"

        locksmith:
          endpoint: http://10.0.0.110:2379
          group: "worker"

        etcd2:
          proxy: on
          listen-client-urls: http://127.0.0.1:2379
          initial-cluster: etcdserver=http://10.0.0.102:2380
        fleet:
          etcd_servers: "http://localhost:2379"
        units:
          - name: etcd2.service
            command: start
          - name: fleet.service
            command: start
          - name: set-timezone-berlin.service
            command: start
            content: |
              [Unit]
              Description=Set Timezone to Europe/Berlin
              [Service]
              Type=oneshot
              ExecStart=/usr/bin/timedatectl set-timezone Europe/Berlin
              RemainAfterExit=yes

    path: /home/core/cloud-config.yml
    permissions: '0600'
    owner: core:core
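Duplicate mapping keys are easy to miss in a hand-written cloud-config. One example is a second units: key under the same coreos: key. Depending on the YAML parser, this is either rejected or resolved by keeping only the last value, which silently drops the earlier entries. The following is a naive, indentation-based Python sketch for spotting such duplicates, for example in the file written to /home/core/cloud-config.yml. It is not a YAML parser (no flow style, quoted keys, anchors, or tabs), and duplicate_keys is a hypothetical helper name.

```python
import re
import sys

# One "key:" per line, optionally preceded by a "- " list-item marker.
KEY_RE = re.compile(r"^(?P<indent> *)(?P<dash>- +)?(?P<key>[A-Za-z0-9_.-]+):(?=\s|$)")


def duplicate_keys(text):
    """Return (line_number, key) for every key repeated within the same mapping.

    Mappings are tracked purely by indentation, so this is only a heuristic.
    """
    scopes = []  # stack of (column, {key: first_line}) for the open mappings
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip().startswith("#"):
            continue  # comments, including the #cloud-config header
        m = KEY_RE.match(line)
        if m is None:
            continue  # scalars, list items without keys, block-scalar text
        col = len(m.group("indent"))
        # Close every mapping nested deeper than this line.
        while scopes and scopes[-1][0] > col:
            scopes.pop()
        if m.group("dash"):
            # "- key:" starts a fresh mapping for a new list item.
            col += len(m.group("dash"))
            scopes.append((col, {}))
        elif not scopes or scopes[-1][0] != col:
            scopes.append((col, {}))
        keys = scopes[-1][1]
        key = m.group("key")
        if key in keys:
            found.append((lineno, key))
        else:
            keys[key] = lineno
    return found


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for lineno, key in duplicate_keys(f.read()):
            print(f"line {lineno}: duplicate key {key!r}")
```

Usage: python3 check_dupes.py /home/core/cloud-config.yml (the script name is arbitrary). Each new "- " item resets its scope, so repeated keys across list items, such as name: in the units list, are not reported.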

Environment

VMware, installed from the ISO.

Expected Behavior

The scheduled update check (update_check_scheduler) should find a newer version when one is available.

pizzarabe commented 7 years ago

According to @robszumski, there is a rate limit on downloading newer images through update_check_scheduler. I think we should add this to the documentation, and I would love to know what this rate limit is.

robszumski commented 7 years ago

@pizzarabe Where do you think it makes sense to add this documentation? My worry is that it is hard to explain, since the rate limit can change based on several factors.

pizzarabe commented 7 years ago

The documentation already mentions the rate limit, but without an explanation:

[...] you can force an update check, which will skip any rate-limiting settings that are configured in CoreUpdate.

Maybe we can add the explanation to this page (or link from there to an explanation page)?
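The following hypothetical sketch shows how a server-side rate limit can coexist with forced checks that skip it. The request in the original report is sent with installsource="scheduler". The sketch assumes, without confirmation in this thread, two things: that a forced update_engine_client -check_for_update request carries a different installsource, and that the server throttles only scheduled checks. The fixed-window counter, the RolloutLimiter name, and its parameters are illustrative only. CoreUpdate's actual policy is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class RolloutLimiter:
    """Hypothetical fixed-window limiter for scheduled update checks.

    At most `max_per_window` scheduled checks are offered an update per
    `window_seconds`; any other install source (a forced check) bypasses it.
    """

    max_per_window: int
    window_seconds: float
    _window_start: float = field(default=0.0, init=False)
    _granted: int = field(default=0, init=False)

    def should_offer_update(self, install_source, now):
        if install_source != "scheduler":
            return True  # forced check: skip rate limiting
        if now - self._window_start >= self.window_seconds:
            # The current window has expired: start a new one at `now`.
            self._window_start = now
            self._granted = 0
        if self._granted < self.max_per_window:
            self._granted += 1
            return True
        return False  # the client would see <updatecheck status="noupdate">


if __name__ == "__main__":
    limiter = RolloutLimiter(max_per_window=1, window_seconds=3600)
    print(limiter.should_offer_update("scheduler", now=0))   # True
    print(limiter.should_offer_update("scheduler", now=60))  # False: limit reached
    # "ondemand" is a hypothetical non-scheduler install source.
    print(limiter.should_offer_update("ondemand", now=120))  # True: bypasses the limit
```

From the client's side, a throttled scheduled check looks exactly like the "No update." result in the journal above. That is why this behavior is hard to diagnose without documentation.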

As a sysadmin trying to build a CoreOS infrastructure from the information in the docs, I found this behavior opaque (I thought I had made a mistake in my cloud-config file).

Like I said, I don't know the details of the rate limit, but if this feature is difficult to explain, we should definitely document it, because understanding it without any documentation is even harder :)

crawford commented 7 years ago

We should probably just link to CoreUpdate. Basically, users should not depend on the update schedule of the public update servers. Those servers cater to the global population of CoreOS Linux instances, so their rollouts are generally more conservative than any single user or deployment would choose. If you need more control over updates, you have to run your own update service.
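On Container Linux, you point update_engine at your own update service through update.conf. Keys in /etc/coreos/update.conf override the defaults shipped in /usr/share/coreos/update.conf (shown in the original report). After editing the file, restart update-engine (sudo systemctl restart update-engine) so it picks up the change. The following minimal Python sketch computes the effective settings and renders an override. The server URL https://updates.example.com/v1/update/ and the helper names are hypothetical.

```python
import shlex
from pathlib import Path
from urllib.parse import urlparse

DEFAULT_CONF = "/usr/share/coreos/update.conf"
OVERRIDE_CONF = "/etc/coreos/update.conf"


def parse_update_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex strips optional shell-style quotes around the value.
        values[key.strip()] = " ".join(shlex.split(value))
    return values


def effective_update_conf(default_text, override_text=""):
    """Merge the shipped defaults with the /etc override (override wins per key)."""
    merged = parse_update_conf(default_text)
    merged.update(parse_update_conf(override_text))
    return merged


def render_override(group, server):
    """Render /etc/coreos/update.conf content pointing at a self-hosted server."""
    url = urlparse(server)
    if url.scheme not in ("http", "https") or not url.netloc:
        raise ValueError(f"SERVER must be an http(s) URL, got {server!r}")
    if not group or any(ch.isspace() for ch in group):
        raise ValueError("GROUP must be a single non-empty word")
    return f"GROUP={group}\nSERVER={server}\n"


def read_if_exists(path):
    p = Path(path)
    return p.read_text() if p.exists() else ""


if __name__ == "__main__":
    print(effective_update_conf(read_if_exists(DEFAULT_CONF), read_if_exists(OVERRIDE_CONF)))
    # Hypothetical self-hosted update service; write this to OVERRIDE_CONF as root.
    print(render_override("stable", "https://updates.example.com/v1/update/"), end="")
```

Merging per key keeps SERVER from the shipped defaults when the override sets only GROUP, which is the common case when switching channels.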

bgilbert commented 5 years ago

Thank you for reporting this issue. Unfortunately, we don't think we'll end up addressing it in Container Linux.

We're now working on Fedora CoreOS, the successor to Container Linux, and we expect most major development to occur there instead. Meanwhile, Container Linux will be fully maintained into 2020 but won't see many new features. We appreciate your taking the time to report this issue and we're sorry that we won't be able to address it.