flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0
773 stars 32 forks source link

Migration from CoreOS to Flatcar not working #228

Closed CptDaniel closed 4 years ago

CptDaniel commented 4 years ago

Description

For some of our servers the migration from CoreOS to Flatcar is not working. Neither with the provided script nor by doing the steps manually. A reboot of the machine also does not fix this issue.

Impact

If this is not fixable we have to migrate the machine manually to a new flatcar server, which is quite tedious.

Expected behavior The server should migrate from CoreOS to Flatcar after rebooting.

Additional information Here the logs from journalctl if it helps:

I1106 15:31:29.139461  4037 main.cc:89] CoreOS Update Engine starting
update_engine[4037]: I1106 15:31:29.139621  4037 payload_state.cc:359] Current Response Signature =
update_engine[4037]: NumURLs = 1
update_engine[4037]: Url0 = https://update.release.core-os.net/amd64-usr/2512.3.0/update.gz
update_engine[4037]: Payload Size = 473835028
update_engine[4037]: Payload Sha256 Hash = UWcbzrUFjfyyH47erJ1mcAmkZ9aH96gZC4Hu4f/muUA=
update_engine[4037]: Is Delta Payload = 0
update_engine[4037]: Max Failure Count Per Url = 10
update_engine[4037]: Disable Payload Backoff = 1
update_engine[4037]: I1106 15:31:29.139845  4037 payload_state.cc:380] Payload Attempt Number = 2298
update_engine[4037]: I1106 15:31:29.139955  4037 payload_state.cc:403] Current URL Index = 0
update_engine[4037]: I1106 15:31:29.140027  4037 payload_state.cc:424] Current URL (Url0)'s Failure Count = 0
update_engine[4037]: I1106 15:31:29.140092  4037 payload_state.cc:451] Backoff Expiry Time = 01/01/70 00:00:00 UTC
systemd[1]: Started Update Engine.
update_engine[4037]: I1106 15:31:29.141785  4037 update_check_scheduler.cc:74] Next update check in 5m30s
update_engine[4037]: I1106 15:31:29.160707  4037 dbus_service.cc:57] Attempting interactive update
update_engine[4037]: I1106 15:31:29.160733  4037 update_attempter.cc:263] New update check requested
update_engine[4037]: I1106 15:31:29.161015  4037 omaha_request_params.cc:59] Current group set to stable
update_engine[4037]: I1106 15:31:29.161087  4037 update_attempter.cc:493] Updating boot flags...
update_engine[4037]: I1106 15:31:29.177549  4037 update_attempter.cc:626] Scheduling an action processor start.
update_engine[4037]: I1106 15:31:29.177835  4037 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
update_engine[4037]: I1106 15:31:29.178079  4037 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
update_engine[4037]: I1106 15:31:29.178164  4037 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
update_engine[4037]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="ondemandupdate" ismachine="1">
update_engine[4037]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
update_engine[4037]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1>
update_engine[4037]:         <ping active="1"></ping>
update_engine[4037]:         <updatecheck></updatecheck>
update_engine[4037]:         <event eventtype="3" eventresult="2" previousversion=""></event>
update_engine[4037]:     </app>
update_engine[4037]: </request>
update_engine[4037]: I1106 15:31:29.178473  4037 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
update_engine[4037]: I1106 15:31:29.180601  4037 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
update_engine[4037]: I1106 15:31:29.180939  4037 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
update_engine[4037]: I1106 15:31:29.236270  4037 libcurl_http_fetcher.cc:240] HTTP response code: 200
update_engine[4037]: I1106 15:31:29.240339  4037 libcurl_http_fetcher.cc:297] Transfer completed (200), 311 bytes downloaded
update_engine[4037]: I1106 15:31:29.240381  4037 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" sta>
update_engine[4037]: E1106 15:31:29.240526  4037 omaha_request_action.cc:394] Unknown Omaha response status: error-internal
update_engine[4037]: I1106 15:31:29.240554  4037 action_processor.cc:68] ActionProcessor::ActionComplete: OmahaRequestAction action failed. Aborting processing.
update_engine[4037]: I1106 15:31:29.240561  4037 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
update_engine[4037]: I1106 15:31:29.240564  4037 update_attempter.cc:290] Processing Done.
update_engine[4037]: E1106 15:31:29.240586  4037 update_attempter.cc:603] Update failed.
update_engine[4037]: I1106 15:31:29.240590  4037 payload_state.cc:97] Updating payload state for error code: 34 (kActionCodeOmahaResponseInvalid)
update_engine[4037]: I1106 15:31:29.240597  4037 payload_state.cc:103] Ignoring failures until we get a valid Omaha response.
update_engine[4037]: I1106 15:31:29.240675  4037 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
update_engine[4037]: I1106 15:31:29.240698  4037 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
update_engine[4037]: I1106 15:31:29.240702  4037 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
update_engine[4037]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="ondemandupdate" ismachine="1">
update_engine[4037]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
update_engine[4037]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1>
update_engine[4037]:         <event eventtype="3" eventresult="0" errorcode="268435490"></event>
update_engine[4037]:     </app>
update_engine[4037]: </request>
update_engine[4037]: I1106 15:31:29.240710  4037 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
update_engine[4037]: I1106 15:31:29.240737  4037 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
update_engine[4037]: I1106 15:31:29.240861  4037 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
update_engine[4037]: I1106 15:31:29.273200  4037 libcurl_http_fetcher.cc:240] HTTP response code: 200
update_engine[4037]: I1106 15:31:29.275542  4037 libcurl_http_fetcher.cc:297] Transfer completed (200), 192 bytes downloaded
update_engine[4037]: I1106 15:31:29.275578  4037 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" sta>
update_engine[4037]: E1106 15:31:29.275602  4037 omaha_request_action.cc:599] HTTP reported success but Omaha reports an error.
update_engine[4037]: I1106 15:31:29.275611  4037 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
update_engine[4037]: I1106 15:31:29.275619  4037 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
update_engine[4037]: I1106 15:31:29.275624  4037 update_attempter.cc:290] Processing Done.
update_engine[4037]: I1106 15:31:29.275632  4037 update_attempter.cc:294] Error event sent.
update_engine[4037]: I1106 15:32:14.252684  4037 update_attempter.cc:483] Already updated boot flags. Skipping.
pothos commented 4 years ago

The output of Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" sta> is truncated, you can pipe the journalctl output to cat to prevent the interactive display where you have to scroll with the arrow keys.

CptDaniel commented 4 years ago

Oh, haven't seen that I cut off something. Thanks. Here's the full log output:

systemd[1]: Starting Update Engine...
update_engine[6082]: I1109 08:35:02.841235  6082 main.cc:89] CoreOS Update Engine starting
update_engine[6082]: I1109 08:35:02.842274  6082 payload_state.cc:359] Current Response Signature =
update_engine[6082]: NumURLs = 1
update_engine[6082]: Url0 = https://update.release.core-os.net/amd64-usr/2512.3.0/update.gz
update_engine[6082]: Payload Size = 473835028
update_engine[6082]: Payload Sha256 Hash = UWcbzrUFjfyyH47erJ1mcAmkZ9aH96gZC4Hu4f/muUA=
update_engine[6082]: Is Delta Payload = 0
update_engine[6082]: Max Failure Count Per Url = 10
update_engine[6082]: Disable Payload Backoff = 1
update_engine[6082]: I1109 08:35:02.843492  6082 payload_state.cc:380] Payload Attempt Number = 2298
update_engine[6082]: I1109 08:35:02.844543  6082 payload_state.cc:403] Current URL Index = 0
update_engine[6082]: I1109 08:35:02.845484  6082 payload_state.cc:424] Current URL (Url0)'s Failure Count = 0
update_engine[6082]: I1109 08:35:02.846400  6082 payload_state.cc:451] Backoff Expiry Time = 01/01/70 00:00:00 UTC
systemd[1]: Started Update Engine.
update_engine[6082]: I1109 08:35:02.854794  6082 update_check_scheduler.cc:74] Next update check in 4m44s
update_engine[6082]: I1109 08:35:02.876106  6082 dbus_service.cc:57] Attempting interactive update
update_engine[6082]: I1109 08:35:02.876129  6082 update_attempter.cc:263] New update check requested
update_engine[6082]: I1109 08:35:02.876432  6082 omaha_request_params.cc:59] Current group set to stable
update_engine[6082]: I1109 08:35:02.876521  6082 update_attempter.cc:493] Updating boot flags...
update_engine[6082]: I1109 08:35:02.910991  6082 update_attempter.cc:626] Scheduling an action processor start.
update_engine[6082]: I1109 08:35:02.911044  6082 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
update_engine[6082]: I1109 08:35:02.911186  6082 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
update_engine[6082]: I1109 08:35:02.911193  6082 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
update_engine[6082]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="ondemandupdate" ismachine="1">
update_engine[6082]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
update_engine[6082]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1775" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
update_engine[6082]:         <ping active="1"></ping>
update_engine[6082]:         <updatecheck></updatecheck>
update_engine[6082]:         <event eventtype="3" eventresult="2" previousversion=""></event>
update_engine[6082]:     </app>
update_engine[6082]: </request>
update_engine[6082]: I1109 08:35:02.911216  6082 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
update_engine[6082]: I1109 08:35:02.913820  6082 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
update_engine[6082]: I1109 08:35:02.914166  6082 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
update_engine[6082]: I1109 08:35:02.978315  6082 libcurl_http_fetcher.cc:240] HTTP response code: 200
update_engine[6082]: I1109 08:35:02.979966  6082 libcurl_http_fetcher.cc:297] Transfer completed (200), 311 bytes downloaded
update_engine[6082]: I1109 08:35:02.979998  6082 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="error-updateInProgressOnInstance"><ping status="ok"></ping><updatecheck status="error-internal"><urls></urls></updatecheck><event status="ok"></event></app></response>
update_engine[6082]: E1109 08:35:02.981685  6082 omaha_request_action.cc:394] Unknown Omaha response status: error-internal
update_engine[6082]: I1109 08:35:02.981732  6082 action_processor.cc:68] ActionProcessor::ActionComplete: OmahaRequestAction action failed. Aborting processing.
update_engine[6082]: I1109 08:35:02.981737  6082 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
update_engine[6082]: I1109 08:35:02.981742  6082 update_attempter.cc:290] Processing Done.
update_engine[6082]: E1109 08:35:02.981761  6082 update_attempter.cc:603] Update failed.
update_engine[6082]: I1109 08:35:02.981766  6082 payload_state.cc:97] Updating payload state for error code: 34 (kActionCodeOmahaResponseInvalid)
update_engine[6082]: I1109 08:35:02.981771  6082 payload_state.cc:103] Ignoring failures until we get a valid Omaha response.
update_engine[6082]: I1109 08:35:02.981860  6082 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
update_engine[6082]: I1109 08:35:02.981900  6082 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
update_engine[6082]: I1109 08:35:02.981905  6082 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
update_engine[6082]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="ondemandupdate" ismachine="1">
update_engine[6082]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
update_engine[6082]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1775" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
update_engine[6082]:         <event eventtype="3" eventresult="0" errorcode="268435490"></event>
update_engine[6082]:     </app>
update_engine[6082]: </request>
update_engine[6082]: I1109 08:35:02.981911  6082 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
update_engine[6082]: I1109 08:35:02.981950  6082 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
update_engine[6082]: I1109 08:35:02.982084  6082 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
update_engine[6082]: I1109 08:35:03.012256  6082 libcurl_http_fetcher.cc:240] HTTP response code: 200
update_engine[6082]: I1109 08:35:03.014174  6082 libcurl_http_fetcher.cc:297] Transfer completed (200), 192 bytes downloaded
update_engine[6082]: I1109 08:35:03.014215  6082 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="ok"><event status="ok"></event></app></response>
update_engine[6082]: E1109 08:35:03.014237  6082 omaha_request_action.cc:599] HTTP reported success but Omaha reports an error.
update_engine[6082]: I1109 08:35:03.014246  6082 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
update_engine[6082]: I1109 08:35:03.014251  6082 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
update_engine[6082]: I1109 08:35:03.014256  6082 update_attempter.cc:290] Processing Done.
update_engine[6082]: I1109 08:35:03.014262  6082 update_attempter.cc:294] Error event sent.
pothos commented 4 years ago

Was this instance tried to be migrated already in September and October? From the server log it looks like the machine wasn't rebooted then and somehow ended up using the CoreOS partition again, but in October booted into Flatcar. I somehow don't see an entry for a recent event – with journalctl -u update-engine there is normally a date and time prefix for each line, can you try again to get the full log with time annotation? It can be that I'm not looking at the right instance in the server logs.

Maybe there is a case not covered in the script where a CoreOS update was applied but not rebooted, but you said that even after a reboot you couldn't get it running? Can you provide the output of the script? Does it have an exit code other than 0 (echo $? directly after running)? What is the output of update_engine_client -status?

pothos commented 4 years ago

By the way, when running the script and rebooting you have to make sure that the update-engine service is enabled and runs for at least a minute after the system booted into the new partition.

CptDaniel commented 4 years ago

I tried updating every now and then. The try I got the logs from was yesterday 9:35 CET, just before I posted them. I removed the timestamps since I thought they would be a bit noisy, sorry for this. Ya, I rebooted the server multiple times. Here fresh logs with timestamp, I got one run I triggered manually and one before that was automatic:

Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.235994  6082 prefs.cc:51] certificate-report-to-send-update not present in /var/lib/update_engine/prefs
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236052  6082 prefs.cc:51] certificate-report-to-send-download not present in /var/lib/update_engine/prefs
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236349  6082 omaha_request_params.cc:59] Current group set to stable
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236476  6082 update_attempter.cc:483] Already updated boot flags. Skipping.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236483  6082 update_attempter.cc:626] Scheduling an action processor start.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236508  6082 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236620  6082 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236627  6082 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
Nov 10 12:47:10 server update_engine[6082]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="scheduler" ismachine="1">
Nov 10 12:47:10 server update_engine[6082]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
Nov 10 12:47:10 server update_engine[6082]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1775" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
Nov 10 12:47:10 server update_engine[6082]:         <ping active="1"></ping>
Nov 10 12:47:10 server update_engine[6082]:         <updatecheck></updatecheck>
Nov 10 12:47:10 server update_engine[6082]:         <event eventtype="3" eventresult="2" previousversion=""></event>
Nov 10 12:47:10 server update_engine[6082]:     </app>
Nov 10 12:47:10 server update_engine[6082]: </request>
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236634  6082 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236737  6082 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.236857  6082 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.287079  6082 libcurl_http_fetcher.cc:240] HTTP response code: 200
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.289327  6082 libcurl_http_fetcher.cc:297] Transfer completed (200), 311 bytes downloaded
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.289609  6082 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="error-updateInProgressOnInstance"><ping status="ok"></ping><updatecheck status="error-internal"><urls></urls></updatecheck><event status="ok"></event></app></response>
Nov 10 12:47:10 server update_engine[6082]: E1110 12:47:10.289778  6082 omaha_request_action.cc:394] Unknown Omaha response status: error-internal
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.289896  6082 action_processor.cc:68] ActionProcessor::ActionComplete: OmahaRequestAction action failed. Aborting processing.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290004  6082 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290122  6082 update_attempter.cc:290] Processing Done.
Nov 10 12:47:10 server update_engine[6082]: E1110 12:47:10.290215  6082 update_attempter.cc:603] Update failed.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290292  6082 payload_state.cc:97] Updating payload state for error code: 34 (kActionCodeOmahaResponseInvalid)
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290390  6082 payload_state.cc:103] Ignoring failures until we get a valid Omaha response.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290918  6082 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290972  6082 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290979  6082 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
Nov 10 12:47:10 server update_engine[6082]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="scheduler" ismachine="1">
Nov 10 12:47:10 server update_engine[6082]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
Nov 10 12:47:10 server update_engine[6082]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1775" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
Nov 10 12:47:10 server update_engine[6082]:         <event eventtype="3" eventresult="0" errorcode="268435490"></event>
Nov 10 12:47:10 server update_engine[6082]:     </app>
Nov 10 12:47:10 server update_engine[6082]: </request>
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.290984  6082 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.291023  6082 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.291144  6082 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.316737  6082 libcurl_http_fetcher.cc:240] HTTP response code: 200
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.318581  6082 libcurl_http_fetcher.cc:297] Transfer completed (200), 192 bytes downloaded
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.318619  6082 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="ok"><event status="ok"></event></app></response>
Nov 10 12:47:10 server update_engine[6082]: E1110 12:47:10.318641  6082 omaha_request_action.cc:599] HTTP reported success but Omaha reports an error.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.318648  6082 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.318652  6082 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.318657  6082 update_attempter.cc:290] Processing Done.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.318663  6082 update_attempter.cc:294] Error event sent.
Nov 10 12:47:10 server update_engine[6082]: I1110 12:47:10.318676  6082 update_check_scheduler.cc:74] Next update check in 49m59s
Nov 10 13:36:00 server systemd[1]: Stopping Update Engine...
Nov 10 13:36:00 server systemd[1]: update-engine.service: Main process exited, code=exited, status=1/FAILURE
Nov 10 13:36:00 server systemd[1]: update-engine.service: Failed with result 'exit-code'.
Nov 10 13:36:00 server systemd[1]: Stopped Update Engine.
Nov 10 13:36:00 server systemd[1]: Starting Update Engine...
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.865141 25743 main.cc:89] CoreOS Update Engine starting
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.866369 25743 payload_state.cc:359] Current Response Signature =
Nov 10 13:36:00 server update_engine[25743]: NumURLs = 1
Nov 10 13:36:00 server update_engine[25743]: Url0 = https://update.release.core-os.net/amd64-usr/2512.3.0/update.gz
Nov 10 13:36:00 server update_engine[25743]: Payload Size = 473835028
Nov 10 13:36:00 server update_engine[25743]: Payload Sha256 Hash = UWcbzrUFjfyyH47erJ1mcAmkZ9aH96gZC4Hu4f/muUA=
Nov 10 13:36:00 server update_engine[25743]: Is Delta Payload = 0
Nov 10 13:36:00 server update_engine[25743]: Max Failure Count Per Url = 10
Nov 10 13:36:00 server update_engine[25743]: Disable Payload Backoff = 1
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.869766 25743 payload_state.cc:380] Payload Attempt Number = 2298
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.870537 25743 payload_state.cc:403] Current URL Index = 0
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.871302 25743 payload_state.cc:424] Current URL (Url0)'s Failure Count = 0
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.872015 25743 payload_state.cc:451] Backoff Expiry Time = 01/01/70 00:00:00 UTC
Nov 10 13:36:00 server systemd[1]: Started Update Engine.
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.874699 25743 update_check_scheduler.cc:74] Next update check in 8m56s
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.900146 25743 dbus_service.cc:57] Attempting interactive update
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.900173 25743 update_attempter.cc:263] New update check requested
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.900458 25743 omaha_request_params.cc:59] Current group set to stable
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.900662 25743 update_attempter.cc:493] Updating boot flags...
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.939007 25743 update_attempter.cc:626] Scheduling an action processor start.
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.939079 25743 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.939213 25743 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.939220 25743 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
Nov 10 13:36:00 server update_engine[25743]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="ondemandupdate" ismachine="1">
Nov 10 13:36:00 server update_engine[25743]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
Nov 10 13:36:00 server update_engine[25743]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1775" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
Nov 10 13:36:00 server update_engine[25743]:         <ping active="1"></ping>
Nov 10 13:36:00 server update_engine[25743]:         <updatecheck></updatecheck>
Nov 10 13:36:00 server update_engine[25743]:         <event eventtype="3" eventresult="2" previousversion=""></event>
Nov 10 13:36:00 server update_engine[25743]:     </app>
Nov 10 13:36:00 server update_engine[25743]: </request>
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.939227 25743 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.941359 25743 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.941519 25743 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.993013 25743 libcurl_http_fetcher.cc:240] HTTP response code: 200
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995312 25743 libcurl_http_fetcher.cc:297] Transfer completed (200), 311 bytes downloaded
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995348 25743 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="error-updateInProgressOnInstance"><ping status="ok"></ping><updatecheck status="error-internal"><urls></urls></updatecheck><event status="ok"></event></app></response>
Nov 10 13:36:00 server update_engine[25743]: E1110 13:36:00.995549 25743 omaha_request_action.cc:394] Unknown Omaha response status: error-internal
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995579 25743 action_processor.cc:68] ActionProcessor::ActionComplete: OmahaRequestAction action failed. Aborting processing.
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995585 25743 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995590 25743 update_attempter.cc:290] Processing Done.
Nov 10 13:36:00 server update_engine[25743]: E1110 13:36:00.995609 25743 update_attempter.cc:603] Update failed.
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995615 25743 payload_state.cc:97] Updating payload state for error code: 34 (kActionCodeOmahaResponseInvalid)
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995620 25743 payload_state.cc:103] Ignoring failures until we get a valid Omaha response.
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995707 25743 action_processor.cc:36] ActionProcessor::StartProcessing: OmahaRequestAction
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995738 25743 omaha_request_action.cc:245] Posting an Omaha request to https://public.update.flatcar-linux.net/v1/update/
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995745 25743 omaha_request_action.cc:246] Request: <?xml version="1.0" encoding="UTF-8"?>
Nov 10 13:36:00 server update_engine[25743]: <request protocol="3.0" version="update_engine-0.4.10" updaterversion="update_engine-0.4.10" installsource="ondemandupdate" ismachine="1">
Nov 10 13:36:00 server update_engine[25743]:     <os version="Chateau" platform="CoreOS" sp="0.0.0_x86_64"></os>
Nov 10 13:36:00 server update_engine[25743]:     <app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" version="0.0.0" track="stable" bootid="{a8821e7c-b02b-4ec7-b059-a1e99489f2c6}" oem="" oemversion="" alephversion="1911.4.0" machineid="00fa7a4ca0d441049ca21a9fbbee1775" lang="en-US" board="amd64-usr" hardware_class="" delta_okay="false" >
Nov 10 13:36:00 server update_engine[25743]:         <event eventtype="3" eventresult="0" errorcode="268435490"></event>
Nov 10 13:36:00 server update_engine[25743]:     </app>
Nov 10 13:36:00 server update_engine[25743]: </request>
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995752 25743 libcurl_http_fetcher.cc:48] Starting/Resuming transfer
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995781 25743 libcurl_http_fetcher.cc:164] Setting up curl options for HTTPS
Nov 10 13:36:00 server update_engine[25743]: I1110 13:36:00.995968 25743 libcurl_http_fetcher.cc:427] Setting up timeout source: 1 seconds.
Nov 10 13:36:01 server update_engine[25743]: I1110 13:36:01.026437 25743 libcurl_http_fetcher.cc:240] HTTP response code: 200
Nov 10 13:36:01 server update_engine[25743]: I1110 13:36:01.028213 25743 libcurl_http_fetcher.cc:297] Transfer completed (200), 192 bytes downloaded
Nov 10 13:36:01 server update_engine[25743]: I1110 13:36:01.028242 25743 omaha_request_action.cc:592] Omaha request response: <response protocol="3.0" server="nebraska"><daystart elapsed_seconds="0"></daystart><app appid="{e96281a6-d1af-4bde-9a0a-97b76e56dc57}" status="ok"><event status="ok"></event></app></response>
Nov 10 13:36:01 server update_engine[25743]: E1110 13:36:01.028260 25743 omaha_request_action.cc:599] HTTP reported success but Omaha reports an error.
Nov 10 13:36:01 server update_engine[25743]: I1110 13:36:01.028267 25743 action_processor.cc:65] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Nov 10 13:36:01 server update_engine[25743]: I1110 13:36:01.028272 25743 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaRequestAction
Nov 10 13:36:01 server update_engine[25743]: I1110 13:36:01.028277 25743 update_attempter.cc:290] Processing Done.
Nov 10 13:36:01 server update_engine[25743]: I1110 13:36:01.028283 25743 update_attempter.cc:294] Error event sent.
Nov 10 13:36:46 server update_engine[25743]: I1110 13:36:46.246703 25743 update_attempter.cc:483] Already updated boot flags. Skipping.

The error code of the manual run was 1, so it got an error.

Here's the output of update_engine_client -status:

I1110 13:41:25.475896 25790 update_engine_client.cc:237] Querying Update Engine status...
LAST_CHECKED_TIME=0
PROGRESS=0.000000
CURRENT_OP=UPDATE_STATUS_IDLE
NEW_VERSION=0.0.0
NEW_SIZE=0

Hope this helps and thanks for helping me.

pothos commented 4 years ago

Thanks! Now I see that the problem is being stuck in the Downloading instance state at the server side. Normally there is a request which brings it out of this state again but somehow this was missing for you. I guess instead of trying to fix this in update-engine it would be better if the server would detect that the instance doesn't really download since it's already doing a new update request.

CptDaniel commented 4 years ago

Oh, ok. So this means there's not much I can do to fix this and I have to wait, at least for now, until this is fixed on the update servers, right?

CptDaniel commented 4 years ago

Hi @pothos, could you find any fix for this issue or is this something I have to change on our server? Thanks again.

pothos commented 4 years ago

We will have to fix this in the update server because it rejects the state transitions unless the exact version is reported as previous version. I was able to get into this state with new instances but also out of this state through a custom request. However, for your instance this wasn't possible. I don't know what I missed. I recommend setting a different value in /etc/machine-id for the mean time. This causes systemd to think the system runs the first time which may have a side effect that I'm not aware of but in general I think it's ok.

CptDaniel commented 4 years ago

Oh wow, that helped! I guess, since we cloned all our VMs they all had the same machine-id. This would also explain why all of our VMs had this issue with migration to Flatcar. If it helps in any way, I can leave a machine in this faulty CoreOS state, otherwise I'll just change the machine-id and migrate them all to Flatcar. Thanks a lot!

pothos commented 4 years ago

If you clone a VM, you have to do it before the first boot. Otherwise Ignition won't run and you also have the same machine ID and GPT ID.

pothos commented 4 years ago

No need to keep the machine, thanks for finding the duplicate machine ID issue. Your initial report helped to spot a case we need to fix anyway. I now created https://github.com/kinvolk/nebraska/issues/303 for that and will close this issue here.

CptDaniel commented 4 years ago

Ok. Thank you for helping with our VM issue.