Kcr19 / mender_gcp_ota_demo

Demo scripts for a GCP+Mender demo.
Apache License 2.0

mender client needs restarting after new conf #14

Closed ptone closed 5 years ago

ptone commented 5 years ago

@drewmoseley the statescripts are falling into a retry interval that is very slow for demo purposes

Looking at

journalctl -u mender

Aug 27 18:28:36 raspberrypi3 mender[327]: time="2018-08-27T18:28:36Z" level=info msg="State transition: authorize [Sync] -> authorize-wait [Idle]" module=mender
Aug 27 18:28:36 raspberrypi3 mender[327]: level=error msg="authorize failed: transient error: authorization request failed: generic error occured while executing authorization request: Post https://mender.gcpotademo.com/api/devices/v1/authentication/auth_requests: dial tcp: lookup mender.gcpotademo.com on 192.168.86.1:53: no such host" module=state
Aug 27 18:28:36 raspberrypi3 mender[327]: level=info msg="State transition: authorize [Sync] -> authorize-wait [Idle]" module=mender

root@raspberrypi3:/etc/mender/scripts# cat /etc/mender/mender.conf
{"ServerURL": "https://35.192.46.79", "TenantToken": "dummy", "RetryPollIntervalSeconds": 30, "UpdatePollIntervalSeconds": 30, "ClientProtocol": "http", "RootfsPartB": "/dev/mmcblk0p3", "RootfsPartA": "/dev/mmcblk0p2", "HttpsClient": {"SkipVerify": true}, "InventoryPollIntervalSeconds": 30}
root@raspberrypi3:/etc/mender/scripts#

after doing a restart of mender:

Aug 27 18:29:51 raspberrypi3 systemd[1]: Stopping Mender OTA update service...
Aug 27 18:29:51 raspberrypi3 systemd[1]: Stopped Mender OTA update service.
Aug 27 18:29:51 raspberrypi3 systemd[1]: Started Mender OTA update service.
Aug 27 18:29:51 raspberrypi3 mender[750]: level=warning msg="Server certificate not provided. Trusting all servers." module=client
Aug 27 18:29:51 raspberrypi3 mender[750]: time="2018-08-27T18:29:51Z" level=warning msg="Server certificate not provided. Trusting all servers." module=client
Aug 27 18:29:51 raspberrypi3 mender[750]: level=warning msg="certificate verification skipped.." module=client
Aug 27 18:29:51 raspberrypi3 mender[750]: time="2018-08-27T18:29:51Z" level=warning msg="certificate verification skipped.." module=client
Aug 27 18:29:51 raspberrypi3 mender[750]: level=info msg="State transition: init [none] -> init [none]" module=mender
Aug 27 18:29:51 raspberrypi3 mender[750]: time="2018-08-27T18:29:51Z" level=info msg="State transition: init [none] -> init [none]" module=mender
Aug 27 18:29:51 raspberrypi3 mender[750]: level=info msg="State transition: init [none] -> idle [Idle]" module=mender
Aug 27 18:29:51 raspberrypi3 mender[750]: level=info msg="State transition: idle [Idle] -> authorize [Sync]" module=mender
Aug 27 18:29:51 raspberrypi3 mender[750]: time="2018-08-27T18:29:51Z" level=info msg="State transition: init [none] -> idle [Idle]" module=mender
Aug 27 18:29:51 raspberrypi3 mender[750]: time="2018-08-27T18:29:51Z" level=info msg="State transition: idle [Idle] -> authorize [Sync]" module=mender
Aug 27 18:29:52 raspberrypi3 mender[750]: level=info msg="successfuly received new authorization data" module=mender

Do we need to HUP something as part of the activation?

ptone commented 5 years ago

tl;dr - systemctl restart mender does what I wanted - how should this be done from the POV of the activation py script?

ptone commented 5 years ago

@drewmoseley note that I started this issue looking at the slow retry, but I now think that may be a red herring, or at least secondary to the fact that a restart fixes the problem. (I'm not sure whether the two are related: maybe the restart also short-circuited a 30m retry period?)

drewmoseley commented 5 years ago

@ptone yes, I think restarting in this case will simply reset the retry period and force it to happen now. Since you are modifying the mender.conf file and injecting the proper server URL, I think the "systemctl restart" is the right approach just to get it going sooner.

Is there anything else you are trying to accomplish?

ptone commented 5 years ago

I did not let it go 30 minutes, but I did just verify with journalctl that there is a 1 min retry that keeps failing on the host lookup, even after the mender.conf has been updated.

I could have sworn mender client picked this up all on its own in the past. force restarting the mender service feels a bit janky - but if it works, it works.

ptone commented 5 years ago

FWIW - this seems not to be an issue if the image's /etc/mender/mender.conf does not have a ServerURL set at all.

Or at least if I reset this to {} for conf and try to repeat with a "scrubbed image" I don't need to restart the mender service

drewmoseley commented 5 years ago

Hi @ptone I've reached out to the architects of the Mender client to see if we can do anything to make this behavior more consistent.

It sounds like for now restarting the mender client in your external scripting should suffice as a workaround. Is that correct?
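
For reference, the workaround discussed above (update mender.conf, then restart the service) might look roughly like the following from a Python activation script. This is only a sketch under stated assumptions: the function name `update_conf_and_restart` and its `run` parameter are hypothetical and not taken from this repository's scripts, and it assumes the script runs as root on the device with systemd managing the `mender` unit, as in the logs above.

```python
import json
import subprocess
from pathlib import Path


def update_conf_and_restart(server_url,
                            conf_path="/etc/mender/mender.conf",
                            run=subprocess.run):
    """Hypothetical helper: set ServerURL in mender.conf, then restart mender.

    `run` is injectable so the restart step can be stubbed out when testing.
    """
    path = Path(conf_path)
    # Start from the existing config; a missing or empty file is treated as {}.
    text = path.read_text() if path.exists() else ""
    conf = json.loads(text) if text.strip() else {}
    conf["ServerURL"] = server_url
    path.write_text(json.dumps(conf, indent=2) + "\n")
    # Restart the service so the running client starts again with the new
    # config, as "systemctl restart mender" did in this thread.
    run(["systemctl", "restart", "mender"], check=True)
    return conf
```

Calling `update_conf_and_restart("https://35.192.46.79")` at the end of activation would keep the other keys in the file unchanged and raise `subprocess.CalledProcessError` if the restart fails, so the script does not silently continue with a client that is still retrying the old URL.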