Azure / iot-edge-v1

Azure IoT Edge
http://azure.github.io/iot-edge/
Other
525 stars 257 forks source link

Broken access to homedir/sockets causes service failure after installing Azure IoT Edge runtime on Raspbian #644

Closed altermarkive closed 5 years ago

altermarkive commented 5 years ago

Hi,

I hope you'll be able to help, perhaps I'm missing something obvious, but I figured I'll share the issue and the results of my troubleshooting.

Issue: After following the default how-to guide (here) the iotedge.service fails to start.

Situation:

Description: While troubleshooting (journalctl -u iotedge --no-pager --no-full) I noticed that by default the home directory for the IoT Edge Agent container is set to a non-existing /tmp/iotedge - running manually sudo -u iotedge /usr/bin/iotedged -c /etc/iotedge/config.yaml confirmed a file not found error on /tmp/iotedge. I fixed that by setting homedir in /etc/iotedge/config.yml to a pre-existing directory.

At this point the service could make some more progress but it was still failing - this time with an access denied error. Checking the output of the manually ran service command showed that the issue was related to the socket files in /var/run/iotedge/.

I explored and experimented with the user IDs and permissions and noticed that by default:

Poking around further I used the following workarounds to make the service start:

I automated the whole installation process including workarounds with Ansible - see the repo here

I also noticed that others are also struggling with similar issues:

I don't know enough to suggest a solution (perhaps using iotedge user instead of edgehubuser could help) but maybe you will have an idea what is wrong. Of course, I'll be glad to provide any particular diagnostics/logs/outputs to help resolve the issue.

And thanks for reading the long post!

darobs commented 5 years ago

Hello @altermarkive

Thank you for the detailed information. It looks like installing the package doesn't do everything correctly all the time. I have never personally experienced this, which means that I'm doing something different - I'd love to find out what that is.

The fact that it's pointing the iotedge homedir to "/tmp/iotedge" implies that it didn't initially read from the expected configuration file.

Out of curiosity, when you looked at the log of iotedge, did you see something like this:

2018-05-06 23:54:49.763 +00:00 [INF] - Initializing configuration 2018-05-06 23:54:51.071 +00:00 [INF] - Created persistent store at /tmp/edgeHub

And you didn't see a line that said, "Using config file: /etc/iotedge/config.yaml"?

Thanks, we will be looking into this.

altermarkive commented 5 years ago

Hello @darobs

Wow, I did not expect a reply so soon let alone still on the weekend - hats off to you my good sir!

I'm doing something different - I'd love to find out what that is.

Awesome! I reflashed my device with a fresh Raspbian and reverted the Ansible scripts to the original state based on how-to and without the workarounds (see this branch https://github.com/altermarkive/install-azure-iot-edge-on-raspbian-with-ansible/tree/vanilla).

And you didn't see a line that said, "Using config file: /etc/iotedge/config.yaml"?

I always see that line - so I think the config is always picked up.

The /etc/iotedge/config.yaml looks like this:

provisioning:
  source: "manual"
  device_connection_string: "HostName=zzzzzzzzzzz.azure-devices.net;DeviceId=yyyyyyyyyyyyyyyy;SharedAccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx="

Output of systemctl status iotedge:

● iotedge.service - Azure IoT Edge daemon
   Loaded: loaded (/lib/systemd/system/iotedge.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2018-07-23 06:10:19 BST; 22min ago
     Docs: man:iotedged(8)
  Process: 3760 ExecStart=/usr/bin/iotedged -c /etc/iotedge/config.yaml (code=exited, status=1/FAILURE)
 Main PID: 3760 (code=exited, status=1/FAILURE)
      CPU: 97ms

Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Service hold-off time over, scheduling restart.
Jul 23 06:10:19 raspberrypi systemd[1]: Stopped Azure IoT Edge daemon.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Start request repeated too quickly.
Jul 23 06:10:19 raspberrypi systemd[1]: Failed to start Azure IoT Edge daemon.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.

Output of journalctl -u iotedge --no-pager --no-full:

-- Logs begin at Thu 2016-11-03 17:16:42 GMT, end at Mon 2018-07-23 06:30:02 BST. --
Jul 23 06:09:14 raspberrypi systemd[1]: Started Azure IoT Edge daemon.
Jul 23 06:09:14 raspberrypi iotedged[3386]: 2018-07-23T05:09:14Z [INFO] - Starting Azure IoT Edge Security Daemon
Jul 23 06:09:14 raspberrypi iotedged[3386]: 2018-07-23T05:09:14Z [INFO] - Version - 1.0.0 (52ef77db24126bf473265fc…a2dd6b)
Jul 23 06:09:14 raspberrypi iotedged[3386]: 2018-07-23T05:09:14Z [INFO] - Using config file: /etc/iotedge/config.yaml
Jul 23 06:09:14 raspberrypi iotedged[3386]: 2018-07-23T05:09:14Z [INFO] - Using runtime network id azure-iot-edge
Jul 23 06:09:14 raspberrypi iotedged[3386]: 2018-07-23T05:09:14Z [INFO] - Initializing the module runtime...
Jul 23 06:09:16 raspberrypi iotedged[3386]: 2018-07-23T05:09:16Z [INFO] - Finished initializing the module runtime.
Jul 23 06:09:16 raspberrypi iotedged[3386]: 2018-07-23T05:09:16Z [INFO] - Configuring /var/lib/iotedge as the home…ectory.
Jul 23 06:09:16 raspberrypi iotedged[3386]: 2018-07-23T05:09:16Z [INFO] - Configuring certificates...
Jul 23 06:09:16 raspberrypi iotedged[3386]: 2018-07-23T05:09:16Z [INFO] - Transparent gateway certificates not fou…mode...
Jul 23 06:09:16 raspberrypi iotedged[3386]: 2018-07-23T05:09:16Z [INFO] - Finished configuring certificates.
Jul 23 06:09:16 raspberrypi iotedged[3386]: 2018-07-23T05:09:16Z [INFO] - Initializing hsm...
Jul 23 06:09:37 raspberrypi systemd[1]: Stopping Azure IoT Edge daemon...
Jul 23 06:10:17 raspberrypi systemd[1]: iotedge.service: State 'stop-sigterm' timed out. Killing.
Jul 23 06:10:17 raspberrypi systemd[1]: iotedge.service: Killing process 3386 (iotedged) with signal SIGKILL.
Jul 23 06:10:17 raspberrypi systemd[1]: iotedge.service: Main process exited, code=killed, status=9/KILL
Jul 23 06:10:17 raspberrypi systemd[1]: Stopped Azure IoT Edge daemon.
Jul 23 06:10:17 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:17 raspberrypi systemd[1]: iotedge.service: Failed with result 'timeout'.
Jul 23 06:10:17 raspberrypi systemd[1]: Started Azure IoT Edge daemon.
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Starting Azure IoT Edge Security Daemon
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Version - 1.0.0 (52ef77db24126bf473265fc…a2dd6b)
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Using config file: /etc/iotedge/config.yaml
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Using runtime network id azure-iot-edge
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Initializing the module runtime...
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Finished initializing the module runtime.
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Configuring /tmp/iotedge as the home directory.
Jul 23 06:10:18 raspberrypi iotedged[3725]: 2018-07-23T05:10:18Z [INFO] - Configuring certificates...
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Service hold-off time over, scheduling restart.
Jul 23 06:10:18 raspberrypi systemd[1]: Stopped Azure IoT Edge daemon.
Jul 23 06:10:18 raspberrypi systemd[1]: Started Azure IoT Edge daemon.
Jul 23 06:10:18 raspberrypi iotedged[3739]: 2018-07-23T05:10:18Z [INFO] - Starting Azure IoT Edge Security Daemon
Jul 23 06:10:18 raspberrypi iotedged[3739]: 2018-07-23T05:10:18Z [INFO] - Version - 1.0.0 (52ef77db24126bf473265fc…a2dd6b)
Jul 23 06:10:18 raspberrypi iotedged[3739]: 2018-07-23T05:10:18Z [INFO] - Using config file: /etc/iotedge/config.yaml
Jul 23 06:10:18 raspberrypi iotedged[3739]: 2018-07-23T05:10:18Z [INFO] - Using runtime network id azure-iot-edge
Jul 23 06:10:18 raspberrypi iotedged[3739]: 2018-07-23T05:10:18Z [INFO] - Initializing the module runtime...
Jul 23 06:10:18 raspberrypi iotedged[3739]: 2018-07-23T05:10:18Z [INFO] - Finished initializing the module runtime.
Jul 23 06:10:18 raspberrypi iotedged[3739]: 2018-07-23T05:10:18Z [INFO] - Configuring /tmp/iotedge as the home directory.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Service hold-off time over, scheduling restart.
Jul 23 06:10:18 raspberrypi systemd[1]: Stopped Azure IoT Edge daemon.
Jul 23 06:10:18 raspberrypi systemd[1]: Started Azure IoT Edge daemon.
Jul 23 06:10:18 raspberrypi iotedged[3746]: 2018-07-23T05:10:18Z [INFO] - Starting Azure IoT Edge Security Daemon
Jul 23 06:10:18 raspberrypi iotedged[3746]: 2018-07-23T05:10:18Z [INFO] - Version - 1.0.0 (52ef77db24126bf473265fc…a2dd6b)
Jul 23 06:10:18 raspberrypi iotedged[3746]: 2018-07-23T05:10:18Z [INFO] - Using config file: /etc/iotedge/config.yaml
Jul 23 06:10:18 raspberrypi iotedged[3746]: 2018-07-23T05:10:18Z [INFO] - Using runtime network id azure-iot-edge
Jul 23 06:10:18 raspberrypi iotedged[3746]: 2018-07-23T05:10:18Z [INFO] - Initializing the module runtime...
Jul 23 06:10:18 raspberrypi iotedged[3746]: 2018-07-23T05:10:18Z [INFO] - Finished initializing the module runtime.
Jul 23 06:10:18 raspberrypi iotedged[3746]: 2018-07-23T05:10:18Z [INFO] - Configuring /tmp/iotedge as the home directory.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.
Jul 23 06:10:18 raspberrypi systemd[1]: iotedge.service: Service hold-off time over, scheduling restart.
Jul 23 06:10:18 raspberrypi systemd[1]: Stopped Azure IoT Edge daemon.
Jul 23 06:10:18 raspberrypi systemd[1]: Started Azure IoT Edge daemon.
Jul 23 06:10:19 raspberrypi iotedged[3753]: 2018-07-23T05:10:19Z [INFO] - Starting Azure IoT Edge Security Daemon
Jul 23 06:10:19 raspberrypi iotedged[3753]: 2018-07-23T05:10:19Z [INFO] - Version - 1.0.0 (52ef77db24126bf473265fc…a2dd6b)
Jul 23 06:10:19 raspberrypi iotedged[3753]: 2018-07-23T05:10:19Z [INFO] - Using config file: /etc/iotedge/config.yaml
Jul 23 06:10:19 raspberrypi iotedged[3753]: 2018-07-23T05:10:19Z [INFO] - Using runtime network id azure-iot-edge
Jul 23 06:10:19 raspberrypi iotedged[3753]: 2018-07-23T05:10:19Z [INFO] - Initializing the module runtime...
Jul 23 06:10:19 raspberrypi iotedged[3753]: 2018-07-23T05:10:19Z [INFO] - Finished initializing the module runtime.
Jul 23 06:10:19 raspberrypi iotedged[3753]: 2018-07-23T05:10:19Z [INFO] - Configuring /tmp/iotedge as the home directory.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Service hold-off time over, scheduling restart.
Jul 23 06:10:19 raspberrypi systemd[1]: Stopped Azure IoT Edge daemon.
Jul 23 06:10:19 raspberrypi systemd[1]: Started Azure IoT Edge daemon.
Jul 23 06:10:19 raspberrypi iotedged[3760]: 2018-07-23T05:10:19Z [INFO] - Starting Azure IoT Edge Security Daemon
Jul 23 06:10:19 raspberrypi iotedged[3760]: 2018-07-23T05:10:19Z [INFO] - Version - 1.0.0 (52ef77db24126bf473265fc…a2dd6b)
Jul 23 06:10:19 raspberrypi iotedged[3760]: 2018-07-23T05:10:19Z [INFO] - Using config file: /etc/iotedge/config.yaml
Jul 23 06:10:19 raspberrypi iotedged[3760]: 2018-07-23T05:10:19Z [INFO] - Using runtime network id azure-iot-edge
Jul 23 06:10:19 raspberrypi iotedged[3760]: 2018-07-23T05:10:19Z [INFO] - Initializing the module runtime...
Jul 23 06:10:19 raspberrypi iotedged[3760]: 2018-07-23T05:10:19Z [INFO] - Finished initializing the module runtime.
Jul 23 06:10:19 raspberrypi iotedged[3760]: 2018-07-23T05:10:19Z [INFO] - Configuring /tmp/iotedge as the home directory.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Main process exited, code=exited, status=1/FAILURE
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Service hold-off time over, scheduling restart.
Jul 23 06:10:19 raspberrypi systemd[1]: Stopped Azure IoT Edge daemon.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Start request repeated too quickly.
Jul 23 06:10:19 raspberrypi systemd[1]: Failed to start Azure IoT Edge daemon.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Unit entered failed state.
Jul 23 06:10:19 raspberrypi systemd[1]: iotedge.service: Failed with result 'exit-code'.

I noticed that it in fact it first uses correctly /var/lib/iotedge once and then, after the first failure, goes on to use /tmp/iotedge instead...

Then after that, the output of sudo -u iotedge /usr/bin/iotedged -c /etc/iotedge/config.yaml:

<6>2018-07-23T05:43:13Z [INFO] - Starting Azure IoT Edge Security Daemon
<6>2018-07-23T05:43:13Z [INFO] - Version - 1.0.0 (52ef77db24126bf473265fc09c53d35290a2dd6b)
<6>2018-07-23T05:43:13Z [INFO] - Using config file: /etc/iotedge/config.yaml
<6>2018-07-23T05:43:13Z [INFO] - Using runtime network id azure-iot-edge
<6>2018-07-23T05:43:13Z [INFO] - Initializing the module runtime...
<6>2018-07-23T05:43:13Z [INFO] - Finished initializing the module runtime.
<6>2018-07-23T05:43:13Z [INFO] - Configuring /tmp/iotedge as the home directory.
<6>2018-07-23T05:43:13Z [INFO] - Configuring certificates...
<6>2018-07-23T05:43:13Z [INFO] - Transparent gateway certificates not found, operating in quick start mode...
<6>2018-07-23T05:43:13Z [INFO] - Finished configuring certificates.
<6>2018-07-23T05:43:13Z [INFO] - Initializing hsm...
<3>2018-07-23T05:43:13Z [ERR!] (/project/hsm-sys/azure-iot-hsm-c/src/edge_hsm_client_store.c:get_base_dir:473) Directory path in env variable IOTEDGE_HOMEDIR is invalid. Found /tmp/iotedge
<3>2018-07-23T05:43:13Z [ERR!] (/project/hsm-sys/azure-iot-hsm-c/src/edge_hsm_client_store.c:hsm_provision:1906) HSM base directory does not exist. Set environment variable IOTEDGE_HOMEDIR to a valid path.
<3>2018-07-23T05:43:13Z [ERR!] (/project/hsm-sys/azure-iot-hsm-c/src/edge_hsm_client_crypto.c:hsm_client_crypto_init:43) Could not create store. Error code 1949
<3>2018-07-23T05:43:13Z [ERR!] - An hsm error occurred.
<3>2018-07-23T05:43:13Z [ERR!] -        caused by: HSM failure
<3>2018-07-23T05:43:13Z [ERR!] -        caused by: HSM Init failure: 44

This behavior is very repeatable - since I had it automated I tried it a couple of times (each time with a freshly flashed Raspberry Pi)

Where do we go from here?

Thanks, we will be looking into this

Thank you!!!

darobs commented 5 years ago

May we see your full (redacted for secrets) config.yaml All of this behavior is consistent with the default settings, which shouldn't be the case, so we're interested in anything different in the file.

altermarkive commented 5 years ago

The config.yml I included in my last post is the full and exact contents of the file (except for the redacted connection string) - I based it directly on how-to.

darobs commented 5 years ago

Eureka! That makes sense.

The config.yaml in the package is slightly different than the "default", and those difference affect how the iotedged works.

1) Try this:

sudo apt-get remove iotedge
sudo rm /etc/iotedge/config.yaml
sudo apt-get install <iotedged package>

When you edit /etc/iotedge/config.yaml, only replace the <ADD DEVICE CONNECTION STRING HERE> with your connection string. sudo systemctl restart iotedge

That should get you going without any trouble.

2) You aren't the only person having this problem, so I'm going to suggest a revision to our documentation to make this more clear.

Thank you!

darobs commented 5 years ago

Removing and reinstalling the iotedge daemon is the easiest way to restore the config.yaml to the appropriate default settings for the pi. You may also copy the file from here and edit the "device_connection_string" and "hostname" fields.

altermarkive commented 5 years ago

I had a hunch I must have missed something simple... at least since it's not only me then maybe the documentation gets improved.

I tried the following:

sudo apt remove iotedge
sudo rm /etc/iotedge/config.yaml
curl -L https://aka.ms/iotedged-linux-armhf-latest -o iotedge.deb && sudo dpkg -i ./iotedge.deb

Unfortunately it fails to install with the following error:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   189  100   189    0     0    599      0 --:--:-- --:--:-- --:--:--   600
100 2527k  100 2527k    0     0  1196k      0  0:00:02  0:00:02 --:--:-- 2061k
Selecting previously unselected package iotedge.
(Reading database ... 115490 files and directories currently installed.)
Preparing to unpack ./iotedge.deb ...
The user `iotedge' is already a member of `docker'.
Unpacking iotedge (1.0.0-1) ...
Setting up iotedge (1.0.0-1) ...
chown: cannot access '/etc/iotedge/config.yaml': No such file or directory
dpkg: error processing package iotedge (--install):
 subprocess installed post-installation script returned error exit status 1
Processing triggers for man-db (2.7.6.1-2) ...
Processing triggers for systemd (232-25+deb9u4) ...
Errors were encountered while processing:
 iotedge

Shall I go ahead by copying the config.yaml from the git repo you pointed me to in your last comment?

darobs commented 5 years ago

Interesting, I wonder if I cause you more trouble by telling you to remove it.

Yes, I would create and edit the file, then: sudo chown iotedge:iotedge /etc/iotedge/config.yaml sudo chmod 400 /etc/iotedge/config.yaml

altermarkive commented 5 years ago

Welp, that worked! Tomorrow I'm going to reflash the system and try to automate it - I'll check it first without putting the config.yaml from the git repo but just patching whatever the package puts in place. In any case I'll report here the outcome.

Many, many thanks! It seems like I should have asked before going on a troubleshooting rampage myself :)

altermarkive commented 5 years ago

I modified my Ansible scripts and applied them to a newly flashed Raspberry Pi and it works! All that was needed was to patch the /etc/iotedge/config.yml with a connection string rather than overwriting the file with the sparse example from the how-to.

@darobs thank you so much for your help! It was great to work together on finding a solution - I'm closing the issue.