Azure / iotedge

The IoT Edge OSS project
MIT License
1.45k stars 457 forks source link

Docker failed after iotedge restart #649

Closed xiaocongfu closed 5 years ago

xiaocongfu commented 5 years ago

Expected Behavior

Something like: https://docs.microsoft.com/en-us/azure/iot-edge/how-to-install-iot-edge-linux-arm

Current Behavior

A failure in iotedge status.

Steps to Reproduce

  1. I finished the steps described in https://docs.microsoft.com/en-us/azure/iot-edge/how-to-install-iot-edge-linux-arm

  2. Restart docker and check status: $ sudo systemctl restart docker $ sudo systemctl status docker ● docker.service - Docker Application Container Engine Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled) Active: active (running) since Fri 2018-12-14 09:40:15 UTC; 4s ago Docs: https://docs.docker.com Main PID: 11745 (dockerd) Tasks: 27 Memory: 23.8M CPU: 1.380s CGroup: /system.slice/docker.service ├─11745 /usr/bin/dockerd -H fd:// └─11780 docker-containerd --config /var/run/docker/containerd/containerd.toml

Dec 14 09:40:14 linaro-alip dockerd[11745]: time="2018-12-14T09:40:14.340916562Z" level=warning msg="Your kernel does not support cgroup blkio throttle.write_iops_device" Dec 14 09:40:14 linaro-alip dockerd[11745]: time="2018-12-14T09:40:14.345360523Z" level=info msg="Loading containers: start." Dec 14 09:40:14 linaro-alip dockerd[11745]: time="2018-12-14T09:40:14.555996531Z" level=warning msg="Could not load necessary modules for IPSEC rules: Running modprobe xfrm_user failed with message: modprobe: WARNING: Module xfrm_user not found in directory /lib/modules/4.9.51-imx_4.9.51_imx8m_ga+g6df7474, error: ex Dec 14 09:40:14 linaro-alip dockerd[11745]: time="2018-12-14T09:40:14.569558932Z" level=warning msg="Could not load necessary modules for Conntrack: Running modprobe nf_conntrack_netlink failed with message: `modprobe: WARNING: Module nf_conntrack_netlink not found in directory /lib/modules/4.9.51-imx_4.9.51_imx8m_ga+ Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.058811191Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address" Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.279257560Z" level=info msg="Loading containers: done." Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.325620762Z" level=info msg="Docker daemon" commit=9aac187 graphdriver(s)=overlay2 version=dev Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.326853042Z" level=info msg="Daemon has completed initialization" Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.355030963Z" level=info msg="API listen on /var/run/docker.sock" Dec 14 09:40:15 linaro-alip systemd[1]: Started Docker Application Container Engine.

Seemed to be right.

  1. Restart iotedge and check status $ sudo systemctl restart iotedge $ sudo systemctl status docker ● iotedge.service - Azure IoT Edge daemon Loaded: loaded (/lib/systemd/system/iotedge.service; enabled; vendor preset: enabled) Active: active (running) since Fri 2018-12-14 09:40:31 UTC; 18s ago Docs: man:iotedged(8) Main PID: 11952 (iotedged) Tasks: 11 (limit: 4915) Memory: 46.9M CPU: 110ms CGroup: /system.slice/iotedge.service └─11952 /usr/bin/iotedged -c /etc/iotedge/config.yaml

Dec 14 09:40:31 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [INFO] - Starting watchdog with 60 second frequency... Dec 14 09:40:31 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [INFO] - Listening on fd://iotedge.mgmt.socket/ with 1 thread for management API. Dec 14 09:40:31 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [INFO] - Listening on fd://iotedge.socket/ with 1 thread for workload API. Dec 14 09:40:31 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [INFO] - Checking edge runtime status Dec 14 09:40:31 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [INFO] - Edge runtime status is stopped, starting module now... Dec 14 09:40:32 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [WARN] - Attempt to start a container failed. Dec 14 09:40:32 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [WARN] - OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"could not create session key: function not implemented\"": unknown Dec 14 09:40:32 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [WARN] - Error in watchdog when checking for edge runtime status: Dec 14 09:40:32 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [WARN] - A module runtime error occurred. Dec 14 09:40:32 linaro-alip iotedged[11952]: 2018-12-14T09:40:31Z [WARN] - caused by: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"could not create session key: function not implemented\"": unknown

In the meantime, error log printed in serial port: [50172.148269] docker0: port 1(vethd022998) entered disabled state [50172.159995] device vethd022998 left promiscuous mode [50172.165137] docker0: port 1(vethd022998) entered disabled state [50231.856829] docker0: port 1(vethfee18e1) entered blocking state [50231.862783] docker0: port 1(vethfee18e1) entered disabled state [50231.869971] device vethfee18e1 entered promiscuous mode [50231.876203] IPv6: ADDRCONF(NETDEV_UP): vethfee18e1: link is not ready

  1. Check the docker status again: $ sudo systemctl status docker ● docker.service - Docker Application Container Engine Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled) Active: active (running) since Fri 2018-12-14 09:40:15 UTC; 23s ago Docs: https://docs.docker.com Main PID: 11745 (dockerd) Tasks: 28 Memory: 29.3M CPU: 1.870s CGroup: /system.slice/docker.service ├─11745 /usr/bin/dockerd -H fd:// └─11780 docker-containerd --config /var/run/docker/containerd/containerd.toml

Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.325620762Z" level=info msg="Docker daemon" commit=9aac187 graphdriver(s)=overlay2 version=dev Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.326853042Z" level=info msg="Daemon has completed initialization" Dec 14 09:40:15 linaro-alip dockerd[11745]: time="2018-12-14T09:40:15.355030963Z" level=info msg="API listen on /var/run/docker.sock" Dec 14 09:40:15 linaro-alip systemd[1]: Started Docker Application Container Engine. Dec 14 09:40:31 linaro-alip dockerd[11745]: time="2018-12-14T09:40:31Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/bd662075acbbd92f9f8f25085db3ee53fd098fc70afe6f858ff78fe0c374006f/shim.sock" debug=false module="containerd/tasks" pid=12025 Dec 14 09:40:31 linaro-alip dockerd[11745]: time="2018-12-14T09:40:31Z" level=info msg="shim reaped" id=bd662075acbbd92f9f8f25085db3ee53fd098fc70afe6f858ff78fe0c374006f module="containerd/tasks" Dec 14 09:40:31 linaro-alip dockerd[11745]: time="2018-12-14T09:40:31.829430106Z" level=error msg="stream copy error: reading from a closed fifo" Dec 14 09:40:31 linaro-alip dockerd[11745]: time="2018-12-14T09:40:31.833336466Z" level=error msg="stream copy error: reading from a closed fifo" Dec 14 09:40:31 linaro-alip dockerd[11745]: time="2018-12-14T09:40:31.998437033Z" level=error msg="bd662075acbbd92f9f8f25085db3ee53fd098fc70afe6f858ff78fe0c374006f cleanup: failed to delete container from containerd: no such container" Dec 14 09:40:32 linaro-alip dockerd[11745]: time="2018-12-14T09:40:31.998743393Z" level=error msg="Handler for POST /containers/edgeAgent/start returned error: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:402: container init caused \\"could not create session

Context (Environment)

Device (Host) Operating System

Debian stretch

Architecture

arm32

Container Operating System

Runtime Versions

iotedged

iotedge 1.0.4

Edge Agent

Edge Hub

Docker

$ sudo docker version Client: Version: 18.06.0-dev API version: 1.37 Go version: go1.10.2 Git commit: daf021fe Built: Wed Jun 27 00:53:52 2018 OS/Arch: linux/arm Experimental: false Orchestrator: swarm

Server: Engine: Version: dev API version: 1.37 (minimum version 1.12) Go version: go1.10.2 Git commit: 9aac187 Built: Tue Jun 26 23:27:27 2018 OS/Arch: linux/arm Experimental: false

$ sudo docker info Containers: 1 Running: 0 Paused: 0 Stopped: 1 Images: 1 Server Version: dev Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Native Overlay Diff: true Logging Driver: json-file Cgroup Driver: cgroupfs Plugins: Volume: local Network: bridge host macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog Swarm: inactive Runtimes: runc Default Runtime: runc Init Binary: docker-init containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88 runc version: 4fc53a81fb7c994640722ac585fa9ca548971871 init version: fec3683 Security Options: seccomp Profile: default Kernel Version: 4.9.51-imx_4.9.51_imx8m_ga+g6df7474 Operating System: Debian GNU/Linux 9 (stretch) OSType: linux Architecture: aarch64 CPUs: 4 Total Memory: 2.928GiB Name: linaro-alip ID: HOVK:WPKM:RZI2:4AHT:LZ7Q:IGFO:4H45:PZAY:NPD7:INMA:O2E5:F6GL Docker Root Dir: /var/lib/docker Debug Mode (client): false Debug Mode (server): false Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false

WARNING: No cpu cfs quota support WARNING: No cpu cfs period support

Logs

$ iotedge list NAME STATUS DESCRIPTION CONFIG edgeAgent stopped Stopped mcr.microsoft.com/azureiotedge-agent:1.0

$ journalctl -u iotedge --no-pager --no-full Dec 14 09:39:13 linaro-alip iotedged[5563]: 2018-12-14T09:39:13Z [WARN] - Attempt to start a container failed. Dec 14 09:39:13 linaro-alip iotedged[5563]: 2018-12-14T09:39:13Z [WARN] - OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"could not create session key: function not implemented\"": unknown Dec 14 09:39:13 linaro-alip iotedged[5563]: 2018-12-14T09:39:13Z [WARN] - Error in watchdog when checking for edge runtime status: Dec 14 09:39:13 linaro-alip iotedged[5563]: 2018-12-14T09:39:13Z [WARN] - A module runtime error occurred. Dec 14 09:39:13 linaro-alip iotedged[5563]: 2018-12-14T09:39:13Z [WARN] - caused by: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"could not create session key: function not implemented\"": unknown Dec 14 09:40:02 linaro-alip systemd[1]: Stopping Azure IoT Edge daemon... Dec 14 09:40:02 linaro-alip iotedged[5563]: 2018-12-14T09:40:02Z [INFO] - Received SIGTERM, starting shutdown Dec 14 09:40:02 linaro-alip iotedged[5563]: 2018-12-14T09:40:02Z [INFO] - Stopping edge runtime module edgeAgent Dec 14 09:40:02 linaro-alip iotedged[5563]: 2018-12-14T09:40:02Z [WARN] - Attempt to stop a container failed. Dec 14 09:40:02 linaro-alip iotedged[5563]: 2018-12-14T09:40:02Z [WARN] - Container already in this state Dec 14 09:40:02 linaro-alip iotedged[5563]: 2018-12-14T09:40:02Z [ERR!] - Edgelet core error Dec 14 09:40:02 linaro-alip systemd[1]: iotedge.service: Main process exited, code=exited, status=1/FAILURE Dec 14 09:40:02 linaro-alip systemd[1]: Stopped Azure IoT Edge daemon. Dec 14 09:40:02 linaro-alip systemd[1]: iotedge.service: Unit entered failed state. Dec 14 09:40:02 linaro-alip systemd[1]: iotedge.service: Failed with result 'exit-code'. Dec 14 09:40:02 linaro-alip systemd[1]: Started Azure IoT Edge daemon.

darobs commented 5 years ago

Hello @xiaocongfu

Are you able to start containers with the docker command?

xiaocongfu commented 5 years ago

Looks like veth connection can not be established: Dec 17 17:24:53 linaro-alip systemd[1]: resolvconf.service: Cannot add dependency job, ignoring: Unit resolvconf.service is masked. Dec 17 17:24:53 linaro-alip systemd[1]: Starting Azure IoT Edge daemon management socket. Dec 17 17:24:53 linaro-alip systemd[1]: Starting Azure IoT Edge daemon workload socket. Dec 17 17:24:53 linaro-alip systemd[1]: Listening on Azure IoT Edge daemon management socket. Dec 17 17:24:53 linaro-alip systemd[1]: Listening on Azure IoT Edge daemon workload socket. Dec 17 17:24:53 linaro-alip systemd[1]: Started Azure IoT Edge daemon. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Starting Azure IoT Edge Security Daemon Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Version - 1.0.4 (2835ac47fbb944ba628a5cfe812ff394768842e0) Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Using config file: /etc/iotedge/config.yaml Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Using runtime network id azure-iot-edge Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Initializing the module runtime... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Finished initializing the module runtime. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Configuring /var/lib/iotedge as the home directory. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Configuring certificates... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Transparent gateway certificates not found, operating in quick start mode... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Finished configuring certificates. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Initializing hsm... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Finished initializing hsm. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Detecting if configuration file has changed... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - No change to configuration file detected. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Obtaining workload CA succeeded. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Provisioning edge device... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Manually provisioning device "myEdgeDevice8MQ" in hub "nxp-azureIoT-hub.azure-devices.net" Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Finished provisioning edge device. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Starting management API... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Starting workload API... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Starting watchdog with 60 second frequency... Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Listening on fd://iotedge.mgmt.socket/ with 1 thread for management API. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Listening on fd://iotedge.socket/ with 1 thread for workload API. Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Checking edge runtime status Dec 17 17:24:53 linaro-alip iotedged[17904]: 2018-12-17T17:24:53Z [INFO] - Edge runtime status is stopped, starting module now... Dec 17 17:24:53 linaro-alip kernel: [12667.380630] docker0: port 1(veth8f88eee) entered blocking state Dec 17 17:24:53 linaro-alip kernel: [12667.386647] docker0: port 1(veth8f88eee) entered disabled state Dec 17 17:24:53 linaro-alip kernel: [12667.393213] device veth8f88eee entered promiscuous mode Dec 17 17:24:53 linaro-alip kernel: [12667.399077] IPv6: ADDRCONF(NETDEV_UP): veth8f88eee: link is not ready Dec 17 17:24:53 linaro-alip NetworkManager[2953]: [1545067493.9397] manager: (veth584b8ce): new Veth device (/org/freedesktop/NetworkManager/Devices/255) Dec 17 17:24:53 linaro-alip NetworkManager[2953]: [1545067493.9481] manager: (veth8f88eee): new Veth device (/org/freedesktop/NetworkManager/Devices/256) Dec 17 17:24:53 linaro-alip NetworkManager[2953]: [1545067493.9554] devices added (path: /sys/devices/virtual/net/veth584b8ce, iface: veth584b8ce) Dec 17 17:24:53 linaro-alip NetworkManager[2953]: [1545067493.9555] device added (path: /sys/devices/virtual/net/veth584b8ce, iface: veth584b8ce): no ifupdown configuration found. Dec 17 17:24:53 linaro-alip NetworkManager[2953]: [1545067493.9665] devices added (path: /sys/devices/virtual/net/veth8f88eee, iface: veth8f88eee) Dec 17 17:24:53 linaro-alip NetworkManager[2953]: [1545067493.9666] device added (path: /sys/devices/virtual/net/veth8f88eee, iface: veth8f88eee): no ifupdown configuration found. Dec 17 17:24:54 linaro-alip dockerd[5957]: time="2018-12-17T17:24:54Z" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/acf16bce24f8e05ddfbb27238883080579e2935b2a941d11638c1daddfe3aa12/shim.sock" debug=false module="containerd/tasks" pid=17950 Dec 17 17:24:54 linaro-alip dockerd[5957]: time="2018-12-17T17:24:54Z" level=info msg="shim reaped" id=acf16bce24f8e05ddfbb27238883080579e2935b2a941d11638c1daddfe3aa12 module="containerd/tasks" Dec 17 17:24:54 linaro-alip dockerd[5957]: time="2018-12-17T17:24:54.171463968Z" level=error msg="stream copy error: reading from a closed fifo" Dec 17 17:24:54 linaro-alip dockerd[5957]: time="2018-12-17T17:24:54.171439368Z" level=error msg="stream copy error: reading from a closed fifo" Dec 17 17:24:54 linaro-alip kernel: [12667.683096] docker0: port 1(veth8f88eee) entered disabled state Dec 17 17:24:54 linaro-alip kernel: [12667.692474] device veth8f88eee left promiscuous mode Dec 17 17:24:54 linaro-alip kernel: [12667.698052] docker0: port 1(veth8f88eee) entered disabled state Dec 17 17:24:54 linaro-alip NetworkManager[2953]: [1545067494.2472] devices removed (path: /sys/devices/virtual/net/veth584b8ce, iface: veth584b8ce) Dec 17 17:24:54 linaro-alip NetworkManager[2953]: [1545067494.2493] devices removed (path: /sys/devices/virtual/net/veth8f88eee, iface: veth8f88eee) Dec 17 17:24:54 linaro-alip dockerd[5957]: time="2018-12-17T17:24:54.314634654Z" level=error msg="acf16bce24f8e05ddfbb27238883080579e2935b2a941d11638c1daddfe3aa12 cleanup: failed to delete container from containerd: no such container" Dec 17 17:24:54 linaro-alip dockerd[5957]: time="2018-12-17T17:24:54.314833254Z" level=error msg="Handler for POST /containers/edgeAgent/start returned error: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:402: container init caused \\"could not create session key: function not implemented\\"\": unknown" Dec 17 17:24:54 linaro-alip iotedged[17904]: 2018-12-17T17:24:54Z [WARN] - Attempt to start a container failed. Dec 17 17:24:54 linaro-alip iotedged[17904]: 2018-12-17T17:24:54Z [WARN] - OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"could not create session key: function not implemented\"": unknown Dec 17 17:24:54 linaro-alip iotedged[17904]: 2018-12-17T17:24:54Z [WARN] - Error in watchdog when checking for edge runtime status: Dec 17 17:24:54 linaro-alip iotedged[17904]: 2018-12-17T17:24:54Z [WARN] - A module runtime error occurred. Dec 17 17:24:54 linaro-alip iotedged[17904]: 2018-12-17T17:24:54Z [WARN] - #011caused by: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"could not create session key: function not implemented\"": unknown

xiaocongfu commented 5 years ago

A little information in case you may have the same problem:

I found in this document (https://github.com/moby/moby/blob/master/project/ARM.md) about running moby in 32bit armv7 devices. It suggests to config "dummy" network interface in kernel. I followed the steps and get problem solved.