Azure / iot-edge-v1

Azure IoT Edge
http://azure.github.io/iot-edge/
Other
524 stars 258 forks source link

[V2] Windows container sometimes doesn't sync time with host, cause edgeHub failed to connect to IoT Hub. #565

Open yphuangms opened 6 years ago

yphuangms commented 6 years ago

IF running windows container for IoT Edge runtime and modules, sometimes the container doesn't sync time with host. This time sync issue is really dynamic, it occurs sometimes, not always. Once it happens, restarting docker service won't help. It requires machine reboot to recover.

Since edge runtime requires secure connection with IoT Hub, hence the time synchronization of IoT Hub DeviceClient does matter. If the client device time goes ahead (or behind) too much, IoT Hub will reject such connection from client. Then we'll got operation timeout at client side.

Is there any way to enforce time sync with host on Windows container ? Or can edge runtime enforce/ensure this time sync with host ?

There are many discussion on this "time sync" topic for Linux container, but unfortunately, I cannot find one for "Windows container". Even if there are ways to fix time sync issue in Windows container, we might not able to touch the code or image of edge runtime.

yphuangms commented 6 years ago

Here is the log which shows time sync issue:

C:\Tools>docker system info
Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 30
Server Version: 17.09.0-ce
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: inactive
Default Isolation: hyperv
Kernel Version: 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)
Operating System: Windows 10 Enterprise
OSType: windows
Architecture: x86_64
CPUs: 8
Total Memory: 15.8GiB
Name: mycomputer
Docker Root Dir: C:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 30
 System Time: 2018-03-30T13:52:17.633296+08:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

C:\Tools>docker logs edgeAgent
2018-03-31 04:09:48.346 +08:00 [INF] - Starting module management agent.
2018-03-31 04:09:48.504 +08:00 [INF] - Version - 1.0.0-preview022.11567621 (12a8e1bb63e619b17ca685efd470ad3f412034f4)
2018-03-31 04:09:48.504 +08:00 [INF] -
        █████╗ ███████╗██╗   ██╗██████╗ ███████╗
       ██╔══██╗╚══███╔╝██║   ██║██╔══██╗██╔════╝
       ███████║  ███╔╝ ██║   ██║██████╔╝█████╗
       ██╔══██║ ███╔╝  ██║   ██║██╔══██╗██╔══╝
       ██║  ██║███████╗╚██████╔╝██║  ██║███████╗
       ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚══════╝

 ██╗ ██████╗ ████████╗    ███████╗██████╗  ██████╗ ███████╗
 ██║██╔═══██╗╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ██║██║   ██║   ██║       █████╗  ██║  ██║██║  ███╗█████╗
 ██║██║   ██║   ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝
 ██║╚██████╔╝   ██║       ███████╗██████╔╝╚██████╔╝███████╗
 ╚═╝ ╚═════╝    ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

2018-03-31 04:09:48.587 +08:00 [INF] - Edge agent attempting to connect to IoT Hub via AMQP...
2018-03-31 04:09:49.462 +08:00 [INF] - Created persistent store at C:\Users\ContainerAdministrator\AppData\Local\Temp\edgeAgent
2018-03-31 04:09:50.862 +08:00 [INF] - Edge agent connected to IoT Hub via AMQP.
2018-03-31 04:09:52.826 +08:00 [INF] - Plan execution started for deployment 31
2018-03-31 04:09:52.854 +08:00 [INF] - Executing command: "Command Group: (
  [docker pull microsoft/azureiotedge-hub:1.0.0-preview021]
  [docker create --name edgeHub microsoft/azureiotedge-hub:1.0.0-preview021]
  [docker start edgeHub]
)"
2018-03-31 04:09:52.856 +08:00 [INF] - Executing command: "docker pull microsoft/azureiotedge-hub:1.0.0-preview021"

C:\Tools>docker system info
Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 30
Server Version: 17.09.0-ce
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: ics l2bridge l2tunnel nat null overlay transparent
 Log: awslogs etwlogs fluentd json-file logentries splunk syslog
Swarm: inactive
Default Isolation: hyperv
Kernel Version: 10.0 16299 (16299.15.amd64fre.rs3_release.170928-1534)
Operating System: Windows 10 Enterprise
OSType: windows
Architecture: x86_64
CPUs: 8
Total Memory: 15.8GiB
Name: mycomputer
Docker Root Dir: C:\ProgramData\Docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 42
 System Time: 2018-03-30T13:52:30.4753003+08:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

C:\Tools>
C:\Tools\iotedge.temp>docker logs edgeHub
...
2018-03-31 04:52:32.279 +08:00 [INF] - Initializing configuration
...
2018-03-31 04:54:42.858 +08:00 [ERR] - Error creating or updating the cloud proxy for client IoTEdgeDevTest/insights

System.AggregateException: One or more errors occurred. (Operation timeout expired.) (Operation timeout expired.) ---> System.TimeoutException: Operation timeout expired.
   at Microsoft.Azure.Devices.Client.DeviceClient.<>c.<ApplyTimeout>b__79_2(Task t)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.<CreateAndOpenDeviceClient>d__23.MoveNext() in /opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 174
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Edge.Util.Fallback.<ExecuteAsync>d__3`1.MoveNext()
   --- End of inner exception stack trace ---
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.<ConnectToIoTHub>d__22.MoveNext() in /opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 153
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.<GetCloudProxyAsync>d__21.MoveNext() in /opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 146
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.<>c__DisplayClass20_0.<<CreateOrUpdateAsync>b__2>d.MoveNext() in /opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 115
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.<CreateOrUpdateAsync>d__20.MoveNext() in /opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 86
---> (Inner Exception #0) System.TimeoutException: Operation timeout expired.
   at Microsoft.Azure.Devices.Client.DeviceClient.<>c.<ApplyTimeout>b__79_2(Task t)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Edge.Hub.CloudProxy.CloudConnection.<CreateAndOpenDeviceClient>d__23.MoveNext() in /opt/vsts/work/1/s/edge-hub/src/Microsoft.Azure.Devices.Edge.Hub.CloudProxy/CloudConnection.cs:line 174
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.Devices.Edge.Util.Fallback.<ExecuteAsync>d__3`1.MoveNext()<---
darobs commented 6 years ago

Thank you for reporting this. I'm adding this as a bug to be evaluated by the team.

68muscle commented 6 years ago

This question is directly related to this topic. I have a very basic Toshiba laptop that is running Docker for Windows and Microsoft IoT Edge runtime. The laptop is connected to the Internet via a USB-connected wireless hot spot. I have three Edge modules running in their own Docker containers: $edgeHub, $edgeAgent, and a custom module. Everything was working perfectly yesterday, then $edgeHub and the custom module will no longer start. When I view the logs of $edgeHub, I see "Error creating or updating the cloud proxy for client ...". I figure the custom module won't start because of this error. Does anyone have any suggestions on how to get past this?

A colleague has suggested that restarting the Docker daemon/Windows Service may work. He also said redeploying the module might work. Since $edgeHub is a system module, I don't know how to redeploy it.

RESOLVED

What worked for me was to open a command window and issue the following commands:

docker rm -f edgeAgent docker rm -f edgeHub docker rm -f

I then restarted the Edge application on the laptop, and the app was back. I verified through the portal UI that they were running.