Azure / iotedge

The IoT Edge OSS project
MIT License
1.45k stars 458 forks source link

Agent doesn't support Docker Health Checks (Part 2) #1183

Closed arabold closed 5 years ago

arabold commented 5 years ago

This is a follow-up bug on https://github.com/Azure/iotedge/issues/1116 - Docker health checks are still not 100% working despite a fix released in 1.0.7

While it appears that health checks defined in Dockerfile are working as expected now, there's still a problem when defining them in the deployment.template.json as follows:

          "redis": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "redis:latest",
              "createOptions": {
                "Healthcheck": {
                  "Test": [ "CMD-SHELL", "redis-cli ping || exit 1" ],
                  "Interval": 15000000000,
                  "Timeout": 5000000000,
                  "Retries": 1,
                  "StartPeriod": 300000000000
                },
                "HostConfig": {
                  "Mounts": [
                    {
                      "Target": "/data",
                      "Source": "redisDataVolume",
                      "Type": "volume"
                    }
                  ]
                }
              }
            }
          },

The edgeAgent will report errors as shown below. This still seems a Int32 vs Int64 issue.

Expected Behavior

Health checks should work following the Docker specification.

Current Behavior

[2019-05-07 16:40:29 : Starting Edge Agent
[05/07/2019 04:40:29.959 PM] Edge Agent Main()
2019-05-07 16:40:30.136 +00:00 [INF] - Starting module management agent.
2019-05-07 16:40:30.289 +00:00 [INF] - Version - 1.0.7.21905529 (f455ae2cd66db716c4bd8e7aaa7984ce092481c0)
2019-05-07 16:40:30.289 +00:00 [INF] - 
        █████╗ ███████╗██╗   ██╗██████╗ ███████╗
       ██╔══██╗╚══███╔╝██║   ██║██╔══██╗██╔════╝
       ███████║  ███╔╝ ██║   ██║██████╔╝█████╗
       ██╔══██║ ███╔╝  ██║   ██║██╔══██╗██╔══╝
       ██║  ██║███████╗╚██████╔╝██║  ██║███████╗
       ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚══════╝

 ██╗ ██████╗ ████████╗    ███████╗██████╗  ██████╗ ███████╗
 ██║██╔═══██╗╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ██║██║   ██║   ██║       █████╗  ██║  ██║██║  ███╗█████╗
 ██║██║   ██║   ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝
 ██║╚██████╔╝   ██║       ███████╗██████╔╝╚██████╔╝███████╗
 ╚═╝ ╚═════╝    ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

2019-05-07 16:40:30.410 +00:00 [INF] - Started operation refresh twin config
2019-05-07 16:40:30.438 +00:00 [INF] - Edge agent attempting to connect to IoT Hub via Amqp_Tcp_Only...
2019-05-07 16:40:31.525 +00:00 [INF] - Created persistent store at /tmp/edgeAgent
2019-05-07 16:40:31.644 +00:00 [INF] - Edge agent connected to IoT Hub via Amqp_Tcp_Only.
2019-05-07 16:40:31.858 +00:00 [INF] - Obtained Edge agent twin from IoTHub with desired properties version 85 and reported properties version 15.
2019-05-07 16:40:32.686 +00:00 [INF] - Plan execution started for deployment 85
...
2019-05-07 16:41:23.061 +00:00 [INF] - Executing command: "Create module redis"
2019-05-07 16:41:23.062 +00:00 [ERR] - Executing command for operation ["create"] failed.
Microsoft.Azure.Devices.Edge.Agent.Edgelet.EdgeletCommunicationException- Message:Error calling Create module redis: Request body is malformed
    caused by: invalid value: integer `15000000000`, expected i32, StatusCode:400, at:   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2019_01_30.ModuleManagementHttpClient.HandleException(Exception exception, String operation) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/version_2019_01_30/ModuleManagementHttpClient.cs:line 194
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Versioning.ModuleManagementHttpClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/versioning/ModuleManagementHttpClientVersioned.cs:line 94
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2019_01_30.ModuleManagementHttpClient.CreateModuleAsync(ModuleSpec moduleSpec) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/version_2019_01_30/ModuleManagementHttpClient.cs:line 96
   at Microsoft.Azure.Devices.Edge.Agent.Core.LoggingCommandFactory.LoggingCommand.ExecuteAsync(CancellationToken token) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/LoggingCommandFactory.cs:line 60
2019-05-07 16:41:23.063 +00:00 [ERR] - Executing command for operation ["Command Group: (
  [Create module redis]
  [Start module redis]
)"] failed.
Microsoft.Azure.Devices.Edge.Agent.Edgelet.EdgeletCommunicationException- Message:Error calling Create module redis: Request body is malformed
    caused by: invalid value: integer `15000000000`, expected i32, StatusCode:400, at:   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2019_01_30.ModuleManagementHttpClient.HandleException(Exception exception, String operation) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/version_2019_01_30/ModuleManagementHttpClient.cs:line 194
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Versioning.ModuleManagementHttpClientVersioned.Execute[T](Func`1 func, String operation) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/versioning/ModuleManagementHttpClientVersioned.cs:line 94
   at Microsoft.Azure.Devices.Edge.Agent.Edgelet.Version_2019_01_30.ModuleManagementHttpClient.CreateModuleAsync(ModuleSpec moduleSpec) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Edgelet/version_2019_01_30/ModuleManagementHttpClient.cs:line 96
   at Microsoft.Azure.Devices.Edge.Agent.Core.LoggingCommandFactory.LoggingCommand.ExecuteAsync(CancellationToken token) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/LoggingCommandFactory.cs:line 60
   at Microsoft.Azure.Devices.Edge.Agent.Core.Commands.GroupCommand.ExecuteAsync(CancellationToken token) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/commands/GroupCommand.cs:line 35
   at Microsoft.Azure.Devices.Edge.Agent.Core.LoggingCommandFactory.LoggingCommand.ExecuteAsync(CancellationToken token) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/LoggingCommandFactory.cs:line 60
2019-05-07 16:41:23.063 +00:00 [ERR] - Step failed in deployment 85, continuing execution. Failure when running command Command Group: (
  [Create module redis]
  [Start module redis]
). Will retry in 00s.

Steps to Reproduce

Add health checks to a module like shown in the example above. Running this in the simulator will work as expected. However, deploying it on an actual Edge Server will fail with above errors.

Context (Environment)

Device (Host) Operating System

Ubuntu 18.04.2 LTS

Architecture

amd64

Container Operating System

Runtime Versions

iotedged

iotedged 1.0.7 (f455ae2cd66db716c4bd8e7aaa7984ce092481c0)

Edge Agent

Version - 1.0.7.21905529 (f455ae2cd66db716c4bd8e7aaa7984ce092481c0)

Edge Hub

Version - 1.0.7.21905529 (f455ae2cd66db716c4bd8e7aaa7984ce092481c0)

Docker

Client:
 Version:           3.0.5
 API version:       1.40
 Go version:        go1.12.1
 Git commit:        ba9934d4
 Built:             Thu Apr 18 22:01:41 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          3.0.5
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.1
  Git commit:       dbe4a30
  Built:            Thu Apr 18 22:07:58 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.5
  GitCommit:        bb71b10fd8f58240ca47fbb579b9d1028eea7c84
 runc:
  Version:          1.0.0-rc6+dev
  GitCommit:        2b18fe1d885ee5083ef9f0838fee39b62d653e30
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Logs

See above

Additional Information

n/a

arabold commented 5 years ago

Update: Setting the health check through Dockerfile doesn't work either. It fails during updating reported properties:

2019-05-07 17:24:35.382 +00:00 [INF] - Updated reported properties
2019-05-07 17:24:58.714 +00:00 [WRN] - Reconcile failed because of the an exception
Microsoft.Azure.Devices.Edge.Agent.Edgelet.EdgeletCommunicationException- Message:Error calling List modules: Could not list modules
    caused by: Could not query module runtime state
    caused by: Container runtime error
    caused by: invalid value: integer `15000000000`, expected i32 at line 1 column 5447, StatusCode:500, at:   at Microsoft.Azure.Devices.Edge.Agent.Core.Agent.ReconcileAsync(CancellationToken token) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/Agent.cs:line 116
2019-05-07 17:24:58.903 +00:00 [INF] - Updated reported properties
2019-05-07 17:25:21.331 +00:00 [WRN] - Reconcile failed because of the an exception
Microsoft.Azure.Devices.Edge.Agent.Edgelet.EdgeletCommunicationException- Message:Error calling List modules: Could not list modules
    caused by: Could not query module runtime state
    caused by: Container runtime error
    caused by: invalid value: integer `15000000000`, expected i32 at line 1 column 5723, StatusCode:500, at:   at Microsoft.Azure.Devices.Edge.Agent.Core.Agent.ReconcileAsync(CancellationToken token) in /home/vsts/work/1/s/edge-agent/src/Microsoft.Azure.Devices.Edge.Agent.Core/Agent.cs:line 116
ancaantochi commented 5 years ago

Hi @arabold,

The fix was not part of 1.0.7 release, it will be in 1.0.8 release.

myagley commented 5 years ago

A release candidate for 1.0.8 was just released. You can try it out using the 1.0.8-rc1 tag on your images. Packages for the daemon are located here: https://github.com/Azure/azure-iotedge/releases/tag/1.0.8-rc1

arabold commented 5 years ago

I'm testing this version since a couple of days and after some initial hiccup with me installing the wrong build, this works as expected now. I'll report back if I see any unusual behavior but for now the health checks work very well.

Thank you for the update! Much appreciated.

arsing commented 5 years ago

1.0.8 is released.