Azure / iotedge

The IoT Edge OSS project
MIT License

IoT Edge Modules are getting recreated if iotedge service restarts #4866

Open niravart7383 opened 3 years ago

niravart7383 commented 3 years ago

Expected Behavior

Module containers should remain unchanged after the iotedge service restarts.

Current Behavior

Module containers are getting recreated if iotedge service restarts

Steps to Reproduce


  1. Provision an IoT Edge device using DPS (Device Provisioning Service) with TPM as the authentication method
  2. Deploy custom modules
  3. Check the containerId of each module using the iotedge list command
  4. Restart the iotedge service using Restart-Service iotedge
  5. All the containers are recreated with different containerIds
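
One way to verify step 5 is to record each module's container ID before and after the restart and compare the two snapshots. The sketch below is a minimal Python illustration and not part of the IoT Edge tooling. It assumes the IDs were captured with something like docker ps --no-trunc --format "{{.Names}} {{.ID}}" against the IoT Edge Moby engine; the function names and the sample IDs are hypothetical.

```python
from typing import Dict, List


def parse_container_ids(ps_output: str) -> Dict[str, str]:
    """Parse lines of '<module name> <container id>' into a name -> id mapping."""
    ids: Dict[str, str] = {}
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            name, container_id = parts
            ids[name] = container_id
    return ids


def recreated_modules(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return the sorted names of modules whose container ID changed."""
    return sorted(
        name
        for name, container_id in before.items()
        if name in after and after[name] != container_id
    )


# Hypothetical snapshots taken before and after Restart-Service iotedge.
BEFORE = """\
edgeAgent aaa111
edgeHub bbb222
ddiotedgeremoteaccessmodule ccc333
"""
AFTER = """\
edgeAgent aaa111
edgeHub ddd444
ddiotedgeremoteaccessmodule eee555
"""

changed = recreated_modules(parse_container_ids(BEFORE), parse_container_ids(AFTER))
print("Recreated modules:", changed)
```

An empty list would mean that the containers survived the restart.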

Context (Environment)

OS: Windows IoT 1809 (LTSC)

Output of iotedge check

```
√ config.yaml is well-formed - OK
√ config.yaml has well-formed connection string - OK
√ container engine is installed and functional - OK
√ Windows host version is supported - OK
√ config.yaml has correct hostname - OK
√ config.yaml has correct URIs for daemon mgmt endpoint - OK
‼ latest security daemon - Warning
    Installed IoT Edge daemon has version 1.0.10.4 but 1.1.1 is the latest stable version available.
    Please see https://aka.ms/iotedge-update-runtime for update instructions.
√ host time is close to real time - OK
√ container time is close to host time - OK
√ DNS server - OK
√ production readiness: certificates - OK
√ production readiness: container engine - OK
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
‼ production readiness: Edge Agent's storage directory is persisted on the host filesystem - Warning
    The edgeAgent module is not configured to persist its C:\Windows\Temp\edgeAgent directory on the host filesystem.
    Data might be lost if the module is deleted or updated.
    Please see https://aka.ms/iotedge-storage-host for best practices.
√ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK

Connectivity checks
-------------------
√ host can connect to and perform TLS handshake with DPS endpoint - OK
√ host can connect to and perform TLS handshake with IoT Hub AMQP port - OK
√ host can connect to and perform TLS handshake with IoT Hub HTTPS / WebSockets port - OK
√ host can connect to and perform TLS handshake with IoT Hub MQTT port - OK
√ container on the IoT Edge module network can connect to IoT Hub AMQP port - OK
√ container on the IoT Edge module network can connect to IoT Hub HTTPS / WebSockets port - OK
√ container on the IoT Edge module network can connect to IoT Hub MQTT port - OK

19 check(s) succeeded.
```

Device Information

Runtime Versions

Note: when using Windows containers on Windows, run docker -H npipe:////./pipe/iotedge_moby_engine version instead

Logs

edge-agent logs

2021-04-19 17:13:55.011 +00:00 Edge Agent Main()
<6> 2021-04-19 10:13:55.411 -07:00 [INF] - Initializing Edge Agent.
<6> 2021-04-19 10:13:55.726 -07:00 [INF] - Version - 1.0.10.4.37804714 (57772714c81c8b823a5ef05bf11bf343b923fb6a)
<6> 2021-04-19 10:13:55.727 -07:00 [INF] -
        █████╗ ███████╗██╗   ██╗██████╗ ███████╗
       ██╔══██╗╚══███╔╝██║   ██║██╔══██╗██╔════╝
       ███████║  ███╔╝ ██║   ██║██████╔╝█████╗
       ██╔══██║ ███╔╝  ██║   ██║██╔══██╗██╔══╝
       ██║  ██║███████╗╚██████╔╝██║  ██║███████╗
       ╚═╝  ╚═╝╚══════╝ ╚═════╝ ╚═╝  ╚═╝╚══════╝

 ██╗ ██████╗ ████████╗    ███████╗██████╗  ██████╗ ███████╗
 ██║██╔═══██╗╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ██║██║   ██║   ██║       █████╗  ██║  ██║██║  ███╗█████╗
 ██║██║   ██║   ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝
 ██║╚██████╔╝   ██║       ███████╗██████╔╝╚██████╔╝███████╗
 ╚═╝ ╚═════╝    ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

<6> 2021-04-19 10:13:55.793 -07:00 [INF] - Experimental features configuration: {"Enabled":false,"DisableCloudSubscriptions":false}
<6> 2021-04-19 10:13:56.014 -07:00 [INF] - Installing certificates [CN=Azure IoT CA TestOnly Root CA:3/24/2026 6:05:47 AM] to CertificateAuthority
<6> 2021-04-19 10:13:56.234 -07:00 [INF] - Starting metrics listener on Host: *, Port: 9600, Suffix: /metrics
<6> 2021-04-19 10:13:56.490 -07:00 [INF] - Updating performance metrics every 05m:00s
<6> 2021-04-19 10:13:56.496 -07:00 [INF] - Started operation Get system resources
<6> 2021-04-19 10:13:56.498 -07:00 [INF] - Collecting metadata metrics
<6> 2021-04-19 10:13:56.576 -07:00 [INF] - Set metadata metrics: 1.0.10.4.37804714 (57772714c81c8b823a5ef05bf11bf343b923fb6a), {"Enabled":false,"DisableCloudSubscriptions":false}, {"OperatingSystemType":"windows","Architecture":"x86_64","Version":"1.0.10.4 (57772714c81c8b823a5ef05bf11bf343b923fb6a)","Provisioning":{"Type":"dps.tpm","DynamicReprovisioning":false},"ServerVersion":"19.03.12+azure","KernelVersion":"10.0 17763 (17763.1.amd64fre.rs5_release.180914-1434)","OperatingSystem":"Windows 10 Enterprise LTSC 2019 Version 1809 (OS Build 17763.1879)","NumCpus":2,"Virtualized":"unknown"}, True
<6> 2021-04-19 10:13:56.611 -07:00 [INF] - Started operation Checkpoint Availability
<6> 2021-04-19 10:13:56.620 -07:00 [INF] - Started operation refresh twin config
<6> 2021-04-19 10:13:56.645 -07:00 [INF] - Edge agent attempting to connect to IoT Hub via Amqp_Tcp_Only...
<6> 2021-04-19 10:13:57.057 -07:00 [INF] - Created persistent store at C:\Windows\TEMP\edgeAgent
<6> 2021-04-19 10:13:57.118 -07:00 [INF] - Started operation Metrics Scrape
<6> 2021-04-19 10:13:57.118 -07:00 [INF] - Started operation Metrics Upload
Scraping frequency: 01:00:00
Upload Frequency: 1.00:00:00
<6> 2021-04-19 10:13:57.485 -07:00 [INF] - Registering request handler UploadModuleLogs
<6> 2021-04-19 10:13:57.486 -07:00 [INF] - Registering request handler GetModuleLogs
<6> 2021-04-19 10:13:57.486 -07:00 [INF] - Registering request handler UploadSupportBundle
<6> 2021-04-19 10:13:57.486 -07:00 [INF] - Registering request handler RestartModule
<6> 2021-04-19 10:13:59.548 -07:00 [INF] - Edge agent connected to IoT Hub via Amqp_Tcp_Only.
<6> 2021-04-19 10:14:00.261 -07:00 [INF] - Initialized new module client with subscriptions enabled
<6> 2021-04-19 10:14:00.572 -07:00 [INF] - Obtained Edge agent twin from IoTHub with desired properties version 16 and reported properties version 29.
<6> 2021-04-19 10:14:02.647 -07:00 [INF] - Plan execution started for deployment 16
<6> 2021-04-19 10:14:02.680 -07:00 [INF] - Executing command: "Command Group: (\n  [Create module edgeHub]\n  [Start module edgeHub]\n)"
<6> 2021-04-19 10:14:02.686 -07:00 [INF] - Executing command: "Create module edgeHub"
<6> 2021-04-19 10:14:03.586 -07:00 [INF] - Executing command: "Start module edgeHub"
<6> 2021-04-19 10:14:04.706 -07:00 [INF] - Executing command: "Command Group: (\n  [Create module ddiotedgeremoteaccessmodule]\n  [Start module ddiotedgeremoteaccessmodule]\n)"
<6> 2021-04-19 10:14:04.706 -07:00 [INF] - Executing command: "Create module ddiotedgeremoteaccessmodule"

Additional Information

This happens only when using TPM authentication with DPS. It works fine with SAS token authentication.

lfitchett commented 3 years ago

@niravart7383 Can you confirm that your iotedged version matches your edgeAgent version? Run iotedge version to check.

niravart7383 commented 3 years ago

> @niravart7383 Can you confirm that your iotedged version matches your edgeAgent. Run iotedge version

Yes, it matches exactly

lfitchett commented 3 years ago

Hey @niravart7383, sorry for the delay. I can confirm that this is expected behavior: restarting triggers the reprovisioning flow, which for DPS with TPM results in a new identity.

If you want to avoid this, you can set always_reprovision_on_startup to false: https://github.com/Azure/iotedge/blob/d2c331d605a846911019364a31a7d098e1e2fc45/edgelet/iotedge/test-files/config/dps-tpm/old-config.yaml#L8

If you update to the LTS 1.1.x release, the field is now AutoReprovisioningMode and can be set to Dynamic, AlwaysOnStartup, or OnErrorOnly.
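
To illustrate the always_reprovision_on_startup setting, here is a minimal Python sketch that reads the flag from config.yaml text and rewrites it to false. The sample excerpt is an assumption about how a DPS/TPM provisioning section is laid out (placeholder scope and registration IDs, flag nested under provisioning), and the helper names are hypothetical. It uses a plain regular expression instead of a YAML parser, so treat it as an illustration only.

```python
import re
from typing import Optional

# Hypothetical excerpt of a 1.0.x config.yaml provisioning section (assumed layout).
SAMPLE_CONFIG = """\
provisioning:
  source: "dps"
  global_endpoint: "https://global.azure-devices-provisioning.net"
  scope_id: "<SCOPE_ID>"
  attestation:
    method: "tpm"
    registration_id: "<REGISTRATION_ID>"
  always_reprovision_on_startup: true
"""

# Matches the flag on its own line and captures the prefix and the boolean value.
FLAG_PATTERN = re.compile(
    r"^([ \t]*always_reprovision_on_startup:[ \t]*)(true|false)[ \t]*$",
    re.MULTILINE,
)


def get_reprovision_flag(config_text: str) -> Optional[bool]:
    """Return the flag's value, or None if the flag is not present."""
    match = FLAG_PATTERN.search(config_text)
    if match is None:
        return None
    return match.group(2) == "true"


def set_reprovision_flag(config_text: str, enabled: bool) -> str:
    """Return a copy of the text with the flag set to the requested value."""
    value = "true" if enabled else "false"
    return FLAG_PATTERN.sub(lambda m: m.group(1) + value, config_text)


updated_config = set_reprovision_flag(SAMPLE_CONFIG, False)
print(updated_config)
```

After editing the real file, the iotedge service would need to be restarted for the change to take effect.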

niravart7383 commented 3 years ago

Hi,

I will check and let you know. If the flag is set to false, will that also stop reprovisioning during startup? If so, that could be a security risk, since the device would no longer contact DPS during startup. Is that right?

niravart7383 commented 3 years ago

> Hey @niravart7383, sorry for the delay. I can confirm that this is expected behavior. This is because restarting triggers the deprovisioning flow, which for DPS with TPM results in a new identity.
>
> If you want to avoid this, you can set always_reprovision_on_startup to false:
>
> https://github.com/Azure/iotedge/blob/d2c331d605a846911019364a31a7d098e1e2fc45/edgelet/iotedge/test-files/config/dps-tpm/old-config.yaml#L8
>
> If you update to the LTS 1.1.x, the field is now AutoReprovisioningMode, and can be set to Dynamic, AlwaysOnStartup, and OnErrorOnly.

I tested this by setting the always_reprovision_on_startup flag to true. I checked multiple times, and every time the containers are recreated, so we lose our data and files.

I have attached two screenshots, where you can find the containerIds are different.

image image

lfitchett commented 3 years ago

Hey @niravart7383, always_reprovision_on_startup needs to be set to false. It is the re-provisioning that is causing the containers to be reset.

Sorry if the config file I linked above caused confusion. I simply linked to an example config in our repo that had the field.

lfitchett commented 3 years ago

In addition, if you are worried about losing data in a container, you can use volume mounting to store permanent files on the host filesystem: https://docs.docker.com/storage/volumes/
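
As a rough illustration of the volume-mounting suggestion, the Python sketch below builds the stringified JSON that a module's createOptions field in a deployment manifest expects, binding a host directory into the container through the Docker HostConfig.Binds option. The host and container paths and the helper name are hypothetical, and using Binds is one assumed way to do this, not necessarily the setup recommended for this device.

```python
import json


def create_options_with_bind(host_path: str, container_path: str) -> str:
    """Build a createOptions JSON string that bind-mounts host_path into the container."""
    options = {
        "HostConfig": {
            # Docker bind syntax: "<path on host>:<path inside container>"
            "Binds": [f"{host_path}:{container_path}"]
        }
    }
    return json.dumps(options)


# Hypothetical paths for a Windows container; adjust to the real module layout.
create_options = create_options_with_bind("C:\\iotedge\\moduledata", "C:\\app\\data")
print(create_options)
```

Files written under the container path then live on the host filesystem, so they remain available even if the container is recreated.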

niravart7383 commented 3 years ago

> Hey @niravart7383, the always_reprovision_on_startup needs to be set to false. It is the re-provisioning that is causing the containers to be reset.
>
> Sorry if the config file I linked above caused confusion, I simply linked to an example config in our repo that had the field.

I have checked with both true and false; the behavior is the same.

niravart7383 commented 3 years ago

> In addition, if you are worried about losing data in a container, you can use volume mounting to store permanent files on the host filesystem: https://docs.docker.com/storage/volumes/

I agree!

But why is the behavior different? The same thing does not happen if I am not using TPM.

github-actions[bot] commented 3 years ago

This issue is being marked as stale because it has been open for 30 days with no activity.