Azure / iotedge

The IoT Edge OSS project
MIT License

Unable to set "homedir_path" through config.toml #7082

Closed buentead closed 1 year ago

buentead commented 1 year ago

We update the gateway kernel and rootfs by switching A/B partitions. Hence, we have to copy the content of the directory "/var/lib/aziot" between the partitions before switching. In order to avoid this directory copy I'd like to move the content of "/var/lib/aziot" permanently to another (optionally encrypted) partition, e.g. "/data/aziot". So the IoTEdge settings are kept when swapping rootfs partitions.

Expected Behavior

Adding homedir_path settings to "/etc/aziot/config.toml" for "certd", "keyd", "identityd", and "edged" should relocate their data directories. The following is an example for "keyd":

[aziot_keys]
homedir_path = "/data/aziot/keyd"

Then apply the new configuration with iotedge config apply. From then on, the new directory "/data/aziot/..." should be used instead of "/var/lib/aziot/...".

Current Behavior

Even after the above-mentioned configuration change, IoTEdge still refers to "/var/lib/aziot/...". The configuration changes were not taken into account.

Steps to Reproduce

The following example uses "keyd", but the same applies to "certd", "identityd", and "edged" as well.

  1. Make sure you have a running IoTEdge agent using "/var/lib/aziot" as the homedir_path.
  2. mkdir -p /data/aziot
  3. rsync -ptgo -A -X -r /var/lib/aziot/keyd /data/aziot/
  4. Make sure the following two lines are included in "/etc/aziot/config.toml"
    [aziot_keys]
    homedir_path = "/data/aziot/keyd"
  5. iotedge system stop
  6. mv /var/lib/aziot/keyd /var/lib/aziot/keyd-bak
  7. iotedge config apply
  8. IoTEdge will not start, as it still looks for the homedir_path "/var/lib/aziot/keyd" instead of "/data/aziot/keyd"
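
To see which homedir_path each daemon actually ended up with after step 7, the generated per-service configuration can be searched. This is a minimal sketch, not an official diagnostic: `list_homedir_paths` is a hypothetical helper name, and it assumes GNU grep and that the generated files live under /etc/aziot/<service>/config.d/ (the keyd file is named later in this thread; the layout for the other services is an assumption).

```shell
#!/bin/sh
# Hypothetical helper: list every homedir_path setting found in TOML files
# below a configuration root (defaults to /etc/aziot).
list_homedir_paths() {
    config_root=${1:-/etc/aziot}
    # -r: recurse, -n: print line numbers, --include: only look at *.toml
    grep -rn --include='*.toml' 'homedir_path' "$config_root"
}
```

Calling `list_homedir_paths` with no argument on the device prints one line per match, which makes it easy to compare the value in config.toml with the value in the generated files.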

Context (Environment)

Output of iotedge check

```
Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
‼ aziot-identity-service package is up-to-date - Warning
    Installed aziot-identity-service package has version 1.4.4 but 1.4.5 is the latest stable version available.
    Please see https://aka.ms/aziot-update-runtime for update instructions.
√ host time is close to reference time - OK
‼ production readiness: identity certificates expiry - Warning
    DPS identity 'device-id' will expire soon (2023-08-14 05:51:55 UTC, in 0 days)
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ identityd is running - OK
√ read all preloaded certificates from the Certificates Service - OK
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
‼ host can connect to and perform TLS handshake with iothub AMQP port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub MQTT port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
√ host can connect to and perform TLS handshake with DPS endpoint - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
    SocketError - SocketErrorCode (TimedOut) : Operation timed out
    One or more errors occurred. (Got bad response: )
‼ aziot-edge package is up-to-date - Warning
    Installed IoT Edge daemon has version 1.4.10 but 1.4.16 is the latest stable version available.
    Please see https://aka.ms/iotedge-update-runtime for update instructions.
√ container time is close to host time - OK
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
√ production readiness: Edge Agent's storage directory is persisted on the host filesystem - OK
√ production readiness: Edge Hub's storage directory is persisted on the host filesystem - OK
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------

23 check(s) succeeded.
8 check(s) raised warnings. Re-run with --verbose for more details.
1 check(s) raised errors. Re-run with --verbose for more details.
7 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details.
```

Device Information

Runtime Versions

Logs

aziot journal logs

```
Aug 13 13:40:48 g2-pf2x1qlx systemd[1]: Closed Azure IoT Identity Service API socket.
Aug 13 13:40:48 g2-pf2x1qlx systemd[1]: Stopping Azure IoT Identity Service API socket...
Aug 13 13:40:48 g2-pf2x1qlx systemd[1]: Starting Azure IoT Identity Service API socket...
Aug 13 13:40:48 g2-pf2x1qlx systemd[1]: Listening on Azure IoT Identity Service API socket.
Aug 13 13:40:48 g2-pf2x1qlx systemd[1]: Started Azure IoT Identity Service.
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [INFO] - Starting service...
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [INFO] - Version - 1.4.4
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [INFO] - Loaded openssl'd Default provider
Aug 13 13:40:48 g2-pf2x1qlx aziot-certd[32708]: 2023-08-13T13:40:48Z [INFO] - <-- GET /certificates/device-id?api-version=2020-09-01 {"host": "certd.sock"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-certd[32708]: 2023-08-13T13:40:48Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-certd[32708]: 2023-08-13T13:40:48Z [INFO] - <-- GET /certificates/device-id?api-version=2020-09-01 {"host": "certd.sock"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-certd[32708]: 2023-08-13T13:40:48Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [INFO] - Certificate device-id will be auto-renewed. Next renewal at 2023-08-14T01:03:55+00:00.
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [INFO] - Provisioning starting. Reason: Startup
Aug 13 13:40:48 g2-pf2x1qlx aziot-certd[32708]: 2023-08-13T13:40:48Z [INFO] - <-- GET /certificates/device-id?api-version=2020-09-01 {"host": "certd.sock"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-certd[32708]: 2023-08-13T13:40:48Z [INFO] - --> 200 {"content-type": "application/json"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-keyd[32714]: 2023-08-13T13:40:48Z [INFO] - <-- GET /keypair/device-id?api-version=2021-05-01 {"host": "keyd.sock"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-keyd[32714]: 2023-08-13T13:40:48Z [ERR!] - Permission denied (os error 13)
Aug 13 13:40:48 g2-pf2x1qlx aziot-keyd[32714]: 2023-08-13T13:40:48Z [ERR!] - !!! internal error
Aug 13 13:40:48 g2-pf2x1qlx aziot-keyd[32714]: 2023-08-13T13:40:48Z [ERR!] - !!! caused by: could not load key pair
Aug 13 13:40:48 g2-pf2x1qlx aziot-keyd[32714]: 2023-08-13T13:40:48Z [ERR!] - !!! caused by: could not load key pair: AZIOT_KEYS_RC_ERR_EXTERNAL
Aug 13 13:40:48 g2-pf2x1qlx aziot-keyd[32714]: 2023-08-13T13:40:48Z [INFO] - --> 500 {"content-type": "application/json"}
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [ERR!] - Failed to provision with IoT Hub, and no valid device backup was found: internal error
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [ERR!] - service encountered an error
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [ERR!] - caused by: internal error
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [ERR!] - caused by: could not create certificate
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [ERR!] - caused by: failed to get identity cert key
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 2023-08-13T13:40:48Z [ERR!] - 0:
Aug 13 13:40:48 g2-pf2x1qlx aziot-identityd[33603]: 1:
Aug 13 13:40:48 g2-pf2x1qlx systemd[1]: aziot-identityd.service: Main process exited, code=exited, status=1/FAILURE
Aug 13 13:40:48 g2-pf2x1qlx systemd[1]: aziot-identityd.service: Failed with result 'exit-code'.
```

edge-agent logs

```
No edge-agent log as the service doesn't start.
```

edge-hub logs

```
No edge-hub log as the service doesn't start.
```

Additional Information

It would also be great if the homedir_path of all services could be changed with a single configuration entry.

ancaantochi commented 1 year ago

The error is "Permission denied (os error 13)". It looks like the new path is being used but doesn't have the required permissions.

You can set permissions as below:

chown aziotcs:aziotcs /data/aziot/certd
chown aziotid:aziotid /data/aziot/identityd
chown aziotks:aziotks /data/aziot/keyd

buentead commented 1 year ago

Thank you for your thoughts. But I think the message shown in the log is misleading. I copied all files together with their ownership and permissions:

> ll /data/aziot/keyd
total 12
drwx------ 3 aziotks aziotks 4096 Aug 12 10:39 ./
drwxr-xr-x 6 root    root    4096 Aug 14 17:12 ../
drwxr-xr-x 2 aziotks aziotks 4096 Aug 14 06:59 keys/

But as I renamed the directory /var/lib/aziot/keyd to /var/lib/aziot/keyd.bak, it cannot find the keys anymore. And if I revert this step, then IoTEdge starts normally. So I concluded that the configuration change of homedir_path wasn't taken into account.
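
The ownership shown in the listing can also be verified with a small script instead of by eye. This is only a sketch: `check_owner` is a hypothetical helper name, it assumes GNU stat, and the user and group names are the ones mentioned earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR is owned by the expected
# "user:group" pair; print a message to stderr otherwise.
check_owner() {
    dir=$1
    expected=$2
    # %U and %G are the owner's user name and group name (GNU stat)
    actual=$(stat -c '%U:%G' "$dir") || return 1
    if [ "$actual" != "$expected" ]; then
        echo "$dir is owned by $actual, expected $expected" >&2
        return 1
    fi
}
```

For example, `check_owner /data/aziot/keyd aziotks:aziotks` returns success for the directory listed above, which supports the point that ownership is not the cause here.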

jlian commented 1 year ago

@ancaantochi could be a bug?

ancaantochi commented 1 year ago

I am able to reproduce the issue. I will look into the root cause.

ancaantochi commented 1 year ago

The homedir_path gets overridden with the default value when running iotedge config apply. So to change the path you would need to:

  1. Update homedir_path in /etc/aziot/keyd/config.d/00-super.toml
  2. Restart the services: sudo iotedge system restart
  3. NOT run iotedge config apply afterwards

Because the steps above could lead to errors in the future if config apply is run for some other change, it would be better to mount the partition to /var/lib/aziot.
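
The manual edit described above could be scripted roughly as follows. This is a sketch under assumptions, not a supported procedure: `set_homedir_path` is a hypothetical helper name, it assumes GNU sed and that the generated file already contains a line of the form `homedir_path = "..."`, and the new path must not contain the characters `|` or `&`.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the value of an existing homedir_path line
# in a generated daemon config file, leaving all other lines untouched.
set_homedir_path() {
    conf_file=$1
    new_path=$2
    # Keep the 'homedir_path = ' prefix (group 1) and replace only the value.
    sed -i "s|^\([[:space:]]*homedir_path[[:space:]]*=[[:space:]]*\).*|\1\"$new_path\"|" "$conf_file"
}
```

On the device this would be run as root, for example `set_homedir_path /etc/aziot/keyd/config.d/00-super.toml /data/aziot/keyd`, followed by `sudo iotedge system restart`. As noted above, a later iotedge config apply resets the value to the default.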

buentead commented 1 year ago

> Because the steps above could lead into errors in the future if config apply is run for some other changes, would be better to mount the partition to /var/lib/aziot

Thank you for the proposal. This could be a possible workaround. I will check how this would work on the gateway and how I can include everything in a "GoldenImage" so it works 'out-of-the-box'.

buentead commented 1 year ago

> Because the steps above could lead into errors in the future if config apply is run for some other changes, would be better to mount the partition to /var/lib/aziot

I confirm that mounting a partition to /var/lib/aziot works. In addition, I tested using 'gocryptfs' to mount an encrypted folder (to protect the private keys 'at rest'). Thanks, ancaantochi, for the hint.
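
For readers who want to try the same workaround, here is a rough sketch of the two variants. Everything in it is an assumption rather than the exact setup used here: the helper names are hypothetical, the bind mount stands in for mounting a dedicated partition, /data/aziot is just the example directory from this issue, and the gocryptfs variant assumes an already initialised cipher directory and `user_allow_other` in /etc/fuse.conf when not mounting as root.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line that bind-mounts a
# persistent directory onto the default IoT Edge state directory.
fstab_bind_entry() {
    source_dir=$1
    mount_point=$2
    printf '%s %s none bind 0 0\n' "$source_dir" "$mount_point"
}

# Hypothetical helper: mount an existing gocryptfs cipher directory.
# -allow_other lets the service users read the mounted files, subject to
# the normal file permission checks.
mount_encrypted_state() {
    cipher_dir=$1
    mount_point=$2
    gocryptfs -allow_other "$cipher_dir" "$mount_point"
}
```

`fstab_bind_entry /data/aziot /var/lib/aziot` prints the line to append to /etc/fstab. With either variant the services keep their default homedir_path, so a later iotedge config apply does not undo anything.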

Although I have a good workaround, it would be nice to see a fix in the future.

jlian commented 1 year ago

We will close this issue for now as there's a good workaround.