Azure / iotedge

The IoT Edge OSS project
MIT License

TPM provisioning doesn't work for iotedge version 1.4 unless upgraded from version 1.3 #6658

Closed SandroVaroli closed 1 year ago

SandroVaroli commented 1 year ago

Expected Behavior

With version 1.3 we are able to deploy iotedge on a clean Ubuntu installation, using TPM provisioning, with the following script:

#!/bin/bash
# Variables
SCOPE_ID=0ne000XXXXX
REGISTRATION_ID=YYYYYYYYYYYY

# Distro management
UBUNTU20=20.04
UBUNTU18=18.04/multiarch

DISTRO="$UBUNTU20"

# Prerequisites
wget https://packages.microsoft.com/config/ubuntu/$DISTRO/packages-microsoft-prod.deb -O packages-microsoft-prod.deb

sudo dpkg -i packages-microsoft-prod.deb
rm packages-microsoft-prod.deb

# Installation forcing version 1.3.0
sudo apt-get update
sudo apt-get install moby-engine
sudo apt-get install aziot-identity-service=1.3.0-1 aziot-edge=1.3.0-1

# Provisioning
STARTLINE=$(sudo awk '/DPS provisioning with TPM/{ print NR; exit }' /etc/aziot/config.toml.edge.template)
ENDLINE=$((STARTLINE+9))    

sudo sed "$STARTLINE"','"$ENDLINE"'s:^# ::' /etc/aziot/config.toml.edge.template > config.toml
sed -i 's:^id_scope.*:id_scope = "'"$SCOPE_ID"'":' config.toml
sed -i 's:^registration_id.*:registration_id = "'"$REGISTRATION_ID"'":' config.toml
sudo cp ./config.toml /etc/aziot/
rm ./config.toml

# Give IoT Edge access to the TPM
echo '# allow aziottpm access to tpm0 and tpmrm0' >> tpmaccess.rules
echo 'KERNEL=="tpm0", SUBSYSTEM=="tpm", OWNER="aziottpm", MODE="0660"' >> tpmaccess.rules
echo 'KERNEL=="tpmrm0", SUBSYSTEM=="tpmrm", OWNER="aziottpm", MODE="0660"' >> tpmaccess.rules

sudo cp -f tpmaccess.rules /etc/udev/rules.d/
rm tpmaccess.rules

sudo /bin/udevadm trigger --subsystem-match=tpm --subsystem-match=tpmrm

# Apply configuration changes
sudo iotedge config apply

And this is the resulting /etc/aziot/config.toml file:

## DPS provisioning with TPM
[provisioning]
source = "dps"
global_endpoint = "https://global.azure-devices-provisioning.net"
id_scope = "0ne000XXXXX"
#
[provisioning.attestation]
method = "tpm"
registration_id = "YYYYYYYYYYYY"

And this is the output of iotedge system status:

> sudo iotedge system status
System services:
    aziot-edged             Running
    aziot-identityd         Running
    aziot-keyd              Running
    aziot-certd             Running
    aziot-tpmd              Running

Use 'iotedge system logs' to check for non-fatal errors.
Use 'iotedge check' to diagnose connectivity and configuration issues.

Of course, an individual enrollment with the TPM attestation method and the correct endorsement key had been created on DPS in advance.
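The TPM-access part of the script builds tpmaccess.rules by appending each line with echo ... >> in the current directory, so a stale tpmaccess.rules left there (for example by an interrupted run) would end up with duplicated rules. A minimal sketch of a single-step alternative, assuming it runs as root; the helper name write_tpm_udev_rules is hypothetical and the rule text is copied unchanged from the script:

```shell
# Hypothetical helper: write the TPM access rules in a single step.
# Writing with ">" replaces any existing file instead of appending to it.
# Rule text is copied unchanged from the provisioning script.
write_tpm_udev_rules() {
  local dest_dir="${1:-/etc/udev/rules.d}"  # default: the system rules directory
  local rules_file="$dest_dir/tpmaccess.rules"
  cat > "$rules_file" <<'EOF' || return 1
# allow aziottpm access to tpm0 and tpmrm0
KERNEL=="tpm0", SUBSYSTEM=="tpm", OWNER="aziottpm", MODE="0660"
KERNEL=="tpmrm0", SUBSYSTEM=="tpmrm", OWNER="aziottpm", MODE="0660"
EOF
  echo "$rules_file"  # print the path that was written
}
```

When run as root, call write_tpm_udev_rules and then udevadm trigger --subsystem-match=tpm --subsystem-match=tpmrm, as in the script, so the rules are applied without waiting for a reboot.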

Current Behavior

With version 1.4 the config.toml template format changed slightly: two lines were added to the TPM provisioning section that make it possible to send a custom payload during DPS registration. We have no need for this (I suppose, since nothing changed on the DPS side), so we left them commented out. The script is identical except for a few changes in how config.toml is created (a temporary solution; we will handle it better in the future) and, of course, no version is forced. Here are the differences:

# Installation using latest version 1.4
sudo apt-get update
sudo apt-get install moby-engine
sudo apt-get install aziot-edge

# Provisioning with new config.toml format
STARTLINE=$(sudo awk '/DPS provisioning with TPM/{ print NR; exit }' /etc/aziot/config.toml.edge.template)
ENDLINE1=$((STARTLINE+4))
STARTLINE2=$((ENDLINE1+4))
ENDLINE=$((STARTLINE2+3))

sudo sed "$STARTLINE"','"$ENDLINE1"'s:^# ::' /etc/aziot/config.toml.edge.template > config.toml
sed -i "$STARTLINE2"','"$ENDLINE"'s:^# ::' config.toml
sed -i 's:^id_scope.*:id_scope = "'"$SCOPE_ID"'":' config.toml
sed -i 's:^registration_id.*:registration_id = "'"$REGISTRATION_ID"'":' config.toml
sudo cp ./config.toml /etc/aziot/
rm ./config.toml

And this is the resulting /etc/aziot/config.toml file:

## DPS provisioning with TPM
[provisioning]
source = "dps"
global_endpoint = "https://global.azure-devices-provisioning.net"
id_scope = "0ne000XXXXX"
#
## Uncomment to send a custom payload during DPS registration
# payload = { uri = "file:///var/secrets/aziot/identityd/dps-additional-data.json" }
#
[provisioning.attestation]
method = "tpm"
registration_id = "YYYYYYYYYYYY"

OK, this looks identical to the version 1.3 file (apart from the commented-out payload lines), so it should work, and everything follows the documentation. But on a clean Ubuntu installation, it doesn't work.

Here is the output of iotedge system status:

sudo iotedge system status
System services:
    aziot-edged             Running
    aziot-identityd         Running
    aziot-keyd              Ready
    aziot-certd             Ready
    aziot-tpmd              Down - activating

aziot-tpmd is in a bad state because:
aziot-tpmd.service: Down - activating : Printing the last 10 log lines.
-- Logs begin at Mon 2022-09-05 15:29:55 UTC, end at Mon 2022-09-05 15:44:39 UTC. --
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]: ERROR:tcti:src/tss2-tcti/tctildr-dl.c:150:tcti_from_file() Could not initialize TCTI file: device
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]: ERROR:tcti:src/tss2-tcti/tctildr.c:418:Tss2_TctiLdr_Initialize_Ex() Failed to instantiate TCTI
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]: 2022-09-05T15:44:39Z [ERR!] - service encountered an error
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]: 2022-09-05T15:44:39Z [ERR!] - caused by: internal error
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]: 2022-09-05T15:44:39Z [ERR!] - caused by: could not initialize TPM
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]: 2022-09-05T15:44:39Z [ERR!] - caused by: tcti:IO failure
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]: 2022-09-05T15:44:39Z [ERR!] -    0: <unknown>
Sep 05 15:44:39 edgelinux aziot-tpmd[4228]:    1: <unknown>
Sep 05 15:44:39 edgelinux systemd[1]: aziot-tpmd.service: Main process exited, code=exited, status=1/FAILURE
Sep 05 15:44:39 edgelinux systemd[1]: aziot-tpmd.service: Failed with result 'exit-code'.

Use 'iotedge system logs' to check for non-fatal errors.
Use 'iotedge check' to diagnose connectivity and configuration issues. 
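The 1.4 script has to recompute the STARTLINE/ENDLINE offsets because the template layout changed between releases. One way to stop depending on the template layout is to write the provisioning section directly. A minimal sketch, assuming the rest of the template can stay commented out (as it effectively does in the generated files shown above); write_dps_tpm_config is a hypothetical helper name:

```shell
# Hypothetical helper: write the DPS + TPM provisioning section directly,
# instead of uncommenting template lines by line offset.
# Arguments: output file, DPS ID scope, TPM registration ID.
write_dps_tpm_config() {
  local out_file="$1" scope_id="$2" registration_id="$3"
  # Unquoted EOF so that $scope_id and $registration_id are expanded.
  cat > "$out_file" <<EOF
## DPS provisioning with TPM
[provisioning]
source = "dps"
global_endpoint = "https://global.azure-devices-provisioning.net"
id_scope = "$scope_id"

[provisioning.attestation]
method = "tpm"
registration_id = "$registration_id"
EOF
}
```

Usage: write_dps_tpm_config ./config.toml "$SCOPE_ID" "$REGISTRATION_ID", then copy the file to /etc/aziot/ and run sudo iotedge config apply as before. The generated section matches the one produced by both versions of the script, minus the commented-out lines.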

Steps to Reproduce

  1. On DPS, create an individual enrollment with the TPM method and the correct endorsement key retrieved from the physical device
  2. Do a new clean installation of Ubuntu Server 20.04 LTS with security updates on the physical device
  3. Install iotedge with the following steps (as described in the documentation and shown in the script above):
    sudo apt-get update
    sudo apt-get install moby-engine
    sudo apt-get install aziot-edge
  4. Give IoT Edge access to the TPM as described in the documentation and in the script above
  5. Apply the changes (sudo iotedge config apply) to provision IoT Edge, then reboot to ensure everything starts clean

Context (Environment)

Output of iotedge check

```
iotedge check

Configuration checks (aziot-identity-service)
---------------------------------------------
√ keyd configuration is well-formed - OK
√ certd configuration is well-formed - OK
√ tpmd configuration is well-formed - OK
√ identityd configuration is well-formed - OK
√ daemon configurations up-to-date with config.toml - OK
√ identityd config toml file specifies a valid hostname - OK
√ aziot-identity-service package is up-to-date - OK
√ host time is close to reference time - OK
√ preloaded certificates are valid - OK
√ keyd is running - OK
√ certd is running - OK
√ tpmd is running - OK
√ identityd is running - OK
× read all preloaded certificates from the Certificates Service - Error
    could not load cert with ID "aziot-edged-trust-bundle"
    Caused by: parameter "id" has an invalid value
    caused by: not found
√ read all preloaded key pairs from the Keys Service - OK
√ check all EST server URLs utilize HTTPS - OK
√ ensure all preloaded certificates match preloaded private keys with the same ID - OK

Connectivity checks (aziot-identity-service)
--------------------------------------------
‼ host can connect to and perform TLS handshake with iothub AMQP port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub HTTPS / WebSockets port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
‼ host can connect to and perform TLS handshake with iothub MQTT port - Warning
    Could not retrieve iothub_hostname from provisioning file.
    Please specify the backing IoT Hub name using --iothub-hostname switch if you have that information.
    Since no hostname is provided, all hub connectivity tests will be skipped.
√ host can connect to and perform TLS handshake with DPS endpoint - OK

Configuration checks
--------------------
√ aziot-edged configuration is well-formed - OK
√ configuration up-to-date with config.toml - OK
√ container engine is installed and functional - OK
× configuration has correct URIs for daemon mgmt endpoint - Error
    SocketError - SocketErrorCode (TimedOut) : Operation timed out
    One or more errors occurred. (Got bad response: )
√ aziot-edge package is up-to-date - OK
√ container time is close to host time - OK
‼ DNS server - Warning
    Container engine is not configured with DNS server setting, which may impact connectivity to IoT Hub.
    Please see https://aka.ms/iotedge-prod-checklist-dns for best practices.
    You can ignore this warning if you are setting DNS server per module in the Edge deployment.
‼ production readiness: logs policy - Warning
    Container engine is not configured to rotate module logs which may cause it run out of disk space.
    Please see https://aka.ms/iotedge-prod-checklist-logs for best practices.
    You can ignore this warning if you are setting log policy per module in the Edge deployment.
× production readiness: Edge Agent's storage directory is persisted on the host filesystem - Error
    Could not check current state of edgeAgent container
× production readiness: Edge Hub's storage directory is persisted on the host filesystem - Error
    Could not check current state of edgeHub container
√ proxy settings are consistent in aziot-edged, aziot-identityd, moby daemon and config.toml - OK

Connectivity checks
-------------------
23 check(s) succeeded.
5 check(s) raised warnings. Re-run with --verbose for more details.
4 check(s) raised errors. Re-run with --verbose for more details.
7 check(s) were skipped due to errors from other checks. Re-run with --verbose for more details.
```

Device Information

Runtime Versions

Server:
 Engine:
  Version:          20.10.17+azure-3
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.17.11
  Git commit:       a89b84221c8560e7a3dee2a653353429e7628424
  Built:            Mon Jun 6 22:32:38 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.13+azure-1
  GitCommit:        a17ec496a95e55601607ca50828147e8ccaeebf1
 runc:
  Version:          1.0.3
  GitCommit:        f46b6ba2c9314cfc8caae24a32ec5fe9ef1059fe
 docker-init:
  Version:          0.19.0
  GitCommit:


## Logs

<details>
<summary>aziot-edged logs</summary>

Sep 05 15:45:52 edgelinux aziot-identityd[4550]: 2022-09-05T15:45:52Z [INFO] - Starting service...
Sep 05 15:45:52 edgelinux aziot-identityd[4550]: 2022-09-05T15:45:52Z [INFO] - Version - 1.4.0
Sep 05 15:45:52 edgelinux aziot-identityd[4550]: 2022-09-05T15:45:52Z [INFO] - Provisioning starting. Reason: Startup
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 2022-09-05T15:45:52Z [INFO] - Starting service...
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 2022-09-05T15:45:52Z [INFO] - Version - 1.4.0
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: ERROR:tcti:src/tss2-tcti/tcti-device.c:439:Tss2_Tcti_Device_Init() Failed to open device file /dev/tpm0: Permission denied
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: WARNING:tcti:src/tss2-tcti/tctildr.c:62:tcti_from_init() TCTI init for function 0x7fdeeded3fb0 failed with a000a
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: WARNING:tcti:src/tss2-tcti/tctildr.c:92:tcti_from_info() Could not initialize TCTI named: tcti-device
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: ERROR:tcti:src/tss2-tcti/tctildr-dl.c:150:tcti_from_file() Could not initialize TCTI file: device
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: ERROR:tcti:src/tss2-tcti/tctildr.c:418:Tss2_TctiLdr_Initialize_Ex() Failed to instantiate TCTI
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 2022-09-05T15:45:52Z [ERR!] - service encountered an error
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 2022-09-05T15:45:52Z [ERR!] - caused by: internal error
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 2022-09-05T15:45:52Z [ERR!] - caused by: could not initialize TPM
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 2022-09-05T15:45:52Z [ERR!] - caused by: tcti:IO failure
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 2022-09-05T15:45:52Z [ERR!] - 0:
Sep 05 15:45:52 edgelinux aziot-tpmd[4557]: 1:
Sep 05 15:45:52 edgelinux systemd[1]: aziot-tpmd.service: Main process exited, code=exited, status=1/FAILURE
Sep 05 15:45:52 edgelinux systemd[1]: aziot-tpmd.service: Failed with result 'exit-code'.

</details>
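The first tpmd error in these logs is the telling one: Failed to open device file /dev/tpm0: Permission denied, meaning aziot-tpmd could not open the TPM device node, and every later TCTI error follows from that. A small sketch of a filter that pulls such device paths out of journal text; the function name tpm_permission_errors is hypothetical, and the pattern only covers the tss2 message wording seen in this thread:

```shell
# Hypothetical filter: read log text on stdin and print each TPM device
# path that was reported as "Permission denied", once per device.
tpm_permission_errors() {
  grep -oE 'device file /dev/tpm(rm)?[0-9]+: Permission denied' \
    | sed -E 's/^device file (\/dev\/[^:]+):.*/\1/' \
    | sort -u
}
```

Usage: journalctl -u aziot-tpmd --no-pager | tpm_permission_errors. Any path it prints is a device node whose owner or mode does not let aziot-tpmd open it.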

## Additional Information
The issue happens on a clean Ubuntu installation.
If you follow the steps below, TPM provisioning and iotedge 1.4 work together:
1. On a clean Ubuntu Server 20.04 LTS installation, run the first script, which forces installation of version 1.3 with TPM provisioning.
2. Uninstall iotedge with the following steps (more or less taken from the documentation):

sudo apt-get autoremove --purge aziot-edge

Get all container IDs, then stop and delete all the containers:

sudo docker stop $(sudo docker ps -a -q)
sudo docker rm $(sudo docker ps -a -q)

Get all Docker image IDs and delete all the images:

sudo docker rmi $(sudo docker images -a -q)

List the volumes not connected to any container, then delete them:

sudo docker volume ls -f dangling=true
sudo docker volume prune

Purge Docker:

sudo apt-get autoremove --purge moby-engine


4. Launch the script modified to install version 1.4.
And the magic happens: TPM provisioning works, and the containers start being downloaded and installed.

* If I install iotedge 1.4 with symmetric key provisioning on the same clean machine, it works.
* If I try TPM provisioning after uninstalling an iotedge 1.4 installation that used symmetric key provisioning, the issue is still there.

jlian commented 1 year ago

@onalante-msft is this fixed by your PR https://github.com/Azure/iot-identity-service/pull/451?

jlian commented 1 year ago

@SandroVaroli how urgent is this for you?

SandroVaroli commented 1 year ago

@jlian It's quite urgent. The script I shared is a subset of the one stored in the Linux golden image we used to install 40 devices that are now around the world, ready to be installed in industrial plants (a sort of near plug-and-play provisioning, which is only possible with TPM provisioning because no key has to be shared). I cannot recall all of them to change the script so that it forces version 1.3.

I ran tests on a spare device I have in my office. We could wait a week or so by asking people to delay switching on new devices, but I have a couple of urgent installations for customers that cannot be delayed any longer than that.

jlian commented 1 year ago

OK, I'll provide an update this afternoon. In the meantime:

  1. @vipeller and @onalante-msft, it would be great if you could confirm that https://github.com/Azure/iot-identity-service/pull/451 would indeed fix the issue @SandroVaroli is seeing.
  2. @SandroVaroli, if you can spare a moment, could you check whether the package from the main CI/CD here resolves the issue for you? As of today I think the latest one that works is packages_ubuntu-20.04_amd64. Download and extract it, then run sudo apt install ./aziot-identity-service_1.4.0~dev-1_amd64.deb

onalante-msft commented 1 year ago

To add: it may also be worth trying device:/dev/tpmrm0 since the libtss version provided on Ubuntu 20.04 defaults to /dev/tpm0, which has more usage caveats.
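A minimal sketch of that suggestion, appending a TPM section that selects the kernel resource manager device instead of the raw /dev/tpm0. ASSUMPTION: config.toml accepts a [tpm] table with a tcti key that takes a TCTI string such as device:/dev/tpmrm0; confirm this against the config.toml.edge.template installed on your device before relying on it. The helper name set_tpm_tcti is hypothetical:

```shell
# Hypothetical helper. ASSUMPTION: config.toml supports a [tpm] table with
# a "tcti" key; check your installed template first.
set_tpm_tcti() {
  local config_file="$1" tcti="${2:-device:/dev/tpmrm0}"
  # Refuse to append when an active (uncommented) [tpm] table already
  # exists, because TOML rejects a table that is defined twice.
  if grep -qE '^[[:space:]]*\[tpm\][[:space:]]*$' "$config_file"; then
    echo "[tpm] already present in $config_file; edit it by hand" >&2
    return 1
  fi
  printf '\n[tpm]\ntcti = "%s"\n' "$tcti" >> "$config_file"
}
```

Run it as root against /etc/aziot/config.toml, then apply the change with sudo iotedge config apply.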

jlian commented 1 year ago

@onalante-msft so you think it's a different issue?

onalante-msft commented 1 year ago

It may be, but it is still worth using the updated iot-identity-service packages to remove the other TPM issue as a possible point of failure.

SandroVaroli commented 1 year ago

@jlian let me summarize what your suggestion is:

  1. I should download the aziot-identity-service_1.4.0~dev-1_amd64.deb package
  2. send the package around the world to my automation installers, who are at customer plants
  3. provide all these people with the root password of my device
  4. ask them to proceed with the normal attempt to provision the device with the script, which will fail but will install moby and aziot-edge
  5. remotely guide them through a step-by-step installation of the package (copy the package to the device, install it, etc.)
  6. reboot the devices
  7. ask these guys to please forget the root password

Some of these steps don't fully convince me...

How long should we wait for a working 1.4 version?

Some considerations: if 1.4 has a blocking bug, it shouldn't be released or served as the latest stable version; sudo apt-get install aziot-edge should install version 1.3 until version 1.4 is stable. Bugs happen. I make software, so I know... but hotfixes also exist, and they can be applied to releases for blocking bugs. You are out with an iotedge version that does not support TPM, and this should at least be written in the documentation.
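Regarding golden images picking up whatever sudo apt-get install aziot-edge currently resolves to: the 1.3 script already pins exact versions, and the same pattern can be kept for later releases. A sketch, where pinned_iotedge_packages is a hypothetical helper and the -1 Debian revision suffix is an assumption copied from the 1.3 script; apt-cache madison aziot-edge lists the versions actually published:

```shell
# Hypothetical helper: print apt-get package specs pinning both packages,
# following the "name=X.Y.Z-1" pattern used in the 1.3 script.
# The identity service version defaults to the iotedge version.
pinned_iotedge_packages() {
  local edge_version="$1" identity_version="${2:-$1}"
  printf 'aziot-identity-service=%s-1 aziot-edge=%s-1\n' \
    "$identity_version" "$edge_version"
}
```

Usage: sudo apt-get install -y $(pinned_iotedge_packages 1.3.0), leaving the substitution unquoted on purpose so the two package specs become separate arguments. Following it with sudo apt-mark hold aziot-edge aziot-identity-service keeps a later apt-get upgrade from moving the packages until the image is deliberately updated.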

jlian commented 1 year ago

@SandroVaroli sorry, I wasn't suggesting you distribute the dev build to your customers. The problem you're seeing isn't exactly the same as the one that @onalante-msft recently fixed. So I was asking whether you'd be able to help check that the latest .deb fixes your problem while we dig in and try to repro your issue. Does that make more sense?

In the meantime you have a point re:

If 1.4 has a blocking bug, it shouldn't be released or served as the latest stable version; sudo apt-get install aziot-edge should install version 1.3 until version 1.4 is stable. Bugs happen. I make software, so I know... but hotfixes also exist, and they can be applied to releases for blocking bugs. You are out with an iotedge version that does not support TPM, and this should at least be written in the documentation.

Adding @micahl and @damonbarry. Is this something we could do?

SandroVaroli commented 1 year ago

@jlian OK, I'll try installing the updated version to check whether it fixes the issue.

jlian commented 1 year ago

I realized I never provided the update I promised. The current plan is to release 1.4.1 as soon as we can, hopefully by the middle of next week, to align with the new .NET security patch that we also have to take.

Currently 1.4.1 has only one thing in scope and that is @onalante-msft's fix https://github.com/Azure/iot-identity-service/pull/451.

However, as I said, it's similar to but not exactly the same as the issue you're seeing. So:

  1. Please let us know if the updated .deb fixes it for you whenever possible
  2. @onalante-msft is trying to repro as well
  3. I'm going to check with @micahl and @damonbarry re: hotfix
micahl commented 1 year ago

I've edited the release notes to call out the fact that we're investigating issues related to TPM provisioning.

SandroVaroli commented 1 year ago

@jlian @onalante-msft I ran several tests. The one that works is:

  1. Install iotedge 1.4.0: sudo apt-get install aziot-edge
  2. Install the updated package: sudo dpkg -i ./aziot-identity-service_1.4.0~dev-1_amd64.deb
  3. Do not resolve the pending dependency with sudo apt-get install -f, because that would replace the identity service with version 1.4.0 again

It worked, so we can say the issue is fixed by the update. If version 1.4.1 is released within the next week, I can delay the setup of the two devices at the plants currently in the commissioning phase.

Thanks for the support

jlian commented 1 year ago

Really appreciate your help and thanks for your patience!

CC: @damonbarry re: 1.4.1 TPM fix

jlian commented 1 year ago

1.4.1 is out https://github.com/Azure/azure-iotedge/releases/tag/1.4.1 @SandroVaroli

I'm closing the issue. If you see any problems, we can reopen it.

tervoju commented 1 year ago

I think I am still seeing this issue: aziot-tpmd[22949]: WARNING:tcti:src/tss2-tcti/tcti-device.c:428:Tss2_Tcti_Device_Init() Failed to open default TCTI device file /dev/tpmrm0: Permission denied

micahl commented 1 year ago

@tervoju at first glance that looks like a potentially different thing. Please open a separate issue with details on your setup and steps taken to reproduce the problem to assist us in narrowing down the issue.

One quick thing to check is that you’ve given IoT Edge access to the TPM by setting the tpmaccess.rules similar to what we have in docs.
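One way to run that check after a reboot (or after udevadm trigger) is to look at the owner and mode of the device nodes, which tpmaccess.rules sets to aziottpm and 0660. A sketch using GNU stat -c, so Linux only; check_tpm_device is a hypothetical helper name:

```shell
# Hypothetical helper: report whether a TPM device node has the owner and
# mode that the tpmaccess.rules udev rules request (aziottpm, 0660).
# Returns 0 if it matches, 1 if it does not, 2 if the device is missing.
check_tpm_device() {
  local dev="$1" expected_owner="${2:-aziottpm}"
  if [ ! -e "$dev" ]; then
    echo "missing: $dev"
    return 2
  fi
  local owner mode
  owner=$(stat -c '%U' "$dev")   # owning user name
  mode=$(stat -c '%a' "$dev")    # octal permission bits, e.g. 660
  if [ "$owner" = "$expected_owner" ] && [ "$mode" = "660" ]; then
    echo "ok: $dev owner=$owner mode=$mode"
    return 0
  fi
  echo "wrong: $dev owner=$owner mode=$mode (expected owner=$expected_owner mode=660)"
  return 1
}
```

Usage: for d in /dev/tpm0 /dev/tpmrm0; do check_tpm_device "$d"; done. A "wrong" line for either device matches the Permission denied errors reported in this thread and suggests the rules were not applied.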

tervoju commented 1 year ago

OK, it does look like a different thing now. I got it a little further, but it's still not fully working. I'll create another issue if needed.