Closed mquillin closed 1 year ago
@mquillin ack... I can repro this as well.
@arsing any insight into what might be the issue here? I saw the same behavior. EdgeHub starts up and when it attempts to decrypt it hits this error in the keys service. Everything else appeared to work prior (provisioning via DPS using EST for cert issuance).
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - Starting module edgeHub...
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - Starting new listener for module edgeHub
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - Starting workload API...
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - --> 204 {}
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - <-- POST /modules/%24edgeAgent/genid/637907616673364985/encrypt?api-version=2022-08-03 {"accept": "application/json", "host": "workload.sock:80", "connection": "close", "content-type": "application/json", "content-length": "2442"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - <-- POST /key?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "74"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - <-- POST /encrypt?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "2774"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - <-- POST /modules/%24edgeHub/genid/637907616673364985/certificate/server?api-version=2020-07-07 {"accept": "application/json", "host": "workload.sock:80", "connection": "close", "content-type": "application/json", "content-length": "67"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - <-- GET /keypair/aziot-edged-ca?api-version=2020-09-01 {"host": "keyd.sock"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-certd[81624]: 2022-11-30T07:01:11Z [INFO] - <-- POST /certificates?api-version=2020-09-01 {"content-type": "application/json", "host": "certd.sock", "content-length": "1361"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - <-- POST /parameters/algorithm?api-version=2021-05-01 {"content-length": "252", "content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - <-- POST /parameters/rsa-modulus?api-version=2021-05-01 {"content-length": "252", "content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - <-- POST /parameters/rsa-exponent?api-version=2021-05-01 {"content-length": "252", "content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - <-- POST /encrypt?api-version=2021-05-01 {"content-length": "359", "content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:11Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-certd[81624]: 2022-11-30T07:01:11Z [INFO] - --> 201 {"content-type": "application/json"}
Nov 30 07:01:11 fooDeb aziot-edged[81616]: 2022-11-30T07:01:11Z [INFO] - --> 201 {"content-type": "application/json"}
Nov 30 07:01:12 fooDeb aziot-edged[81616]: 2022-11-30T07:01:12Z [INFO] - <-- GET /trust-bundle?api-version=2020-07-07 {"accept": "application/json", "host": "workload.sock:80", "connection": "close"}
Nov 30 07:01:12 fooDeb aziot-certd[81624]: 2022-11-30T07:01:12Z [INFO] - <-- GET /certificates/aziot-edged-trust-bundle?api-version=2020-09-01 {"host": "certd.sock"}
Nov 30 07:01:12 fooDeb aziot-certd[81624]: 2022-11-30T07:01:12Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:12 fooDeb aziot-edged[81616]: 2022-11-30T07:01:12Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:12 fooDeb aziot-edged[81616]: 2022-11-30T07:01:12Z [INFO] - <-- POST /modules/%24edgeHub/genid/637907616673364985/sign?api-version=2020-07-07 {"accept": "application/json", "host": "workload.sock:80", "connection": "close", "content-type": "application/json", "content-length": "169"}
Nov 30 07:01:12 fooDeb aziot-identityd[81620]: 2022-11-30T07:01:12Z [INFO] - <-- GET /identities/modules/$edgeHub?api-version=2020-09-01&type=aziot {"host": "identityd.sock"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /encrypt?api-version=2021-05-01 {"content-length": "355", "content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- GET /key/aziot_identityd_master_id?api-version=2021-05-01 {"host": "keyd.sock"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /derivedkey?api-version=2021-05-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "336"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /derivedkey/export?api-version=2021-05-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "724"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /derivedkey?api-version=2021-05-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "340"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /derivedkey/export?api-version=2021-05-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "736"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-identityd[81620]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /sign?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "898"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-edged[81616]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-edged[81616]: 2022-11-30T07:01:13Z [INFO] - <-- POST /modules/%24edgeHub/genid/637907616673364985/encrypt?api-version=2020-07-07 {"accept": "application/json", "host": "workload.sock:80", "connection": "close", "content-type": "application/json", "content-length": "438"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /key?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "74"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /encrypt?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "766"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-edged[81616]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-edged[81616]: 2022-11-30T07:01:13Z [INFO] - <-- POST /modules/%24edgeHub/genid/637907616673364985/decrypt?api-version=2020-07-07 {"accept": "application/json", "host": "workload.sock:80", "connection": "close", "content-type": "application/json", "content-length": "463"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /key?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "74"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 200 {"content-type": "application/json"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - <-- POST /decrypt?api-version=2020-09-01 {"content-type": "application/json", "host": "keyd.sock", "content-length": "791"}
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [ERR!] - C_Decrypt failed with CKR_BUFFER_TOO_SMALL
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [ERR!] - !!! internal error
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [ERR!] - !!! caused by: could not decrypt
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [ERR!] - !!! caused by: could not decrypt: AZIOT_KEYS_RC_ERR_EXTERNAL
Nov 30 07:01:13 fooDeb aziot-keyd[81633]: 2022-11-30T07:01:13Z [INFO] - --> 500 {"content-type": "application/json"}
EA and EH are the first (and usually only) callers of the symmkey encrypt/decrypt API. Also they only decrypt when they have something to decrypt, so I think the deployment itself will go fine and decrypt will only happen if they're restarted. So the fact that everything works before that point is to be expected.
As for why softhsm's C_Decrypt
returns CKR_BUFFER_TOO_SMALL
for something that was encrypted by the same library's C_Encrypt
, we'd need to debug it to be sure. I don't see any issues about it in their tracker.
Given we haven't seen this obvious breakage before, is it only broken on Debian 11's version of softhsm?
Lastly, while we obviously should investigate it, I will note that using softhsm doesn't give any benefit over just using filesystem keys, so if you need immediate relief you can remove the PKCS#11 config as you already tried.
Also, using DPS and using EST should not be relevant. Can you confirm the same thing happens even with simple manual-symmkey provisioning and quickstart CA?
@arsing, yes, I see the same error in that configuration as well. I was also on Debian 11 so dunno about others. Will give it a quick try on Ubuntu 20.04.
Given we haven't seen this obvious breakage before, is it only broken on Debian 11's version of softhsm?
Actually it's possible this has always been broken. I don't know if there's ever been a full Edge test with softhsm, and nothing in i-i-s itself would hit the symmkey encrypt/decrypt API.
The error is probably coming from https://github.com/opendnssec/SoftHSMv2/blame/2.6.1/src/lib/SoftHSM.cpp#L3221-L3225 . It asserts that the plaintext buffer is exactly as long as the ciphertext buffer. In our code we give it a plaintext buffer that's 16 bytes smaller than the ciphertext buffer because the ciphertext includes a 16-byte tag. https://github.com/Azure/iot-identity-service/blob/1.4.1/key/aziot-keys/src/key.rs#L510
The softhsm check is overly conservative, but we should cater to it by removing the - 16
anyway.
Ah, that makes sense.
Thanks for looking into this! I have switched to using the filesystem for now, I want to use softhsm to verify the interface so I know we have a working system before testing various HSM's.
Can you test with the Debian 11 aarch64 artifact from https://github.com/Azure/iot-identity-service/actions/runs/3587571056 ?
@mquillin fwiw I see it working now with @arsing's draft change. Here's what I did to easily replace the bits on my VM:
Version
in the package's control file from 1.4.0~dev-1
to 1.4.1-1
using this handy scriptsudo apt remove aziot-edge aziot-identity-service
sudo apt install ./aziot-identity-service_1.4.0~dev-1_amd64.modfied.deb; sudo apt install aziot-edge
sudo iotedge config apply
@arsing do you have an ETA when this fix will be released? Also is this changed rolled into the docker container (i.e. mcr.microsoft.com/azureiotedge-agent:1.4.3) or will it require updating with apt?
I saw a new version 1.4.4 and wasn't sure if that included this fix; or in general this service.
https://github.com/Azure/iot-identity-service/pull/490 is the PR for 1.4. It's waiting for backport approval internally.
@mquillin that internal discussion happens tomorrow afternoon so we should be able to provide an update after that.
@micahl has this been released yet?
I am using https://packages.microsoft.com/config/debian/11/packages-microsoft-prod.deb
and still see some issues on boot
Feb 09 16:35:18 MLQ aziot-keyd[969]: 2023-02-10T00:35:18Z [ERR!] - invalid parameter "id": not found
@mquilling sorry, yes. The fix went in mid-December and was picked up as part of the 1.4.8 patch for IoT Edge although looks like we forgot to mention it in the release notes. The identity service was bumped to 1.4.2. If you're installing the latest aziot-edge
and aziot-identity-service
from packages.microsoft.com then you should have it.
I believe that error about invalid parameter "id": not found
is unrelated (and probably safe to ignore). Do you see it repeatedly occurring or is it more of a one-time thing?
@micahl thanks for the confirmation!
its working now, I had switched to 32 bit for softhsm, and all my containers broke.
any chance you know if there is a softhsm arm64v8 package?
the standard install is a 32 bit version
Should be: https://packages.debian.org/bullseye/softhsm2
What you're getting / seeing is probably based on how apt-get is configured.
that is what I am installing, softhsm2-util works and everything
Feb 09 16:25:46 MLQ aziot-keyd[12758]: 2023-02-10T00:25:46Z [ERR!] - could not load library: /usr/lib/softhsm/libsofthsm2.so: wrong ELF class: ELFCLASS64
does this mean the key service is compiled as 32 bit and looking for 32 bit but found 64?
Which RPi OS are you on, 32- or 64-bit?
64 bit, works on 32 bit fine - but really wanted to use 64
We provide 64-bit packages with each release. You can find them on the github release at https://github.com/Azure/azure-iotedge/releases/tag/1.4.8
great! I will try installing that instead of the prod link.
for reference I know I can update the edgeAgent and edgeHub containers via the cloud. Does the identity service require an apt update? In general what’s the recommended way to update all the iot edge components?
Yes, the identity service is updated via apt. Refer to our doc page on Update IoT Edge.
In the context of updating if you're looking at upgrading between patch releases (e.g. 1.4.1 -> 1.4.2) then it shouldn't matter whether you first update the system containers and then .deb packages or vice-versa since we keep the interface between the two stable such that a newer version of one should work with an older version of the other.
If upgrading between product releases (e.g. 1.1 -> 1.4) then that flexibility still exists, but we'd recommend attempting to update both in tandem since a specific combination of host components + system containers is what we test. An approach to doing that would be to stop the system, uninstall the deb packages, upgrade Docker if not already on the latest, install the aziot-edge package, update your module deployment manifest in the cloud, then restart the runtime.
I am running into what looks like a bug when I spin up the edge runtime services. At first everything seems to work fine and the device is associated to the IoT Hub and generates all the keys and certs - but as soon as I add it to a deployment it breaks. Several containers are successfully deployed before hitting this error. I have a total of 7 containers in addition to edgeAgent and edgeHub.
I am doing a pretty basic install on a raspberry pi with 64 bit os lite
config.toml
edgeAgent error logs for reference
This is very easily reproduced on my end
Also note that everything works as expected when these lines are removed from the config