Azure / azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
https://azure.github.io/azure-iot-sdk-c
Other
587 stars 739 forks source link

Multiple concurrent TPM commands cause device client to fail to connect. #1488

Closed rerkcp closed 4 years ago

rerkcp commented 4 years ago

Development Machine, OS, Compiler (and Other Relevant Toolchain Info)

Ubuntu 18.04 using gcc 7.5.0-3ubuntu1~18.04

SDK Version (Please Give Commit SHA if Manually Compiling) SHA 4867d928942d52d9a2cd7d0871a066c3148a7507 compiled with use_prov_client=ON and use_tpm_simulator=ON

Protocol

MQTT, HTTP

Describe the Bug

Client connects to IoTHub after obtaining correct iotHubUri and deviceId from provisioning service, both using TPM simulator as authorization. This works fine:

auto IotHubClientHandle = IoTHubDeviceClient_CreateFromDeviceAuth(iotHubUri, deviceId, MQTT_Protocol);
// send and receive events work

Now when I add this call a very short time after the CreateFromDeviceAuth: auto success = IoTHubDeviceClient_UploadToBlobAsync(IotHubClientHandle, "filename", &data[0], data.size(), sUploadCallback, this);

the application will very often (>50%) not succeed to connect to the IotHub at all. Errors like: Error: Time:Wed Apr 1 12:57:52 2020 File:/home/rerk/azure/DeviceConnector/external/azure-iot-sdk-c/provisioning_client/deps/utpm/src/tpm_codec.c Func:TSS_GetTpmProperty Line:679 Get Capability failure or other errors from tpm_codec.c appear continuously.

The reason for this appears to be that TPM commands are sent simultaneously from multiple threads in this scenario (see attached stacktrace screenshot), one from the IotHub connection itself, one from the blob upload: unsafe_call_of_TSS_GetTpmProperty_cropped

As a fix, a lock was added in utpm/inc/azure_utpm_c/tpm_codec.h which guards the TSS_DispatchCmd function in utpm/src/tpm_codec.c. With this fix, the connection happens reliably and both Iothub connection and file upload work as intended.

The fix is checked in to https://github.com/rerkcp/azure-utpm-c.

Console Logs

See above.

CIPop commented 4 years ago

Thank you @rerkcp for submitting and proposing the fix!

Due to our current design and layering, we need a different fix: The TPM driver as well as our LL (low-level) public API should not depend on threading and locks. The LL layer is not thread safe and is meant for bare-metal developing.

In this case, the high-level, thread-aware APIs are used. The correct fix is within the high-level APIs: locks should be added in all places where the HSM is being used (during Provisioning and Hub auth) to ensure only one TPM flow is being performed at any point in time.

Would you be willing to move the locks within IoTHubDeviceClient_ as well as add a few tests? If yes, I can help review & merge the proposal.

rerkcp commented 4 years ago

I would be willing to code this, but I am unsure how to actually do this without breaking functionality. Can I assume that iothub_device_auth_create() in iothub_auth_client.c is only called once and the IOTHUB_SECURITY_INFO created in it therefore a good place to place the lock?

CIPop commented 4 years ago

Thanks @rerkcp: had to look in more detail. I believe you're right: there's no simple way to fix this without adding a lock within the LL layer: the MQTT layer will need to re-generate the SAS token (from the TPM in this case) whenever it expires which you can't control (so it's not enough to do it in the create() function). Refresh/generation (which reaches out to the TPM) is done after the LL_DoWork is entered which means we can't have a lock. Looking at the stack, we can only add locks above the first _LL_ call.

The only place where I currently see the possibility of adding a lock is around the IoTHubClientCore_LL_DoWork and the IoTHubClientCore_LL_UploadToBlob which may both request TPM signing at some point. The lock should probably be Lock(iotHubClientInstance->LockHandle which is device-dependent.

rerkcp commented 4 years ago

Hi, thanks for the clarification and for the information about the SAS token refresh, which eluded my view up to now.

Looking at the code around your suggestion, I get the feeling that placing a lock around the IoTHubClientCore_LL_UploadToBlob will prevent quite a bit of possible parallelism (see the comment in uploadingThread() about multiple concurrent uploads) and also disable IotHub reconnection/token renewal while a (possibly long running) upload is taking place, all to secure the sign_sas_data() call in iothub_auth_client.c which is likely to be relatively short compared to the upload.

Also, it is unclear to me how the same would done in the async case (IoTHubClientCore_UploadToBlobAsync)?

So I propose a different fix, inside provisioning_client/adapters/hsm_client_tpm.c ... I checked the change into https://github.com/rerkcp/azure-iot-sdk-c.git on branch lock-outside-LL. In summary I overrode tpm_interface to point to functions that encase their intended functionality in Lock/Unlock. This change can be disabled by precompiler symbol TPM_NO_LOCKING.

How does that look? The adapter path seems to indicate to me that this would be a better place for this fix.

CIPop commented 4 years ago

@rerkcp Sorry for not getting back sooner. I'm actively investigating the issue and trying to find an alternate way (maybe without changing our implementation) by using a TPM2 ResourceManager such as https://github.com/tpm2-software/tpm2-abrmd.

Unfortunately, it's still failing (with a different error but in the same Sign function) even with the RM. _Correction: I can repro the exact TSS_BADPROPERTY "Get Capability failure". It depends on where the two threads are at the same time.

I'm considering your last solution, especially because we can set the options to be defined by default to preserve backward compatibility for those not using both IoT and BLOB. I'll need a bit more time to completely rule out the possibility of using an RM first that might make locking unnecessary.

CIPop commented 4 years ago

@rerkcp I have a solution that is working, without any locks. Please test the following:

  1. Install https://tpm2-software.github.io/ for your Linux distribution. On Ubuntu this is available by running:
    
    sudo apt install tpm2-abrmd libtss2-tcti-tabrmd-dev

service tpm2-abrmd status



_Note:_ on Raspbian I noticed that the installer doesn't properly set up a service. To try this out, temporarily use `sudo /usr/sbin/tpm2-abrmd --allow-root` (it will default to bind to /dev/tpm0 but can be configured to use multiple TPMs). 

2. Sync provisioning_client/deps/utpm to https://github.com/Azure/azure-utpm-c/tree/tpm_threads

Your application should now use the TPM TSS Stack which supports multi-threading and multi-process access of the TPMv2.0 hardware.
The utpm branch is fixing an issue with our implementation not being multi-thread aware.
az-iot-builder-01 commented 4 years ago

@CIPop, @rerkcp, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey