aws / aws-iot-device-sdk-java-v2

Next generation AWS IoT Client SDK for Java using the AWS Common Runtime
Apache License 2.0

PKCS11 Issues with Android #293

Closed oshoemaker closed 5 months ago

oshoemaker commented 2 years ago

Describe the bug

Greetings,

We are attempting to use an integrated secure element from Microchip (ATECC608A) on Android. This device appears to be supported by the aws-iot-device-sdk-java-v2, and a sample PKCS#11 connect program is published with the SDK here:

https://github.com/aws/aws-iot-device-sdk-java-v2/blob/main/samples/Pkcs11Connect/src/main/java/pkcs11connect/Pkcs11Connect.java

There were some general issues running this on Android. One was to make sure to use the aws-crt library built specifically for Android. There were some other issues around the arch label reported by our kernel ("armv8l"), but those have been resolved.

At this point I am unsure whether the Android version of the CRT lib is causing the problem, or whether it is a general compatibility issue with taking a Linux-based approach on Android. The primary issue right now is around threads and locking.

It appears that the CRT library attempts to create and pass in a mutex handler. This seems to silently fail on android and then reverts to using OS Locking as a backup. This functionality fails on android running inside of a Java APK.

Attempting to call .newMtlsPkcs11Builder(pkcs11Options) results in:

Exception encountered: software.amazon.awssdk.crt.CrtRuntimeException: TlsContext.tls_ctx_new: Failed to create new aws_tls_ctx (aws_last_error: AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086), A PKCS#11 (Cryptoki) library function failed with return value CKR_CANT_LOCK) AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086)

I have a JNI wrapper around the same PKCS11 lib and pass it a null pointer (No OS Locking and no mutex handler) and can successfully make calls with the device. Something similar to this:

Module module = Module.getInstance("/vendor/lib/libcryptoauth.so", "/vendor/lib/libcryptoauth.so");
module.initialize(null);
PKCS11 pkcs11 = module.getPKCS11Module();

CK_INFO info = pkcs11.C_GetInfo();
CK_SLOT_INFO slotInfo = pkcs11.C_GetSlotInfo(slotID);
CK_TOKEN_INFO tokenInfo = pkcs11.C_GetTokenInfo(slotID);

This sets the mutex handler to null and osLocking to false.

I have a comment in the discussions section on GitHub here:

https://github.com/aws/aws-iot-device-sdk-java-v2/discussions/251

Supporting documentation:

PubSub with ATECC608: https://kickstartembedded.com/2022/04/24/raspberry-pi-atecc608-part-3-using-pkcs11-token-for-mqtt-pub-sub-with-aws-iot-core/?amp=1

PKCS11 Provider: https://docs.aws.amazon.com/greengrass/v2/developerguide/pkcs11-provider-component.html

ARN(s): Does not apply

Expected Behavior

The expected behavior is that the PKCS11 implementation can be used on Android with the ATECC608A. This feature should behave the same on Android as it does on Linux.

Current Behavior

There are errors relating to locking/threading on the Android OS. This causes errors when attempting to initialize and use the PKCS11 library in the IoT device SDK.

Attempting to call .newMtlsPkcs11Builder(pkcs11Options) results in:

Exception encountered: software.amazon.awssdk.crt.CrtRuntimeException: TlsContext.tls_ctx_new: Failed to create new aws_tls_ctx (aws_last_error: AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086), A PKCS#11 (Cryptoki) library function failed with return value CKR_CANT_LOCK) AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086)

Reproduction Steps

Current reproduction steps require Android 9.0, an ATECC608A, Microchip's cryptoauthlib, the published Android AWS CRT library, and the IoT device SDK. We compile cryptoauthlib for Android 9 (as the PKCS11 library) and supply it to the IoT SDK. At this point we attempt to initialize the MQTT connection and receive a locking error.

Exception encountered: software.amazon.awssdk.crt.CrtRuntimeException: TlsContext.tls_ctx_new: Failed to create new aws_tls_ctx (aws_last_error: AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086), A PKCS#11 (Cryptoki) library function failed with return value CKR_CANT_LOCK) AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086)

https://github.com/MicrochipTech/cryptoauthlib

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.9.3

Environment details (OS name and version, etc.)

Android 9.0

graebm commented 2 years ago

Thanks for the great writeup!

You're right that the CRT lib passes a mutex handler, but the CRT is also fine if the PKCS#11 library chooses to use OS locks instead. That code is here. The CRT lib DOES currently require the PKCS#11 lib to do some kind of locking, since the CRT lib may call into the PKCS#11 lib from multiple threads. A PKCS#11 library that doesn't support locking AT ALL would take some effort to support.

But looking at the CryptoAuthLib source code on GitHub, it DOES seem to support locking. That code is here. It will use the mutex handler if it's provided. And it will use its own locking functions if the CKF_OS_LOCKING_OK flag is passed but no mutex handler is provided.
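To make the selection rule described above concrete, here is a minimal Java sketch. It is not CryptoAuthLib's or the CRT's actual code; it only models the three outcomes described in this comment (mutex handler supplied, OS locking allowed, or neither). The names `LockingModeSketch`, `LockingMode`, and `resolve` are hypothetical. `CKF_OS_LOCKING_OK` uses the value 0x2 from the PKCS#11 headers.

```java
/** Models how a PKCS#11 library could pick a locking mode from its C_Initialize arguments. */
final class LockingModeSketch {
    // Flag value as defined in the PKCS#11 headers.
    static final long CKF_OS_LOCKING_OK = 0x00000002L;

    // Hypothetical labels for the three outcomes described in the comment above.
    enum LockingMode { APP_MUTEX_CALLBACKS, OS_LOCKS, NONE }

    static LockingMode resolve(boolean mutexCallbacksSupplied, long flags) {
        if (mutexCallbacksSupplied) {
            // The caller's mutex handler wins whenever it is provided.
            return LockingMode.APP_MUTEX_CALLBACKS;
        }
        if ((flags & CKF_OS_LOCKING_OK) != 0) {
            // No handler, but the caller allows the library to use its own locks.
            return LockingMode.OS_LOCKS;
        }
        // Neither was given, e.g. a null init argument: no locking at all.
        return LockingMode.NONE;
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, CKF_OS_LOCKING_OK));
        System.out.println(resolve(false, CKF_OS_LOCKING_OK));
        System.out.println(resolve(false, 0L));
    }
}
```

Under this model, passing a null init argument (as in the JNI wrapper test earlier in the thread) lands in the `NONE` case, which is the mode the CRT does not currently support.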

That leaves us with a few possibilities:
1) Maybe I'm not looking at the same CryptoAuthLib source code as what you're using.
2) Maybe something about the way CryptoAuthLib is compiled for Android messes up how it uses locks?

Another thing you can do, to get visibility into this, is try to figure out how to turn on debug logging for CryptoAuthLib. I see stuff like this in their source code:

        if (lib_ctx->create_mutex(&lib_ctx->mutex))
        {
            PKCS11_DEBUG("Create Failed\r\n");
            return CKR_CANT_LOCK;

It would really shine some light if we saw something like "Create Failed" appear in the logs.

oshoemaker commented 2 years ago

I am still attempting to recompile CryptoAuthLib with debug output. In the meantime, this is the output with CRT debug logging enabled:

D/pkcs11: [ee00e494] - id=0xe28bc2a0: Selected PKCS#11 token. slot:0 label:'00ABC' manufacturerID:'Microchip Technology Inc' model:'ATECC608A' serialNumber:'237B26F14AF86801' flags:0x00000401 sessionCount:0/1 rwSessionCount:0/1 freePublicMemory:4294967295/4294967295 freePrivateMemory:4294967295/4294967295 hardwareVersion:0.3 firmwareVersion:255.255
D/pkcs11: [ee00e494] - id=0xe28bc2a0 session=3929669504: Session opened on slot 0
E/pkcs11: [ee00e494] - id=0xe28bc2a0 session=3929669504: C_Login() failed. PKCS#11 error: CKR_CANT_LOCK (0x0000000A). AWS error: AWS_ERROR_PKCS11_CKR_CANT_LOCK
D/pkcs11: [ee00e494] - id=0xe28bc2a0 session=3929669504: Session closed
I/System.out: Exception encountered: software.amazon.awssdk.crt.CrtRuntimeException: TlsContext.tls_ctx_new: Failed to create new aws_tls_ctx (aws_last_error: AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086), A PKCS#11 (Cryptoki) library function failed with return value CKR_CANT_LOCK) AWS_ERROR_PKCS11_CKR_CANT_LOCK(1086)
D/pkcs11: [ee00e494] - id=0xe28bc2a0: Unloading PKCS#11. C_Finalize:omit
D/channel-bootstrap: [ee00e494] - id=0xea311ea0: releasing bootstrap reference
D/channel-bootstrap: [ee00e494] - id=0xea311ea0: destroying
I/event-loop: [ce990970] - id=0xea311e40: Destroying event_loop
I/event-loop: [ce990970] - id=0xea311e40: Stopping event-loop thread.
D/task-scheduler: [cea8e970] - id=0xea31beb8: Scheduling epoll_event_loop_stop task for immediate execution
D/task-scheduler: [cea8e970] - id=0xea31beb8: Running epoll_event_loop_stop task with status
D/event-loop: [cea8e970] - id=0xea311e40: exiting main loop
D/task-scheduler: [cea8e970] - id=0xea3f7bd0: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
D/task-scheduler: [ce990970] - id=0xea3f7bd0: Running epoll_event_loop_unsubscribe_cleanup task with status
I/event-loop: [ce990970] - id=0xea311de0: Destroying event_loop
I/event-loop: [ce990970] - id=0xea311de0: Stopping event-loop thread.
D/task-scheduler: [ceb8c970] - id=0xea31bdd8: Scheduling epoll_event_loop_stop task for immediate execution
D/task-scheduler: [ceb8c970] - id=0xea31bdd8: Running epoll_event_loop_stop task with status
D/event-loop: [ceb8c970] - id=0xea311de0: exiting main loop
D/task-scheduler: [ceb8c970] - id=0xe42a87d0: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
D/task-scheduler: [ce990970] - id=0xe42a87d0: Running epoll_event_loop_unsubscribe_cleanup task with status
I/event-loop: [ce990970] - id=0xea311d20: Destroying event_loop
I/event-loop: [ce990970] - id=0xea311d20: Stopping event-loop thread.
D/task-scheduler: [cec8a970] - id=0xea31bcf8: Scheduling epoll_event_loop_stop task for immediate execution
D/task-scheduler: [cec8a970] - id=0xea31bcf8: Running epoll_event_loop_stop task with status
D/event-loop: [cec8a970] - id=0xea311d20: exiting main loop
D/task-scheduler: [cec8a970] - id=0xea3f7ad0: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
D/task-scheduler: [ce990970] - id=0xea3f7ad0: Running epoll_event_loop_unsubscribe_cleanup task with status
I/event-loop: [ce990970] - id=0xea311cf0: Destroying event_loop
I/event-loop: [ce990970] - id=0xea311cf0: Stopping event-loop thread.
D/task-scheduler: [ced88970] - id=0xe42a0178: Scheduling epoll_event_loop_stop task for immediate execution
D/task-scheduler: [ced88970] - id=0xe42a0178: Running epoll_event_loop_stop task with status
D/event-loop: [ced88970] - id=0xea311cf0: exiting main loop
D/task-scheduler: [ced88970] - id=0xe42a8010: Scheduling epoll_event_loop_unsubscribe_cleanup task for immediate execution
D/task-scheduler: [ce990970] - id=0xe42a8010: Running epoll_event_loop_unsubscribe_cleanup task with status
D/event-loop: [ce990970] - Event Loop Shutdown Complete
I/System.out: Complete!

oshoemaker commented 2 years ago

@graebm log output above. I am still having issues with the debug logging in the PKCS11 library. I will update when available.

graebm commented 2 years ago

Based on the log output you shared, it doesn't look like CryptoAuthLib is calling the CRT lib's mutex handler at all. If the failure were happening in the CRT mutex handler, we'd see something like: "...PKCS#11 CreateMutex() failed..." (source code) or "...PKCS#11 LockMutex() failed..." (source code)
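If it helps to check a longer capture, the same test can be done mechanically. This is only a rough sketch: it assumes the two message fragments quoted above appear verbatim in the log text, and `MutexHandlerLogCheck` and `mentionsCrtMutexFailure` are hypothetical names, not part of the CRT.

```java
/** Checks a captured log for the CRT mutex-handler failure messages quoted above. */
final class MutexHandlerLogCheck {
    // Fragments taken from the two messages quoted in this comment.
    static final String[] MARKERS = {
        "PKCS#11 CreateMutex() failed",
        "PKCS#11 LockMutex() failed"
    };

    /** Returns true if any marker occurs anywhere in the log text. */
    static boolean mentionsCrtMutexFailure(String log) {
        for (String marker : MARKERS) {
            if (log.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sample = "E/pkcs11: [ee00e494] - id=0xe28bc2a0 session=3929669504: "
                + "C_Login() failed. PKCS#11 error: CKR_CANT_LOCK (0x0000000A).";
        System.out.println(mentionsCrtMutexFailure(sample));
    }
}
```

A `false` result for the whole capture would be consistent with the failure originating inside the PKCS#11 library rather than in the CRT's mutex callbacks.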

graebm commented 1 year ago

any updates?

oshoemaker commented 1 year ago

At this point in time I think that we managed to work around the initial hurdles. Unfortunately we have not been able to establish an MQTT connection using the PKCS11 library. Our current error seems to be with the C_Sign() operation with the PKCS11 library.

E/pkcs11: [cc891970] - id=0xe0aad460 session=3900309056: C_Sign() failed. PKCS#11 error: CKR_FUNCTION_FAILED (0x00000006). AWS error: AWS_ERROR_PKCS11_CKR_FUNCTION_FAILED
E/tls-handler: [cc891970] - id=0xe2262600: TLS key operation complete with error AWS_ERROR_PKCS11_CKR_FUNCTION_FAILED

With all of the debugging messages enabled I have not been able to identify the root cause of the failure. I am unclear if this is a PKCS11 library issue or the AWS implementation.
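For anyone triaging similar captures, the sketch below pulls the failing PKCS#11 function, the CKR name, and the numeric code out of CRT log lines shaped like the ones in this thread. It assumes the "C_Xxx() failed. PKCS#11 error: CKR_XXX (0x...)" wording seen above; `Pkcs11FailureParser`, `Failure`, and `parse` are hypothetical names and not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts PKCS#11 failures from CRT log text in the format shown in this thread. */
final class Pkcs11FailureParser {
    // Matches e.g. "C_Sign() failed. PKCS#11 error: CKR_FUNCTION_FAILED (0x00000006)".
    static final Pattern FAILURE = Pattern.compile(
            "(C_\\w+)\\(\\) failed\\. PKCS#11 error: (CKR_\\w+) \\((0x[0-9A-Fa-f]+)\\)");

    /** One parsed failure: function name, CKR error name, numeric return value. */
    static final class Failure {
        final String function;
        final String error;
        final long code;

        Failure(String function, String error, long code) {
            this.function = function;
            this.error = error;
            this.code = code;
        }
    }

    static List<Failure> parse(String log) {
        List<Failure> failures = new ArrayList<>();
        Matcher m = FAILURE.matcher(log);
        while (m.find()) {
            // Long.decode understands the "0x" prefix used in the log.
            failures.add(new Failure(m.group(1), m.group(2), Long.decode(m.group(3))));
        }
        return failures;
    }

    public static void main(String[] args) {
        String line = "E/pkcs11: [cc891970] - id=0xe0aad460 session=3900309056: "
                + "C_Sign() failed. PKCS#11 error: CKR_FUNCTION_FAILED (0x00000006).";
        for (Failure f : parse(line)) {
            System.out.println(f.function + " " + f.error + " " + f.code);
        }
    }
}
```

This only summarizes what the CRT already logged. It does not reveal why the library returned the error, which still needs debug output from the PKCS#11 library itself.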

sbSteveK commented 8 months ago

Hi, I'm currently working through testing PKCS11 on Android. Have you had any progress on your efforts or gotten any additional logging on the nature of the function failure?