Open sbrandt-aixtrusion opened 2 months ago
Hello @sbrandt-aixtrusion,
I was able to reproduce the issue using OP-TEE's pkcs11 TA token implementation. I use a client application that creates and destroys token key pairs with direct calls to Cryptoki (PKCS#11) API functions, and that uses OpenSSL (with the libp11 pkcs11 engine) to load the key and sign some data. Indeed, when I destroy an object and create a new one, my pkcs11 token (that is, OP-TEE's pkcs11 TA) may assign the previous key's handle value to the newly created key. As a result, the libp11 pkcs11 engine confuses the two objects because of its cache when it loads the private key reference.
I considered changing OP-TEE's pkcs11 TA implementation to reduce the cases where the pkcs11 object handle value of a destroyed key is reassigned to a newly created object, but I think such a change is fragile unless the implementation ensures an old object handle is never reused. I think that guarantee would be too much of a constraint on OP-TEE's pkcs11 TA implementation, considering long-term execution.
I tried another approach: changing the libp11 implementation to reload object attributes when an object is initialized and its handle is found in the engine cache. I've created PR https://github.com/OpenSC/libp11/pull/552 for that. Feedback is welcome.
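To make the handle-reuse problem and the reload approach concrete, here is a minimal toy model in plain C. It is only a sketch: `token_create`, `token_destroy`, `cache_load` and the slot-index handle scheme are hypothetical names and assumptions made for illustration, and none of this is actual libp11 or OP-TEE code.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: none of these names exist in libp11 or OP-TEE. */
#define SLOTS 4

typedef struct { int in_use; char pub[16]; } token_obj;   /* object on the token */
typedef struct { int valid;  char pub[16]; } cache_entry; /* engine-side copy    */

static token_obj   token[SLOTS];
static cache_entry cache[SLOTS];

/* Create an object. The lowest free slot index doubles as the handle
 * (an assumption), so a destroyed object's handle is handed out again
 * by the next creation. Returns -1 if the toy token is full. */
static int token_create(const char *pub)
{
    for (int h = 0; h < SLOTS; h++) {
        if (!token[h].in_use) {
            token[h].in_use = 1;
            strncpy(token[h].pub, pub, sizeof(token[h].pub) - 1);
            token[h].pub[sizeof(token[h].pub) - 1] = '\0';
            return h;
        }
    }
    return -1;
}

/* Destroy an object; the cache is deliberately not told about it. */
static void token_destroy(int h)
{
    assert(h >= 0 && h < SLOTS);
    token[h].in_use = 0;
}

/* Load the public attribute for a handle through the cache.
 * reload_on_hit == 0: trust a cached entry (stale after handle reuse).
 * reload_on_hit != 0: re-read from the token even on a cache hit. */
static const char *cache_load(int h, int reload_on_hit)
{
    assert(h >= 0 && h < SLOTS && token[h].in_use);
    if (!cache[h].valid || reload_on_hit) {
        memcpy(cache[h].pub, token[h].pub, sizeof(cache[h].pub));
        cache[h].valid = 1;
    }
    return cache[h].pub;
}
```

In this model, creating a key, loading it, destroying it and creating a second key yields the same handle; loading with `reload_on_hit` set to 0 still returns the first key's public data, while setting it to 1 returns the new key's data.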
@sbrandt-aixtrusion, have a look at @mtrojnar's feedback in #552; I hope it helps.
Thanks a lot! The bug is gone in the test program; it will take some more time to verify in the real-life implementation.
I only noticed later that some of the functions I used were written by a colleague, and not public, but their names were meaningful enough, it seems.
About unloading the engine: I tried that, and we do unload the engine when we don't use it. I will have to look into that in more detail. It might be that we don't call ENGINE_remove(), only ENGINE_free() and ENGINE_cleanup(). But we do that after fetching the keys and before using them, so ENGINE_remove() might break our usage pattern.
So, at least for us, the "no caching" approach is a much better solution.
We have the following issue:
We use OpenSSL 3.0.13 with OP-TEE (Arm embedded) via the PKCS11 engine (libp11 0.4.12). We can create keys via pkcs11 and use them from OpenSSL, and everything is okay. But we also need to create new keys when we create a new certificate enrollment.
Now, it appears that the OpenSSL pkey is "cached" inside the engine. If we create a new pkcs11 key, the public part of the pkey is still cached and thus belongs to the old key, while the private part is passed through pkcs11 to the OP-TEE implementation and uses, of course, the new key. This makes the following workflow impossible:
Looking into the code of the pkcs11 engine, I found that the created EVP_PKEY gets an EVP_PKEY_up_ref, and is returned with a reference count of 2, not of 1. So, when the user program frees "all references" of that EVP_PKEY, it is not actually freed. This of course explains why the new key still has the old public information, instead of retrieving the new public information from the new pkcs11 key. This is quite obviously not just a bug (very old code), but "by design", though not visibly explained.
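The effect of that extra reference can be sketched with a small toy model in plain C. This is an illustration under stated assumptions, not the engine's code: `toy_key`, `toy_load`, `toy_up_ref` and `toy_free` are hypothetical stand-ins for EVP_PKEY, the engine's key loading, EVP_PKEY_up_ref and EVP_PKEY_free.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted key; not the OpenSSL EVP_PKEY type. */
typedef struct { int refs; int generation; } toy_key;

static toy_key *cached;  /* reference held by the hypothetical loader cache */
static int destroyed;    /* counts objects that were actually released      */

static void toy_up_ref(toy_key *k)
{
    assert(k != NULL);
    k->refs++;
}

/* Drop one reference; release the object only when none remain. */
static void toy_free(toy_key *k)
{
    if (k != NULL && --k->refs == 0) {
        destroyed++;
        free(k);
    }
}

/* Return the cached key if there is one, otherwise build one for
 * `generation`. The caller always gets its own reference on top of the
 * one the cache keeps, so a fresh object is returned with refs == 2. */
static toy_key *toy_load(int generation)
{
    if (cached == NULL) {
        cached = malloc(sizeof(*cached));
        if (cached == NULL)
            return NULL;
        cached->refs = 1;                /* the cache's reference */
        cached->generation = generation;
    }
    toy_up_ref(cached);                  /* the caller's reference */
    return cached;
}
```

In this model, a caller that loads a key and frees it once leaves the object alive with one reference, so the next `toy_load` returns the same pointer with the old `generation`, regardless of what was requested.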
When calling EVP_PKEY_free one more time, the key reference is of course freed completely. But after that it is no longer possible to get an EVP_PKEY reference. The reason for this is unclear; I have not yet debugged that far.
So, there are two "broken" flows.
When we set up the project, providers for PKCS11 were not yet sufficiently mature to be used. Switching everything to a provider would only make sense if we can be sure that this problem is not present there.
The issue cannot be easily reproduced in minimal sample code, as it requires a step outside of the application to create the new key.
The attached sample code should reproduce the problem, assuming you have a working pkcs11 setup.
Output is like:
As you can see, the printed PEM public key is unchanged even though the key was deleted and re-created. Also, the EVP_PKEY has the same pointer address.
Note that the problem does not appear if the keys are deleted in an external process using pkcs11-tool. Then, when getting the key after re-generation, the EVP_PKEY is "correct".
If EVP_PKEY_free is called twice, the key can't be loaded afterwards: