OpenSC / libp11

PKCS#11 wrapper library
GNU Lesser General Public License v2.1
298 stars 183 forks source link

Library cleanup issues with OpenSSL - libp11 - yubihsm_pkcs11.so chain #527

Open pugo opened 3 months ago

pugo commented 3 months ago

We use libp11 in a chain like the following:

CST - OpenSSL - libpkcs11.so - yubihsm_pkcs11.so - YubiHSM 2

Where CST is i.MX Code Signing Tool, which is used to sign images for secure boot on NXP SOC:s.

When stress testing our signing I saw that the PKCS11 sessions are not correctly released, which after a short while under load causes errors due to lack of free sessions. With the yubihsm-connector the maximum number of consecutive sessions is 16 and the default timeout for inactive sessions is 30 seconds. When running more frequent than the timeouts can cope with signing operations fail.

I have been investigating the not released sessions, since we have a setup where signing operations can come in bursts that could potentially be hurt by the described problem. This could cause random build failures.

The results from my investigations are some smaller issues in the CST tool, but also one or two in libp11. They are described below.

I will make a PR for the first one.

Reference counting in p11_key.c pkcs11_object_free()

In the following code in p11_key.c the decrease of obj->refcnt happens before the following handling of the obj->evp_key.

void pkcs11_object_free(PKCS11_OBJECT_private *obj)
{
        if(!obj)
                return;

        if (pkcs11_atomic_add(&obj->refcnt, -1, &obj->lock) != 0)
                return;
        if (obj->evp_key) {
                /* When the EVP object is reference count goes to zero, 
                 * it will call this function again. */
                EVP_PKEY *pkey = obj->evp_key;
                obj->evp_key = NULL;
                EVP_PKEY_free(pkey);
                return;
        }
        pkcs11_slot_unref(obj->slot);
        X509_free(obj->x509);
        OPENSSL_free(obj->label);
        pthread_mutex_destroy(&obj->lock);
        OPENSSL_free(obj);
}

In the if (obj->evp_key) section there is a comment that states that the pkcs11_object_free() will get a new call as a result of the EVP_PKEY_free(pkey) call. At the end of that scope it returns.

When the second call comes, as stated in the comment, the obj->refcnt will be decreased once more. This means that it will likely go negative and not 0 as tested in the if-clause. This means that it will return and the five clean-up lines at the end of the function will never be run.

I will make a PR for the above.

Workaround in eng_front.c blocks cleanup

In the function engine_destroy() in eng_front.c there is a turned off call to ctx_finish(), with a comment as below.

/* Destroy the context allocated with ctx_new() */
static int engine_destroy(ENGINE *engine)
{
    ENGINE_CTX *ctx;
    int rv = 1;

    ctx = get_ctx(engine);
    if (!ctx)
        return 0;

    /* ENGINE_remove() invokes our engine_destroy() function with
     * CRYPTO_LOCK_ENGINE / global_engine_lock acquired.
     * Any attempt to re-acquire the lock either by directly
     * invoking OpenSSL functions, or indirectly via PKCS#11 modules
     * that use OpenSSL engines, causes a deadlock. */
    /* Our workaround is to skip ctx_finish() entirely, as a memory
     * leak is better than a deadlock. */
#if 0
    rv &= ctx_finish(ctx);
#endif

    rv &= ctx_destroy(ctx);
    ENGINE_set_ex_data(engine, pkcs11_idx, NULL);
    ERR_unload_ENG_strings();
    return rv;
}

Without the turned off call to ctx_finish() I can't see that the cleanup is run at all. I understand the comment about the deadlock issue, but the result of the decision is not only a memory leak but a leak of PKCS#11 sessions and possibly other resources.

Would it be possible to handle this in a better way? To handle the locks in a more detailed way on calls to ENGINE_remove()?