OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that targets to deliver a high-level technical solution (performance, security and quality) to be used in professional SIP server platforms.
https://opensips.org
Other
1.26k stars 577 forks source link

[BUG][STIR_SHAKEN] stir_shaken_verify random issue load_cert/PEM_X509_INFO_read_bio #3446

Open Integration-IT opened 1 month ago

Integration-IT commented 1 month ago

Version:

version: opensips 3.4.6 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 0f0d08dba

Issue: In previous tests, stir_shaken_verify randomly did returned an -1 ret code with an internal error during the checks. the PR aspire to return more explicite information to hook a path of investigation.

After inspection we have new informations with an explicit error reason.

ERROR:stir_shaken:load_cert: error reading certificate stack
ERROR:stir_shaken:w_stir_verify: Failed to load certificate
ERROR:SCRIPT:DBG: [REQUEST ROUTE][CHECK-STIR-SHAKEN] | -1: Internal error | err_code: 500 | err_reason: Failed to load certificate.

Starting point :

Entry point error:

Other info:

Integration-IT commented 1 month ago

update.

focus on load_cert function, the 'error reading certificate stack' point here:

        sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);
        if (!sk) {
            LM_ERR("error reading certificate stack\n");
            X509_free(*cert);
            *cert = NULL;
            BIO_free(cbio);
            sk_X509_free(stack);
            return -1;
        }

To change the sk init, only cbio is relevant. Next step is to very cbio value before call PEM_X509_INFO_read_bio.

void log_cert(const str *cert_buf)
{
    static str cbuf;
    char *p, *end, *w;
    unsigned h;

    if (!cert_buf) {
        LM_ERR("NULL cert_buf pointer!!\n");
        return;
    }

    if (ZSTR(*cert_buf)) {
        LM_ERR("zero-string in cert_buf!! (%p %d)\n", cert_buf->s, cert_buf->len);
        return;
    }

    if (pkg_str_extend(&cbuf, 2 * cert_buf->len + 1) != 0) {
        LM_ERR("oom PKG\n");
        return;
    }

    w = cbuf.s;
    for (p = cert_buf->s, end = p + cert_buf->len; p < end; p++) {
        sprintf(w, "%02x", *(unsigned char *)p);
        w += 2;
    }
    *w = '\0';

    h = core_hash(cert_buf, NULL, 0);
    //LM_ERR("cert_buf: %u (%s)\n", h, cbuf.s);
    LM_ERR("cert_buf hash: %u\n", h);
}
        log_cert(cert_buf);
        sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);
        if (!sk) {
            LM_ERR("error reading certificate stack\n");
            X509_free(*cert);
            *cert = NULL;
            BIO_free(cbio);
            sk_X509_free(stack);
            return -1;
        }

Unfortunately, the "error reading certificate stack" continue where cbio hashes have been identified similar. The hexadecimal encoded binary cert_buf displayed and returned by the log_cert function was the same for stir_verify success and stir_verify error. At this point cbio has same integrity with or without certificat stack issue.

Integration-IT commented 1 month ago

update.

To avoid race condition around sk and PEM_X509_INFO_read_bio. The idea is to clean the openssl error structure and force sk to NULL value before the init. Then after PEM_X509_INFO_read_bio try to check if we can find any error in the stack.

Error handler:

static int openssl_get_errstack(void)
{ 
    int code;
    code = ERR_get_error();
    return(code);
}

Load cert:

        sk = NULL;
        ERR_clear_error();
        sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);

        // any error after PEM_X509_INFO_read_bio ?
        err_check = openssl_get_errstack();
        if ( err_check != 0 ) {
            LM_ERR("PEM_X509_INFO_read_bio error detected: %d\n", err_check);
            LM_ERR("PEM_X509_INFO_read_bio error inspection: %s\n", ERR_error_string(err_check, 0));
        }

        if (!sk) {
            LM_ERR("error reading certificate stack\n");
            X509_free(*cert);
            *cert = NULL;
            BIO_free(cbio);
            sk_X509_free(stack);
            return -1;
        }

After a while the problem persists randomly, but in some cases it is possible to deal with an error code. This error completely troubles me because we previously checked the integrity of cbio. Next test is to check the cbio before and after the PEM_X509_INFO_read_bio !!!

Core log:

ERROR:stir_shaken:load_cert: PEM_X509_INFO_read_bio error detected: 151584876.
ERROR:stir_shaken:load_cert: PEM_X509_INFO_read_bio error inspection: error:0909006C:PEM routines:get_name:no start line. 
Integration-IT commented 1 month ago

update.

Resume:

WIP - Let a chance to init sk with several attempts before return an error.

        looper = 0;
        sk = NULL;
        while (looper < 5 && !sk) {
            looper++;
            sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);
            if (!sk) {
                // trying to solve the random init in a non-clever way until ssl lib inspection.
                LM_ERR("bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again\n");
            }
        }

        if (!sk) {
            LM_ERR("error reading certificate stack\n");
            X509_free(*cert);
            *cert = NULL;
            BIO_free(cbio);
            sk_X509_free(stack);
            return -1;
        }
Integration-IT commented 1 month ago

WIP - partial FIX - no more internal error "Failed to load certificate".

10:07:36 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again
10:11:58 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again
10:14:03 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again
10:23:55 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again

vector:

github-actions[bot] commented 2 weeks ago

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.