Open Integration-IT opened 1 month ago
update.
focus on load_cert function, the 'error reading certificate stack' point here:
sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);
if (!sk) {
LM_ERR("error reading certificate stack\n");
X509_free(*cert);
*cert = NULL;
BIO_free(cbio);
sk_X509_free(stack);
return -1;
}
To change the sk init, only cbio is relevant. Next step is to very cbio value before call PEM_X509_INFO_read_bio.
void log_cert(const str *cert_buf)
{
static str cbuf;
char *p, *end, *w;
unsigned h;
if (!cert_buf) {
LM_ERR("NULL cert_buf pointer!!\n");
return;
}
if (ZSTR(*cert_buf)) {
LM_ERR("zero-string in cert_buf!! (%p %d)\n", cert_buf->s, cert_buf->len);
return;
}
if (pkg_str_extend(&cbuf, 2 * cert_buf->len + 1) != 0) {
LM_ERR("oom PKG\n");
return;
}
w = cbuf.s;
for (p = cert_buf->s, end = p + cert_buf->len; p < end; p++) {
sprintf(w, "%02x", *(unsigned char *)p);
w += 2;
}
*w = '\0';
h = core_hash(cert_buf, NULL, 0);
//LM_ERR("cert_buf: %u (%s)\n", h, cbuf.s);
LM_ERR("cert_buf hash: %u\n", h);
}
log_cert(cert_buf);
sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);
if (!sk) {
LM_ERR("error reading certificate stack\n");
X509_free(*cert);
*cert = NULL;
BIO_free(cbio);
sk_X509_free(stack);
return -1;
}
Unfortunately, the "error reading certificate stack" continue where cbio hashes have been identified similar. The hexadecimal encoded binary cert_buf displayed and returned by the log_cert function was the same for stir_verify success and stir_verify error. At this point cbio has same integrity with or without certificat stack issue.
update.
To avoid race condition around sk and PEM_X509_INFO_read_bio. The idea is to clean the openssl error structure and force sk to NULL value before the init. Then after PEM_X509_INFO_read_bio try to check if we can find any error in the stack.
Error handler:
static int openssl_get_errstack(void)
{
int code;
code = ERR_get_error();
return(code);
}
Load cert:
sk = NULL;
ERR_clear_error();
sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);
// any error after PEM_X509_INFO_read_bio ?
err_check = openssl_get_errstack();
if ( err_check != 0 ) {
LM_ERR("PEM_X509_INFO_read_bio error detected: %d\n", err_check);
LM_ERR("PEM_X509_INFO_read_bio error inspection: %s\n", ERR_error_string(err_check, 0));
}
if (!sk) {
LM_ERR("error reading certificate stack\n");
X509_free(*cert);
*cert = NULL;
BIO_free(cbio);
sk_X509_free(stack);
return -1;
}
After a while the problem persists randomly, but in some cases it is possible to deal with an error code. This error completely troubles me because we previously checked the integrity of cbio. Next test is to check the cbio before and after the PEM_X509_INFO_read_bio !!!
Core log:
ERROR:stir_shaken:load_cert: PEM_X509_INFO_read_bio error detected: 151584876.
ERROR:stir_shaken:load_cert: PEM_X509_INFO_read_bio error inspection: error:0909006C:PEM routines:get_name:no start line.
update.
Resume:
WIP - Let a chance to init sk with several attempts before return an error.
looper = 0;
sk = NULL;
while (looper < 5 && !sk) {
looper++;
sk = PEM_X509_INFO_read_bio(cbio, NULL, NULL, NULL);
if (!sk) {
// trying to solve the random init in a non-clever way until ssl lib inspection.
LM_ERR("bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again\n");
}
}
if (!sk) {
LM_ERR("error reading certificate stack\n");
X509_free(*cert);
*cert = NULL;
BIO_free(cbio);
sk_X509_free(stack);
return -1;
}
WIP - partial FIX - no more internal error "Failed to load certificate".
10:07:36 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again
10:11:58 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again
10:14:03 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again
10:23:55 ERROR:stir_shaken:load_cert: bogus error reading certificate stack returned by PEM_X509_INFO_read_bio, try again
vector:
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
Version:
Issue: In previous tests, stir_shaken_verify randomly did returned an -1 ret code with an internal error during the checks. the PR aspire to return more explicite information to hook a path of investigation.
After inspection we have new informations with an explicit error reason.
Starting point :
Entry point error:
Other info: