FreeRADIUS / freeradius-server

FreeRADIUS - A multi-protocol policy server.
http://freeradius.org
GNU General Public License v2.0
2.09k stars 1.07k forks source link

[defect]: 4.x startup fails with fr_openssl_init() called too late #4327

Closed huckabeec closed 2 years ago

huckabeec commented 2 years ago

What type of defect/bug is this?

Crash or memory corruption (segv, abort, etc...)

How can the issue be reproduced?

Built from git clone on RHEL8 host and installed on test RHEL8 server. No changes made to the configuration files at all, just trying to start the service in debug mode.

I thought it might be FIPS mode causing the issue as it is required on these systems, but disabling it made no difference.

Log output from the FreeRADIUS daemon

# /usr/sbin/radiusd -XXX
Fri Dec  3 21:14:32 2021: Error : tls - Failed to set OpenSSL memory allocation functions.  fr_openssl_init() called too late
# fips-mode-setup --check
FIPS mode is disabled.

Relevant log output from client utilities

No response

Backtrace from LLDB or GDB

No response

arr2036 commented 2 years ago

Going to need more information. I have rhel8 builds with openssl that work fine.

arr2036 commented 2 years ago

Going to need more information. I have rhel8 builds with openssl that work fine.

I.e is this a completely stock rhel8 install or are there customisations (is this a vm template?)

alandekok commented 2 years ago

What else was changed on the underlying VM? As Arran said, we have stock RHEL8 builds which are fine. Perhaps you can do this:

At some point it will stop working. That change to the base OS is what broke OpenSSL.

Unfortunately, OpenSSL doesn't produce good messages here. So there isn't a lot we can do to track this down.

huckabeec commented 3 weeks ago

I doubt I can re-open this case but I finally able to get back to testing the 4.x code and confirmed that the only change was enabling FIPS.

Also found where the mysqld folks hit this same issue: https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1971788

And RedHat has an article based on their work around for mysqld: https://access.redhat.com/solutions/6882881

Apparently when FIPS is enabled OpenSSL_init() will call RAND_init_fips() which does some memory allocation steps. That means the call to CRYPTO_set_mem_functions() in src/lib/tls/base.c will fail.

But it looks like this failure can be safely ignored - that's what the mysqld "fix" for the problem is, it changes the hard failure in the code to a warning when that flag is set. I'm wondering if a fix would be to add a check for FIPS mode in fr_openssl_init() in src/lib/tls/base.c - wrapping the "return -1" to only return as an error if FIPS is disabled, otherwise log a warning message and move on.

alandekok commented 3 weeks ago

@huckabeec the main thing here is that if the call to CRYPTO_set_mem_functions() fails, will it still save our callbacks?

huckabeec commented 3 weeks ago

I'm assuming so based on what the mysqld folks did in their code:

int ret = CRYPTO_set_mem_functions(my_openssl_malloc, my_openssl_realloc, my_openssl_free); if (ret == 0) LogErr(WARNING_LEVEL, ER_SSL_MEMORY_INSTRUMENTATION_INIT_FAILED, "CRYPTO_set_mem_functions");

They just treat it as a warning and move on. I thought they check for FIPS but looks like in the latest code it just always treats that as a warning and the flag mentioned earlier just hides that warning in the logs.

arr2036 commented 3 weeks ago
int CRYPTO_set_mem_functions(CRYPTO_malloc_fn malloc_fn,
                             CRYPTO_realloc_fn realloc_fn,
                             CRYPTO_free_fn free_fn)
{
    if (!allow_customize)
        return 0;
    if (malloc_fn != NULL)
        malloc_impl = malloc_fn;
    if (realloc_fn != NULL)
        realloc_impl = realloc_fn;
    if (free_fn != NULL)
        free_impl = free_fn;
    return 1;
}
arr2036 commented 3 weeks ago

We actually need our own memory allocation functions, so pretending they were set doesn't work.

arr2036 commented 3 weeks ago

https://github.com/openssl/openssl/blob/873f269697df848d13dc012a265759baa8eed8cd/crypto/mem.c#L193

yeah it's locked out after first allocation

arr2036 commented 3 weeks ago

Setting the memory allocation functions is literally the first thing we do in library initialisation, and openssl library initialisation is done very early. Must be some 3rd party library using elf init. @huckabeec if you could build against a version of OpenSSL with developer enabled, and set a breakpoint at that line in mem.c, you should be able to see what exactly, is initialising OpenSSL early, and then we can see if we can fix it.

huckabeec commented 3 weeks ago

Setting the memory allocation functions is literally the first thing we do in library initialisation, and openssl library initialisation is done very early. Must be some 3rd party library using elf init. @huckabeec if you could build against a version of OpenSSL with developer enabled, and set a breakpoint at that line in mem.c, you should be able to see what exactly, is initialising OpenSSL early, and then we can see if we can fix it.

From my reading it it the setup of FIPS mode. In RHEL7 there was a specific 'dracut-fips' package that gets installed so when FIPS mode is enabled, dracut builds an initrd with FIPS enabled and I think the init is happening there so during the boot process.

I say that because the references I found mentioned removing the dracut-fips package forcibly and rebuilding the initrd. When that was done, for the mysqld issue, it stopped the issue (this was in a prior version before they changed that error to a warning). In RHEL8 all of that FIPS logic was added to the main dracut package so the only way to work around it that I can find is disabling FIPS which rebuilds initrd without the early call.

I'm happy to try the openssl dev build though if you think it will help.

arr2036 commented 3 weeks ago

It'd still be helpful. We'd at least know for sure what was allocing the memory. FWIW I've run FreeRADIUS 4 in FIPs mode on RHEL9 and not seen this problem.

huckabeec commented 3 weeks ago

Adding note here for others who may read this:
To fully disable FIPS on RHEL8, you need to not only run: fips-mode-setup --disable but also do: rm /etc/system-fips that file is created by fips-mode-setup during the enable step.

The existence of that file apparently triggers the same code in OpenSSL which causes this issue and fips-mode-setup does not clean up after itself during the disable.

huckabeec commented 3 weeks ago

It'd still be helpful. We'd at least know for sure what was allocing the memory. FWIW I've run FreeRADIUS 4 in FIPs mode on RHEL9 and not seen this problem.

RHEL9 is on OpenSSL 3.x correct ? If so that's the difference - the FIPS code in 3.x is completely different from my reading (I am no expert by far). RHEL8 is still on the 1.1.1 branch.

huckabeec commented 3 weeks ago

Loaded up the debugging symbols for the system and it looks like this is triggered during the library load for libcrypto. Simply touching /etc/system-fips triggers the FIPS setup in OpenSSL as the library is loaded:

`(gdb) bt

0 FIPS_module_installed () at crypto/fips/fips.c:407

1 0x00007ffff67da0d5 in init_fips_mode () at crypto/o_init.c:39

2 OPENSSL_init_library () at crypto/o_init.c:70

3 0x00007ffff7dd611a in call_init (l=, argc=argc@entry=2, argv=argv@entry=0x7fffffffd938, env=env@entry=0x7fffffffd950)

at dl-init.c:72

4 0x00007ffff7dd621a in call_init (env=0x7fffffffd950, argv=0x7fffffffd938, argc=2, l=) at dl-init.c:30

5 _dl_init (main_map=0x7ffff7ffe230, argc=2, argv=0x7fffffffd938, env=0x7fffffffd950) at dl-init.c:119`

That FIPS_module_installed() check sees that file and it triggers a whole bunch of other FIPS related setup.

Longer bt that looks specifically for the CRYPTO_malloc:

(gdb) b CRYPTO_malloc
Function "CRYPTO_malloc" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (CRYPTO_malloc) pending.
(gdb) r
Starting program: /usr/sbin/radiusd 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, CRYPTO_malloc (num=56, file=0x7ffff69c07f3 "crypto/threads_pthread.c", line=29) at crypto/mem.c:193
193 {
Missing separate debuginfos, use: yum debuginfo-install libkqueue-2.6.1-1.x86_64
(gdb) bt
#0  CRYPTO_malloc (num=56, file=0x7ffff69c07f3 "crypto/threads_pthread.c", line=29) at crypto/mem.c:193
#1  0x00007ffff68e990d in CRYPTO_zalloc (num=num@entry=56, file=file@entry=0x7ffff69c07f3 "crypto/threads_pthread.c", line=line@entry=29)
    at crypto/mem.c:230
#2  0x00007ffff694f96b in CRYPTO_THREAD_lock_new () at crypto/threads_pthread.c:29
#3  0x00007ffff68d74bd in do_fips_lock_init () at crypto/fips/fips.c:92
#4  do_fips_lock_init_ossl_ () at crypto/fips/fips.c:90
#5  0x00007ffff5065e97 in __pthread_once_slow (once_control=0x7ffff6c463f4 <fips_lock_init>, init_routine=0x7ffff68d74b0 <do_fips_lock_init_ossl_>)
    at pthread_once.c:116
#6  0x00007ffff694fa5d in CRYPTO_THREAD_run_once (once=once@entry=0x7ffff6c463f4 <fips_lock_init>, 
    init=init@entry=0x7ffff68d74b0 <do_fips_lock_init_ossl_>) at crypto/threads_pthread.c:118
#7  0x00007ffff68d7c6f in FIPS_module_mode_set (onoff=onoff@entry=1) at crypto/fips/fips.c:421
#8  0x00007ffff68f2e8c in FIPS_mode_set (r=r@entry=1) at crypto/o_fips.c:28
#9  0x00007ffff67da070 in init_fips_mode () at crypto/o_init.c:44
#10 OPENSSL_init_library () at crypto/o_init.c:70
#11 0x00007ffff7dd611a in call_init (l=<optimized out>, argc=argc@entry=1, argv=argv@entry=0x7fffffffd958, env=env@entry=0x7fffffffd968)
    at dl-init.c:72
#12 0x00007ffff7dd621a in call_init (env=0x7fffffffd968, argv=0x7fffffffd958, argc=1, l=<optimized out>) at dl-init.c:30
#13 _dl_init (main_map=0x7ffff7ffe230, argc=1, argv=0x7fffffffd958, env=0x7fffffffd968) at dl-init.c:119
#14 0x00007ffff7debe1a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#15 0x0000000000000001 in ?? ()
#16 0x00007fffffffdc32 in ?? ()
#17 0x0000000000000000 in ?? ()
arr2036 commented 3 weeks ago

Yes RHEL9 is on OpenSSL 3.x... and yes that confirms that FIPS_mode_set is being done inside the initialisation function for a library. Probably libssl/libcrypto itself.

huckabeec commented 3 weeks ago

So looks like we'll have to move to RHEL9 for our 4.x testing, not sure how to fix an issue the library causes for itself on loading.

For my own enlightenment, the 3.x line works fine on RHEL8 in FIPS mode, so I assume the OpenSSL initialization in 4.x s just that different to have this issue ?

arr2036 commented 3 weeks ago

I don't believe v3.x uses custom memory allocators. In v4 it's needed so that we can over-allocate the C call stacks OpenSSL uses for "yielding" out of the certificate validation callbacks. It's a horrible hack, but fixes some crashes we were experiencing in EAP-TLS and friends.

I've now officially removed support for OpenSSL < 3.0.0 from the server, primarily due to this FIPS issue. This means the minimum OS versions for v4 are FreeBSD 14, Debian 12, Ubuntu 22.04, Rocky/RHEL 9.