do not skip init() in realloc()

cgzones commented 1 year ago

If N_ARENA is greater than 1, thread_arena is initially set to N_ARENA, which is an invalid index into ro.size_class_metadata[].

The actual used arena is computed in init().

Ensure init() is called if a new thread is only using realloc() to avoid UB, e.g. pthread_mutex_lock() might crash due to the memory not holding an initialized mutex.
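For illustration, here is a minimal model of the problem described above. It is not the code from h_malloc.c: `arena_metadata`, `next_arena`, `sketch_thread_init()`, `arena_is_valid()` and `sketch_realloc_metadata()` are hypothetical names, and the round-robin arena choice is an assumption made only to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_ARENA 4 /* the default mentioned in this report */

/* Hypothetical stand-in for ro.size_class_metadata[]: one entry per arena. */
int arena_metadata[N_ARENA];

/* Starts at N_ARENA, an out-of-range value meaning "no arena chosen yet". */
_Thread_local unsigned thread_arena = N_ARENA;

/* Assumed round-robin counter; not atomic, for illustration only. */
unsigned next_arena;

/* Hypothetical per-thread part of init(): pick a valid arena exactly once. */
void sketch_thread_init(void) {
    if (thread_arena >= N_ARENA) {
        thread_arena = next_arena++ % N_ARENA;
    }
}

/* True if the index can safely be used with arena_metadata[]. */
bool arena_is_valid(unsigned arena) {
    return arena < N_ARENA;
}

/* Models the fixed realloc() path: initialize before reading thread_arena,
 * so the array access below is always in bounds. */
int *sketch_realloc_metadata(void) {
    sketch_thread_init();
    assert(arena_is_valid(thread_arena));
    return &arena_metadata[thread_arena];
}
```

Without the `sketch_thread_init()` call, a thread whose first allocator call is realloc() would index the array with N_ARENA, which corresponds to the `arena=4` seen in the back trace below.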

Affects mesa 23.2.0~rc4.

Example back trace using glmark2 (note arena=4 with the default N_ARENA being 4):

Program terminated with signal SIGSEGV, Segmentation fault.
#0  ___pthread_mutex_lock (mutex=0x7edff8d3f200) at ./nptl/pthread_mutex_lock.c:80
        type = <optimized out>
        __PRETTY_FUNCTION__ = "___pthread_mutex_lock"
        id = <optimized out>
#1  0x00007f0ab62091a6 in mutex_lock (m=0x7edff8d3f200) at ./mutex.h:21
No locals.
#2  0x00007f0ab620c9b5 in allocate_small (arena=4, requested_size=24) at h_malloc.c:517
        info = {size = 32, class = 2}
        size = 32
        c = 0x7edff8d3f200
        slots = 128
        slab_size = 4096
        metadata = 0x0
        slot = 0
        slab = 0x0
        p = 0x0
#3  0x00007f0ab6209809 in allocate (arena=4, size=24) at h_malloc.c:1252
No locals.
#4  0x00007f0ab6208e26 in realloc (old=0x72b138199120, size=24) at h_malloc.c:1499
        vma_merging_reliable = false
        old_size = 16
        new = 0x0
        copy_size = 139683981990973
#5  0x00007299f919e556 in attach_shader (ctx=0x7299e9ef9000, shProg=0x7370c9277d30, sh=0x7370c9278230) at ../src/mesa/main/shaderapi.c:336
        n = 1
#6  0x00007299f904223e in _mesa_unmarshal_AttachShader (ctx=<optimized out>, cmd=<optimized out>) at src/mapi/glapi/gen/marshal_generated2.c:1539
        program = <optimized out>
        shader = <optimized out>
        cmd_size = 2
#7  0x00007299f8f2e3b2 in glthread_unmarshal_batch (job=job@entry=0x7299e9ef9168, gdata=gdata@entry=0x0, thread_index=thread_index@entry=0) at ../src/mesa/main/glthread.c:139
        cmd = 0x7299e9ef9180
        batch = 0x7299e9ef9168
        ctx = 0x7299e9ef9000
        pos = 0
        used = 3
        buffer = 0x7299e9ef9180
        shared = <optimized out>
        lock_mutexes = <optimized out>
        batch_index = <optimized out>
#8  0x00007299f8ecc2d9 in util_queue_thread_func (input=input@entry=0x72c1160e5580) at ../src/util/u_queue.c:309
        job = {job = 0x7299e9ef9168, global_data = 0x0, job_size = 0, fence = 0x7299e9ef9168, execute = <optimized out>, cleanup = <optimized out>}
        queue = 0x7299e9ef9058
        thread_index = 0
#9  0x00007299f8f1bcbb in impl_thrd_routine (p=<optimized out>) at ../src/c11/impl/threads_posix.c:67
        pack = {func = 0x7299f8ecc190 <util_queue_thread_func>, arg = 0x72c1160e5580}
#10 0x00007f0ab5aa63ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139683974242608, 2767510063778797177, -168, 11, 140727286820160, 126005371879424, -4369625917767903623, -2847048016936659335}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0,
          0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#11 0x00007f0ab5b26a2c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

thestinger commented 1 year ago

Thanks. I initially misunderstood the issue you were fixing. This happens in a case where hardened_malloc is already globally initialized, the process obtains a small (slab) allocation from it and then calls realloc on that in a thread which hasn't made an allocation with malloc yet.
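The pattern described above can be sketched as a small program. This is an assumed reproducer, not one taken from the PR: `worker_realloc_only()` and `run_realloc_first_thread()` are hypothetical names, and the sizes 16 and 24 are borrowed from the back trace. Whether it actually crashes depends on running against an affected hardened_malloc build with N_ARENA greater than 1; with a fixed build or another allocator it should simply succeed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Runs in a fresh thread; realloc() is intended to be the first allocator
 * call this thread makes, so no malloc() has initialized its arena. */
static void *worker_realloc_only(void *old) {
    return realloc(old, 24);
}

/* Returns 0 if the pattern completes and the data survives, 1 otherwise. */
int run_realloc_first_thread(void) {
    /* Small (slab) allocation made on the main thread. */
    char *p = malloc(16);
    if (p == NULL) {
        return 1;
    }
    memcpy(p, "hardened", 9); /* 8 characters plus the terminating NUL */

    pthread_t t;
    void *grown = NULL;
    if (pthread_create(&t, NULL, worker_realloc_only, p) != 0) {
        free(p);
        return 1;
    }
    if (pthread_join(t, &grown) != 0) {
        return 1;
    }
    if (grown == NULL) {
        free(p); /* realloc() failed, the old block is still valid */
        return 1;
    }

    int ok = strcmp((char *)grown, "hardened") == 0;
    free(grown);
    return ok ? 0 : 1;
}
```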

This was overlooked when support for arenas was added. GrapheneOS currently doesn't use arenas with hardened_malloc because they use too much memory for our usage when slab allocation quarantines are enabled. Each arena has its own quarantine per size class, and the quarantine size is set based on the largest slab allocation size class, so having extended size classes enabled greatly increases the memory dedicated to quarantines. Providing more configuration for slab allocation size classes and potentially optimizing their substantial impact on performance and memory usage is becoming a priority.

thestinger commented 1 year ago

We'll tag a new release of the standalone hardened_malloc soon to get the fix to people using the stable releases. We would have done it already, but things are not going particularly well in terms of the ongoing harassment, fabricated stories, swatting attacks, etc. and it's very difficult to get basic things done.

GrapheneOS / hardened_malloc

do not skip init() in realloc() #220