bitwiseworks / libc

LIBC Next (kLIBC fork)

Infinite loop in __libc_atexit_new #25

Closed dmik closed 5 years ago

dmik commented 5 years ago

From https://github.com/bitwiseworks/libc/issues/4#issuecomment-456932815:

What I see in the LIBC logs so far is that the Qt4 app (e.g. qedit.exe) when run against the GCC4 LIBC build enters _std_atexit on thread 1, then calls _hcalloc with size=1032 and that's all. The next record is the select timeout on thread 2 after 30 seconds and thread 1 never returns to _std_atexit or makes any other LIBC call.

In case of the old GCC3 build of LIBC, the very same _std_atexit callback returns almost immediately w/o even calling _hcalloc and execution continues.

I have no idea yet where this callback is coming from, who's calling calloc and why it all hangs.

and

I can confirm that it is GCC4 that makes it hang. Rebuilding the same source tree with GCC3 makes it work fine. I have to play with the optimization options and dig into the assembly... The good thing is that I know the guilty piece of code.

dmik commented 5 years ago

There was a tiny bug in the code (false negative) that caused the loop to spin forever. Fixed by the above commit. What puzzles me now is how it could ever work in GCC3. I've checked the assembly, it's essentially the same as the one generated by GCC4.

Note that this (previously faulty) code gets executed only when _atexit and/or on_exit run out of room in the initial callback array (64 entries) and want to register some more. So the only thing that comes to my mind is that for some reason GCC3 builds of LIBC register fewer atexit callbacks than those built with GCC4. Crazy but who knows. I'll try to check that with logging. And will also give Knut a link to this. Maybe something pops up in his head.

dmik commented 5 years ago

Okay, now I know exactly what's going on. Compare the GCC3 assembly (faulty code, w/o my fix):

    call    __hcalloc
    movl    %eax, %ecx
    addl    $16, %esp
    xorl    %eax, %eax
    testl   %ecx, %ecx
    je  L19

    movl    $1, 4(%ecx)
    movl    %ebx, 12(%ecx)
    movl    $3, 8(%ecx)

    movl    ___libc_gAtExitHead, %edx
L33:
    movl    %edx, (%ecx)
    movl    (%ecx), %eax

    lock; cmpxchgl %ecx, ___libc_gAtExitHead
    setz  %al
    movzx %al, %eax

    testl   %eax, %eax
    jne L33

and the GCC4 assembly (same faulty code):

    call    __hcalloc
    movl    %eax, %edx
    testl   %eax, %eax
    je  L23

    movl    $1, 4(%eax)
    movl    %ebx, 12(%eax)
    movl    $3, 8(%eax)

L10:
    movl    ___libc_gAtExitHead, %eax
    movl    %eax, (%edx)
    movl    (%edx), %eax

    lock; cmpxchgl %edx, ___libc_gAtExitHead
    setz  %al
    movzx %al, %eax

    testl   %eax, %eax
    jne L10

GCC3 doesn't reload ___libc_gAtExitHead in the loop because it (mistakenly) thinks its value never changes within the loop. As a result, the second iteration breaks out of the loop: __atomic_cmpxchg32 returns false because the new value of ___libc_gAtExitHead no longer matches the initial one (still held in EDX).

GCC4, however, properly guesses that ___libc_gAtExitHead might be changed because it is involved in assembly marked with volatile and reloads it at the beginning of each iteration. Since at the second iteration the reloaded value will always match what was set on the previous iteration, __atomic_cmpxchg32 will always return true and the loop will never end. Hence the hang at startup.
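For readers who prefer C to assembly, the two listings can be approximated as follows. This is only a sketch reconstructed from the assembly, not the actual LIBC source: `struct chunk`, `g_head`, `cas_head` and the `max_spins` bound are hypothetical names introduced here, and GCC's `__sync_bool_compare_and_swap` builtin stands in for `__atomic_cmpxchg32`. The last function shows the loop with the condition inverted, which is presumably what the fix amounts to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type; the real LIBC structure is not shown in the issue. */
struct chunk {
    struct chunk *next;
};

/* Stand-in for ___libc_gAtExitHead. */
static struct chunk *volatile g_head;

/* Stand-in for __atomic_cmpxchg32: true if g_head equalled `expected`
 * and was replaced by `node`. */
static bool cas_head(struct chunk *node, struct chunk *expected)
{
    return __sync_bool_compare_and_swap(&g_head, expected, node);
}

/* GCC3 shape: head is read once before the loop; the (wrong) condition
 * repeats while the swap SUCCEEDS. max_spins only keeps the sketch finite. */
static int push_faulty_hoisted(struct chunk *node, int max_spins)
{
    int spins = 0;
    struct chunk *head = g_head;      /* loaded once, never refreshed */
    assert(node != NULL);
    do {
        node->next = head;
        spins++;
    } while (cas_head(node, node->next) && spins < max_spins);
    return spins;
}

/* GCC4 shape: same wrong condition, but head is reloaded on every pass. */
static int push_faulty_reload(struct chunk *node, int max_spins)
{
    int spins = 0;
    assert(node != NULL);
    do {
        node->next = g_head;          /* reloaded each pass */
        spins++;
    } while (cas_head(node, node->next) && spins < max_spins);
    return spins;
}

/* Corrected condition (assumed form of the fix): retry only while the swap FAILS. */
static int push_fixed(struct chunk *node)
{
    int spins = 0;
    assert(node != NULL);
    do {
        node->next = g_head;
        spins++;
    } while (!cas_head(node, node->next));
    return spins;
}
```

On a single thread starting from an empty list, the hoisted variant leaves after two passes with a correctly linked node, while the reloading variant keeps succeeding until the bound is hit and leaves the node pointing at itself.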

So it's a really weird combination of a program bug and a compiler bug that canceled each other out, and it all accidentally worked. Once the compiler bug went away, it broke. Cool, I like such things.

Anyway, case closed. The fix makes it work in GCC4. As for GCC3, this fix actually creates the opposite potential problem: if the thread is not able to set ___libc_gAtExitHead at the first attempt because some other thread was faster, it will hang forever, since the stale head value is never reloaded. This never happened in the past because it's a very rare case. Atexit handlers are usually installed at startup when there are not many threads.
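The GCC3 concern can be illustrated with a single-threaded simulation. Again a sketch under assumptions: `struct chunk`, `g_head`, `cas_head`, the `racer` parameter (which mimics another thread winning the race) and the `max_spins` bound are all hypothetical, and `__sync_bool_compare_and_swap` stands in for `__atomic_cmpxchg32`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node type and list head, as stand-ins for the LIBC internals. */
struct chunk {
    struct chunk *next;
};

static struct chunk *volatile g_head;

/* True if g_head equalled `expected` and was replaced by `node`. */
static bool cas_head(struct chunk *node, struct chunk *expected)
{
    return __sync_bool_compare_and_swap(&g_head, expected, node);
}

/* Corrected condition, but with head hoisted out of the loop the way GCC3
 * compiled it. If `racer` is non-NULL it is pushed between the load and the
 * first swap, simulating a faster thread. max_spins keeps the sketch finite. */
static int push_fixed_hoisted(struct chunk *node, struct chunk *racer,
                              int max_spins)
{
    int spins = 0;
    struct chunk *head = g_head;      /* goes stale once the racer wins */
    assert(node != NULL);
    if (racer != NULL) {
        racer->next = g_head;
        g_head = racer;
    }
    do {
        node->next = head;            /* never refreshed */
        spins++;
    } while (!cas_head(node, node->next) && spins < max_spins);
    return spins;
}

/* Corrected condition with a reload on every pass: recovers after a lost race. */
static int push_fixed_reload(struct chunk *node)
{
    int spins = 0;
    assert(node != NULL);
    do {
        node->next = g_head;
        spins++;
    } while (!cas_head(node, node->next));
    return spins;
}
```

Without a racer the hoisted variant succeeds on the first pass. With one, every swap fails against the stale head and only the bound stops the loop, whereas the reloading variant picks up the racer's node and links in behind it.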

BTW, JFTR, I measured: a Qt4 application installs ~90 atexit callbacks (i.e. more than the initial room for 64) regardless of GCC3 or GCC4. Regular applications install fewer than 64. And this explains why only Qt4 apps would hang with GCC4 builds of LIBC prior to the fix.