bitwiseworks / libc

LIBC Next (kLIBC fork)
9 stars 4 forks source link

LIBCx mmap tests fail with GCC4 build of LIBC #26

Closed dmik closed 5 years ago

dmik commented 5 years ago

Another problem. Need to investigate too as mmap is mission critical.

Note that to prove it's GCC4 again, I tested it with the GCC3 bulid of our github source tree — mmap works well. Looks similar to the Qt4 problem (see https://github.com/bitwiseworks/libc/issues/4#issuecomment-457362697).

dmik commented 5 years ago

This is is what I get from the GCC3 build (a controlled SIGSEGV):

*** 0001fbbe 0060:01 shared.c:327:shared_init: done
*** 0001fbbe 0060:01 mmap.c:2422:forkCompletion: restoring mapping 0x5eda43a8 (start 5ed9f000)
*** 0001fbbe 0060:01 mmap.c:2437:forkCompletion: new mmap 0x5eda43c4
Child started, value = 3
*** 0001fbbe 0060:01 mmap.c:1271:munmap: addr 0x5ed9f000, len 4
*** 0001fbbe 0060:01 mmap.c:1037:find_mmap: found m 0x5eda43c4 (5ed9f000..5eda0000 (4096), flags 1001=S--A, dos_flags 3, refcnt 0, fmem 0 (0))
*** 0001fbbe 005d:01 spawn2.c:1125:wait4: pid -1 piStatus 0x12fed8 fOptions 0x0 pRUsage 0
*** 0001fbbe 0060:01 mmap.c:1402:munmap: unmapping mmap 0x5eda43c4 (start 5ed9f000, end 5eda0000, refcnt 0)
*** 0001fbbe 0060:01 mmap.c:1219:free_mmap: shared mapping 0x5eda43c4 (5ed9f000..5eda0000)
*** 0001fbbe 0060:01 mmap.c:1642:mmap_exception: addr 5ed9f000, info 1
*** 0001fbbe 0060:01 mmap.c:1027:find_mmap: mapping not found
*** 0001fbbe 0060:01 mmap.c:1802:mmap_exception: not retrying
Creating 0060_01.TRP

Killed by SIGSEGV
pid=0x0060 ppid=0x005d tid=0x0001 slot=0x008b pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\LIBCX\MASTER-BUILD-DEBUG\STAGE\BIN\TST-ANON_MMAP.EXE
TST-ANON 0:00001505
cs:eip=005b:00011505      ss:esp=0053:0012feb0      ebp=0012fee8
 ds=0053      es=0053      fs=150b      gs=0000     efl=00010213
eax=5ed9f000 ebx=0012ff98 ecx=00140048 edx=0012fb24 edi=00000000 esi=00000000
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
*** 0001fcac 0060:01 shared.c:544:ProcessExit: reason 0
*** 0001fcac 0060:01 shared.c:337:shared_term: gMutex 800100e0, gpData 0x5eda0000 (heap 0x5eda003c, refcnt 2), gSeenAssertion 0
*** 0001fcac 0060:01 mmap.c:1563:mmap_term: DosCloseEventSem = 0
*** 0001fcae 0060:01 shared.c:378:shared_term: proc 0x5eda62e0
*** 0001fcae 0060:01 shared.c:402:shared_term: proc->files 0
*** 0001fcae 0060:01 shared.c:443:shared_term:

This is what the GCC4 build gives:

Child started, value = 3
*** 0000d103 005c:01 mmap.c:1271:munmap: addr 0x5ed9f000, len 4
*** 0000d103 005c:01 mmap.c:1037:find_mmap: found m 0x5eda43c4 (5ed9f000..5eda0000 (4096), flags 1001=S--A, dos_flags 3, refcnt 0, fmem 0 (0))
*** 0000d103 0059:01 spawn2.c:1125:wait4: pid -1 piStatus 0x12feb8 fOptions 0x0 pRUsage 0
*** 0000d103 005c:01 mmap.c:1402:munmap: unmapping mmap 0x5eda43c4 (start 5ed9f000, end 5eda0000, refcnt 0)
*** 0000d103 005c:01 mmap.c:1219:free_mmap: shared mapping 0x5eda43c4 (5ed9f000..5eda0000)
*** 0000d103 005c:01 mmap.c:1642:mmap_exception: addr 5ed9f000, info 1
*** 0000d103 005c:01 mmap.c:1027:find_mmap: mapping not found
*** 0000d103 005c:01 mmap.c:1802:mmap_exception: not retrying
Creating 005C_01.TRP
*** 0000d1cc 005c:01 mmap.c:1642:mmap_exception: addr ffffffff, info 0
*** 0000d1cc 005c:01 mmap.c:1027:find_mmap: mapping not found
*** 0000d1cc 005c:01 mmap.c:1802:mmap_exception: not retrying

Killed by SIGILL
pid=0x005c ppid=0x0059 tid=0x0001 slot=0x008b pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\LIBCX\MASTER-BUILD-DEBUG\STAGE\BIN\TST-ANON_MMAP.EXE
TST-ANON 2:000ff894
cs:eip=005b:0012f894      ss:esp=0053:0012f868      ebp=00000000
 ds=0053      es=0053      fs=150b      gs=0000     efl=00210212
eax=ffffffff ebx=00000000 ecx=0012fbaa edx=ffffffff edi=0012ff80 esi=33303164
Process dumping was disabled, use DUMPPROC / PROCDUMP to enable it.
*** 0000d1ce 005c:01 shared.c:544:ProcessExit: reason 0
*** 0000d1ce 005c:01 shared.c:337:shared_term: gMutex 800100eb, gpData 0x5eda0000 (heap 0x5eda003c, refcnt 2), gSeenAssertion 0
*** 0000d1ce 005c:01 mmap.c:1563:mmap_term: DosCloseEventSem = 0
*** 0000d1ce 005c:01 shared.c:378:shared_term: proc 0x5eda62e0
*** 0000d1ce 005c:01 shared.c:402:shared_term: proc->files 0
*** 0000d1ce 005c:01 shared.c:443:shared_term: 

As you can see, there is a second SIGILL (invalid instruction) assertion which happens somewhere in LIBC when processing the first assertion, as far as I see. I have no .TRP file for it as it seems that EXCEPTQ can't generate ones for nested exceptions (I need to check this with @StevenLevine).

dmik commented 5 years ago

Note that since it's a child process, it's not easy to connect to it with the debugger. LIBC logs show the following.

GCC3:

0001fcaa 01 01 0014 Entr 0000 __libc_Back_signalRaise: iSignalNo=11 pSigInfo=0x0012fb30{.si_signo=11, .si_errno=0, .si_code=0x80003002, .si_timestamp=0x0, .si_flags=0x0 .si_pid=0x0, .si_pgrp=0x0, .si_tid=0x0, .si_uid=0, .si_status=0, .si_addr=0x5ed9f000, .si_value=0x00000000, .si_band=0, .si_fd=0} pvXcptOrQueued=0x0012fba0 fFlags=0x5
0001fcac 01 02 0014 Entr 0000 __libc_back_signalSemRequest: 
0001fcac 01 03 0011 Entr 0000 __libc_Back_safesemMtxLock: pmtx=0x184ac308:{.hmtx=0x10009}
0001fcac 01 03 0011 Leav 0000 __libc_Back_safesemMtxLock (0 ms): ret 0 (0x0)
0001fcac 01 02 0014 Leav 0000 __libc_back_signalSemRequest (0 ms): ret 0 (0x0)
0001fcac 01 02 0014 Entr 0000 signalSchedule: pThrd=0x1849ad20 iSignalNo=11 pSigInfo=0x0012faa0 fFlags=0x5 pSigQueued=0x00000000
0001fcac 01 02 0014 Mesg 0000 signalSchedule (0 ms): enqueued signal 11.
0001fcac 01 02 0014 Mesg 0000 signalSchedule (0 ms): setting 11 pending on thread 0x1 and poking it.
0001fcac 01 03 0014 Entr 0000 signalScheduleSPM: pThrd=0x1849ad20
0001fcac 01 04 0018 Entr 0000 __libc_spmSigPending: pSigSet=0x0012f9d0
0001fcac 01 05 0018 Entr 0000 spmRequestMutex: 
0001fcac 01 05 0018 Leav 0000 spmRequestMutex (0 ms): ret 0 (0x0)
0001fcac 01 05 0018 Entr 0000 spmReleaseMutex: 
0001fcac 01 05 0018 Leav 0000 spmReleaseMutex (0 ms): ret 0 (0x0)
0001fcac 01 04 0018 Leav 0000 __libc_spmSigPending (0 ms): ret 0 *pSigSet={00000000 00000000}
0001fcac 01 03 0014 Leav 0000 signalScheduleSPM (0 ms): ret void
0001fcac 01 03 0014 Entr 0000 signalScheduleProcess: pThrd=0x1849ad20
0001fcac 01 04 000c Entr 0000 __libc_threadEnum: pfnCallback=0x1e065270 pvParam=0x1849ad20
0001fcac 01 04 000c Leav 0000 __libc_threadEnum (0 ms): ret 0 (0x0)
0001fcac 01 03 0014 Leav 0000 signalScheduleProcess (0 ms): ret void
0001fcac 01 03 0014 Entr 0000 signalDeliver: pThrd=0x1849ad20 pvXcptParams=0x0012fba0
0001fcac 01 04 0014 Entr 0000 __libc_back_signalAccept: pThrd=0x1849ad20 pSigSet=0x0012f9c0{0x0000000x000400} pSigInfo=0x0012f9f0
0001fcac 01 04 0014 Leav 0000 __libc_back_signalAccept (0 ms): ret 11 (0xb)
0001fcac 01 04 0014 Entr 0000 signalTerminateAbnormal: iSignalNo=11
0001fcac 01 05 0018 Entr 0000 __libc_spmTerm: enmDeathReason=43 iExitCode=127
0001fcac 01 06 0018 Entr 0000 spmRequestMutex: 
0001fcac 01 06 0018 Leav 0000 spmRequestMutex (0 ms): ret 0 (0x0)
0001fcac 01 06 0018 Entr 0000 spmReleaseMutex: 
0001fcac 01 06 0018 Leav 0000 spmReleaseMutex (0 ms): ret 0 (0x0)
0001fcac 01 05 0018 Leav 0000 __libc_spmTerm (0 ms): ret void
0001fcac 01 05 0000 Entr 0000 __libc_Back_panicV: fFlags=0x3 pvCtx=0x0012fd80 pszFormat=0x1e066082:{Killed by %s

GCC4:

0000d1cc 01 01 000d Leav 0000 _fmutex_dummy (0 ms): ret void
0000d1cc 01 01 000d Entr 0000 _fmutex_dummy: sem=0x0012f428
0000d1cc 01 01 000d Leav 0000 _fmutex_dummy (0 ms): ret void
0000d1cc 01 01 000d Entr 0000 _fmutex_dummy: sem=0x0012f428
0000d1cc 01 01 000d Leav 0000 _fmutex_dummy (0 ms): ret void
0000d1cc 01 01 000d Entr 0000 _fmutex_dummy: sem=0x0012f4c8
0000d1cc 01 01 000d Leav 0000 _fmutex_dummy (0 ms): ret void
0000d1cc 01 01 000d Entr 0000 _fmutex_dummy: sem=0x0012f4c8
0000d1cc 01 01 000d Leav 0000 _fmutex_dummy (0 ms): ret void
0000d1cc 01 01 0014 Entr 0000 __libc_Back_signalRaise: iSignalNo=4 pSigInfo=0x0012f4e8{.si_signo=4, .si_errno=0, .si_code=0x80001005, .si_timestamp=0x0, .si_flags=0x0 .si_pid=0x0, .si_pgrp=0x0, .si_tid=0x0, .si_uid=0, .si_status=0, .si_addr=0x0012f894, .si_value=0x00000000, .si_band=0, .si_fd=0} pvXcptOrQueued=0x0012f558 fFlags=0x5
0000d1cc 01 02 0014 Entr 0000 __libc_back_signalSemRequest: 
0000d1cc 01 03 0011 Entr 0000 __libc_Back_safesemMtxLock: pmtx=0x184aac48:{.hmtx=0x10009}
0000d1cc 01 03 0011 Leav 0000 __libc_Back_safesemMtxLock (0 ms): ret 0 (0x0)
0000d1cc 01 02 0014 Leav 0000 __libc_back_signalSemRequest (0 ms): ret 0 (0x0)
0000d1cc 01 02 0014 Entr 0000 signalSchedule: pThrd=0x184996e0 iSignalNo=4 pSigInfo=0x0012f468 fFlags=0x5 pSigQueued=0x00000000
0000d1cc 01 02 0014 Mesg 0000 signalSchedule (0 ms): enqueued signal 4.
0000d1cc 01 02 0014 Mesg 0000 signalSchedule (0 ms): setting 4 pending on thread 0x1 and poking it.
0000d1cc 01 03 0014 Entr 0000 signalScheduleSPM: pThrd=0x184996e0
0000d1cc 01 04 0018 Entr 0000 __libc_spmSigPending: pSigSet=0x0012f348
0000d1cc 01 05 0018 Entr 0000 spmRequestMutex: 
0000d1cc 01 05 0018 Leav 0000 spmRequestMutex (0 ms): ret 0 (0x0)
0000d1cc 01 05 0018 Entr 0000 spmReleaseMutex: 
0000d1cc 01 05 0018 Leav 0000 spmReleaseMutex (0 ms): ret 0 (0x0)
0000d1cc 01 04 0018 Leav 0000 __libc_spmSigPending (0 ms): ret 0 *pSigSet={00000000 00000000}
0000d1cc 01 03 0014 Leav 0000 signalScheduleSPM (0 ms): ret void
0000d1cc 01 03 0014 Entr 0000 signalScheduleProcess: pThrd=0x184996e0
0000d1cc 01 04 000c Entr 0000 __libc_threadEnum: pfnCallback=0x1df2b184 pvParam=0x184996e0
0000d1cc 01 04 000c Leav 0000 __libc_threadEnum (0 ms): ret 0 (0x0)
0000d1cc 01 03 0014 Leav 0000 signalScheduleProcess (0 ms): ret void
0000d1cc 01 03 0014 Entr 0000 signalDeliver: pThrd=0x184996e0 pvXcptParams=0x0012f558
0000d1cc 01 04 0014 Entr 0000 __libc_back_signalAccept: pThrd=0x184996e0 pSigSet=0x0012f2a0{0x0000000x000008} pSigInfo=0x0012f2b8
0000d1cc 01 04 0014 Leav 0000 __libc_back_signalAccept (0 ms): ret 4 (0x4)
0000d1cc 01 04 0014 Entr 0000 signalTerminateAbnormal: iSignalNo=4
0000d1cc 01 05 0018 Entr 0000 __libc_spmTerm: enmDeathReason=36 iExitCode=127
0000d1cc 01 06 0018 Entr 0000 spmRequestMutex: 
0000d1cc 01 06 0018 Leav 0000 spmRequestMutex (0 ms): ret 0 (0x0)
0000d1cc 01 06 0018 Entr 0000 spmReleaseMutex: 
0000d1cc 01 06 0018 Leav 0000 spmReleaseMutex (0 ms): ret 0 (0x0)
0000d1cc 01 05 0018 Leav 0000 __libc_spmTerm (0 ms): ret void
0000d1ce 01 05 0000 Entr 0000 __libc_Back_panicV: fFlags=0x3 pvCtx=0x0012f738 pszFormat=0x1df28f9e:{Killed by %s

My guess so far is that LIBC executes some invalid instruction when trying to raise signal 11 (SIGFPE) which results into signal 4 (SIGILL). Somehow GCC4 generates code LIBC does not expect...

dmik commented 5 years ago

I found the problem. GCC3 and GCC4 handle convesion of functions to bool differently. GCC3 acts as if the address of the function was taken and converted to bool, GCC4 always assumes that this conversion returns true and even warns about that in -Wall like

warning: the address of 'kHeapDbgException' will always evaluate as 'true' [-Waddress]

kHeapDbgException is a weak symbol used in the LIBC excpetion handler (exceptions.c) that, in particular, raises signals like SIGSEGV. The code checks if the weak symbol is not NULL and calls it. On GCC3 it worked as expected by the author: a missing weak symbol would appear as NULL and would not be called. However, on GCC4 the address of the symbol is never NULL (even though it will appear as 0 when printed via %p!) so the exception handler code would just call a function at address 0x00000000 causing a nested SIGILL.

The question is how to fix that. There doesn't seem to be a way to know programmatically if the weak symbol is null or not now.

dmik commented 5 years ago

See the commit. Why the previous version stated that #pragma weak wouldn't work in GCC due to underscores is not clear for me — the latest gcc335csd6 build works fine with #pragma by just inserting the very same .weak statement in the assembly (I checked). So perhaps it's either the EMX tools where it has been fixed since then, or it was a linker problem went away in recent WLINK builds.