Open dmik opened 1 year ago
The following crash happens during the build process in some tool called MKSNAPSHOT.EXE:
LD_LIBRARY_PATH=D:/Coding/node/main/out/Release/lib.host:D:/Coding/node/main/out/Release/lib.target:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; cd ../tools/v8_gypfiles; mkdir -p D:/Coding/node/main/out/Release/obj.target/v8_snapshot/geni; "D:/Coding/node/main/out/Release/mksnapshot" --turbo_instruction_scheduling "--target_os=os2" "--target_arch=ia32" --startup_src "D:/Coding/node/main/out/Release/obj.target/v8_snapshot/geni/snapshot.cc" --embedded_variant Default --embedded_src "D:/Coding/node/main/out/Release/obj.target/v8_snapshot/geni/embedded.S" --no-native-code-counters
Killed by SIGSEGV
pid=0x7610 ppid=0x760d tid=0x0001 slot=0x0080 pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE
MKSNAPSH 2:000f3510
cs:eip=0000:03c63510 ss:esp=0000:00000010 ebp=03c60053
ds=0000 es=0000 fs=03c62e80 gs=0000 efl=74000022
eax=03c62f88 ebx=1ffc9d7c ecx=03c62f9c edx=03c62fc0 edi=20030000 esi=03c6fe1c
Creating 7610_01.TRP
Moved 7610_01.TRP to C:\var\log\app\64431a2c-7610_01-MKSNAPSHOT-exceptq.txt
make: *** [tools/v8_gypfiles/v8_snapshot.target.mk:17: daf6e403889991d83bdc4e0ad8be81d51fd3c622.intermediate] Segmentation fault
make: *** Deleting file 'daf6e403889991d83bdc4e0ad8be81d51fd3c622.intermediate'
make: Leaving directory 'D:/Coding/node/main/out'
Interesting, mksnapshot from QtWebEngine 6.5.1 (with my current patchset) gives a SIGABRT and the attached trp 648ea753-11d4_01-MKSNAPSHOT-exceptq.txt
@psmedley I see, though this looks like something entirely different (some LIBC heap memory corruption). My crash is this one:
______________________________________________________________________
Exception C0000005 - Access Violation
______________________________________________________________________
Process: D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE (05/11/2023 03:34:17 70,835,511)
PID: 3F2 (1010)
TID: 01 (1)
Priority: 200
Filename: D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE (05/11/2023 03:34:17 70,835,511)
Address: 005B:000925C5 (0001:000825C5)
Cause: Attempted to write to 03B360A0
(read-only memory at 0002:000060A0 in MKSNAPSH)
______________________________________________________________________
Failing Instruction
______________________________________________________________________
000925B0 JNZ 0x925e0 (75 2e)
000925B2 MOV DWORD [ESP], 0x3b3d7ac (c70424 acd7b303)
000925B9 CALL 0x16c9730 (e8 72716301)
000925BE MOV DWORD [ESP], 0x3b3d7ac (c70424 acd7b303)
000925C5 >MOV BYTE [0x3b360a0], 0x0 (c605 a060b303 00)
000925CC CALL 0x16c9740 (e8 6f716301)
000925D1 ADD ESP, 0x24 (83c4 24)
000925D4 POP EBX (5b)
______________________________________________________________________
Registers
______________________________________________________________________
EAX : 00000000 EBX : 03C53440 ECX : 00000001 EDX : 00000000
ESI : 03C53510 EDI : 00000000
ESP : 03C53430 EBP : 03C53458 EIP : 000925C5 EFLG : 00010202
CS : 005B CSLIM: FFFFFFFF SS : 0053 SSLIM: FFFFFFFF
EAX : not a valid address
EBX : read/write memory on this thread's stack
ECX : not a valid address
EDX : not a valid address
ESI : read/write memory on this thread's stack
EDI : not a valid address
______________________________________________________________________
Stack Info for Thread 01
______________________________________________________________________
Size Base ESP Max Top
00100000 03C60000 -> 03C53430 -> 03C51000 -> 03B60000
______________________________________________________________________
Call Stack
______________________________________________________________________
EBP Address Module Obj:Offset Nearest Public Symbol
-------- --------- -------- ------------- -----------------------
Trap -> 000925C5 MKSNAPSH 0001:000825C5 __ZN2v88internal30DisableEmbeddedBlobRefcountingEv + 25 0001:000825A0 (isolate.obj)
03C53458 0001525A MKSNAPSH 0001:0000525A main + 51A 0001:00004D40 (ldconv_mksnapshot_o_6291645c38141ad570.obj)
03C5FDD8 00010027 MKSNAPSH 0001:00000027 crt0.s#90 __text + 27 0001:00000000 (D:\Temp\ccLyDNo2.s)
03C5FE04 1E17F621 LIBCX0 0001:0000F621 ___init_app + 11 0001:0000F610 (main.obj)
03C5FFE0 1DF13B3B LIBCN0 0001:00033B3B appinit.s#16 ___init_app + B 0001:00033B30 (appinit.obj)
I made it possible to rebuild MKSNAPSHOT.EXE quickly, and with some more debugging I see that it crashes while trying to write to a global file-level variable (located in the data segment of this EXE) because this memory is somehow read-only. So far, I don't quite understand why this segment is read-only: it is marked in the EXE header as read/write. The only explanation that comes to mind is that some code dynamically makes the respective page read-only after the EXE is started, but it's unclear where and why that happens. It seems I will have to debug DosSetMem calls in V8.
Actually, I was right. Some more debugging shows that DosSetMem is called exactly once on a single page (belonging to the DATA32 segment of the EXE!) via v8::base::OS::SetPermissions(OS::MemoryPermission::kRead). It's not clear why V8 does so; some more debugging is necessary. It's all slow though, because if I change platform-os2.cc, for some reason the whole v8 library gets rebuilt (which takes 4 hours here in -j1 mode, and -j4 sometimes overheats the CPU).
Ok, now I know what's going on. V8 has a FlagValues struct (accessed via v8_flags). They declare this struct with strict page-size alignment (4096 bytes) using alignas in flags.h:
struct alignas(kMinimumOSPageSize) FlagValues {
...
They do it so that they can safely set this exact variable's memory protection mode to read-only (which can only be done at page granularity) with base::OS::SetDataReadOnly(&v8_flags, sizeof(v8_flags)); (which calls v8::base::OS::SetPermissions(OS::MemoryPermission::kRead) under the hood).
However, as https://github.com/bitwiseworks/gcc-os2/issues/11 suggests, our GCC (AS) lacks support for alignas, basically because the a.out and omf object file formats lack per-variable alignment support.
So, as a result, the v8_flags variable ends up misaligned (debugging shows it starts at offset 0x0060 within a page rather than on a page boundary). Consequently, when DosSetMem is called on that variable's address to set it to R/O, it changes two actual pages: the one where this variable starts and the next one. And since the next page happens to hold other variables which are not intended to be read-only, the process crashes when the code tries to assign values to these variables.
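The two-page effect can be checked with plain address arithmetic. This is only a sketch under assumptions: the 4096-byte page size comes from the discussion above, while the addresses, the one-page object size and the helper names (AffectedPages, PageCount) are made up for illustration and are not V8 or OS/2 APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Page size as stated in the discussion (kMinimumOSPageSize == 4096).
constexpr std::uintptr_t kPageSize = 4096;

// Hypothetical helper type: the page-aligned range that gets touched.
struct PageRange {
  std::uintptr_t first;  // address of the first affected page
  std::uintptr_t end;    // address one past the last affected page
};

// Pages touched when a protection change is requested for
// [addr, addr + size): the start is rounded down to a page boundary
// and the end is rounded up, because protection is per page.
constexpr PageRange AffectedPages(std::uintptr_t addr, std::size_t size) {
  const std::uintptr_t first = addr & ~(kPageSize - 1);
  const std::uintptr_t end = (addr + size + kPageSize - 1) & ~(kPageSize - 1);
  return {first, end};
}

// Number of whole pages in the range.
constexpr std::size_t PageCount(PageRange r) {
  return (r.end - r.first) / kPageSize;
}
```

With an illustrative address such as 0x10060 (offset 0x60 within its page) and an assumed size of 4096 bytes, PageCount(AffectedPages(0x10060, 4096)) is 2, whereas a page-aligned 0x10000 with the same size gives 1.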
Making alignas work appears to be a huge task. We need some other solution. As a last resort, I will simply make v8::base::OS::SetPermissions(OS::MemoryPermission::kRead) a no-op (I see that there are some Chromium platforms that do that as well, e.g. Fuchsia, though the reason is different there).
FYI I already have v8::base::OS::SetPermissions(OS::MemoryPermission::kRead) as a no-op in Qt 6.5 - might explain why I get a different TRP
Yes, I would expect that. I came up with a solution to work around the alignas limitations by padding the FlagValues struct with 4096 bytes at the beginning and at the end and then aligning the address of the struct to the next page boundary before passing it to base::OS::SetDataReadOnly. This works pretty well.
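For readers who want to see the idea in code, here is one possible reading of the padding workaround, not the actual patch. It assumes the struct carries 4096 bytes of padding before and after the real flags and computes the largest run of whole pages that lies entirely inside the padded object, so that no neighbouring variable is affected. The names ProtectRange and WholePagesInside and the sizes in the example are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPageSize = 4096;

constexpr std::uintptr_t RoundDownToPage(std::uintptr_t a) {
  return a & ~(kPageSize - 1);
}

constexpr std::uintptr_t RoundUpToPage(std::uintptr_t a) {
  return RoundDownToPage(a + kPageSize - 1);
}

// Hypothetical result type: what would be handed to the protection call.
struct ProtectRange {
  std::uintptr_t start;  // page-aligned start address
  std::size_t size;      // multiple of kPageSize, may be 0
};

// Given a (possibly misaligned) object at [begin, begin + size), return
// the whole pages that lie completely inside it. The start is rounded UP
// and the end is rounded DOWN, so nothing outside the object is covered.
constexpr ProtectRange WholePagesInside(std::uintptr_t begin, std::size_t size) {
  const std::uintptr_t start = RoundUpToPage(begin);
  const std::uintptr_t end = RoundDownToPage(begin + size);
  return {start, end > start ? static_cast<std::size_t>(end - start) : 0};
}
```

As an illustration, take a padded object at 0x10060 whose payload is assumed to be 256 bytes, so the total size is 4096 + 256 + 4096. The payload then occupies [0x11060, 0x11160), and WholePagesInside returns start 0x11000 with size 4096, which covers the payload without touching anything outside the object.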
However, MKSNAPSHOT still doesn't actually work. It crashes here with SIGABRT due to an OOM condition, printing this to the console:
<--- Last few GCs --->
<--- JS stacktrace --->
#
# Fatal process OOM in Zone
#
Killed by SIGTRAP
pid=0x6f04 ppid=0x6f03 tid=0x0001 slot=0x009e pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE
cs:eip=03c50d04:00000000 ss:esp=0000:00000000 ebp=00000053
ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
eax=03c4e4d8 ebx=1ffc9d7c ecx=03c4e4ec edx=03c4e510 edi=00000000 esi=03c5ff7c
Creating 6F04_01.TRP
Moved 6f04_01.TRP to C:\var\log\app\649f5b9a-6f04_01-MKSNAPSHOT-exceptq.txt
and the following stack trace:
EBP Address Module Obj:Offset Nearest Public Symbol
-------- --------- -------- ------------- -----------------------
Trap -> 00032DC0 MKSNAPSH 0001:00022DC0 __ZN2v84base2OS5AbortEv + 10 0001:00022DB0 (platform-posix.obj)
03C4E4E8 0005C7A3 MKSNAPSH 0001:0004C7A3 __ZN2v85Utils16ReportOOMFailureEPNS_8internal7IsolateEPKcRKNS_10OOMDetailsE + 93 0001:0004C710 (api.obj)
03C4E508 0005CA91 MKSNAPSH 0001:0004CA91 __ZN2v88internal2V823FatalProcessOutOfMemoryEPNS0_7IsolateEPKcRKNS_10OOMDetailsE + 251 0001:0004C840 (api.obj)
03C52D68 016FB19C MKSNAPSH 0001:016EB19C __ZN2v88internal4Zone9NewExpandEj + 10C 0001:016EB090 (zone.obj)
03C52D98 0177F2AF MKSNAPSH 0001:0176F2AF __ZN2v88internal8compiler7Linkage21GetStubCallDescriptorEPNS0_4ZoneERKNS0_23CallInterfaceDescriptorEiNS_4base5FlagsINS1_14CallDescriptor4FlagEiEENS9_INS1_8Operator8PropertyEhEENS0_12StubCallModeE + 34F 0001:0176EF60 (linkage.obj)
03C52DF8 0070BB43 MKSNAPSH 0001:006FBB43 __ZN2v88internal8compiler18CodeAssemblerStateC1EPNS0_7IsolateEPNS0_4ZoneERKNS0_23CallInterfaceDescriptorENS0_8CodeKindEPKcNS0_7BuiltinE + 53 0001:006FBAF0 (code-assembler.obj)
03C52E48 019FB43E MKSNAPSH 0001:019EB43E __ZN2v88internal13AssemblerBase21AbortedCodeGenerationEv$w$ReSlbtdjNe83mEIq0 + 5FE 0001:019EAE40 (setup-builtins-internal.obj)
03C52F68 01A0272A MKSNAPSH 0001:019F272A __ZN2v88internal20SetupIsolateDelegate21SetupBuiltinsInternalEPNS0_7IsolateE + EA 0001:019F2640 (setup-builtins-internal.obj)
03C52FA8 000A81C3 MKSNAPSH 0001:000981C3 __ZN2v88internal7Isolate4InitEPNS0_12SnapshotDataES3_S3_b + 1563 0001:00096C60 (isolate.obj)
03C53388 000A83F1 MKSNAPSH 0001:000983F1 __ZN2v88internal7Isolate19InitWithoutSnapshotEv + 31 0001:000983C0 (isolate.obj)
03C533C8 0005CC99 MKSNAPSH 0001:0004CC99 __ZN2v815SnapshotCreatorC1EPNS_7IsolateEPKiPNS_11StartupDataE + 89 0001:0004CC10 (api.obj)
03C533F8 000DDFA6 MKSNAPSH 0001:000CDFA6 __ZN2v88internal30CreateSnapshotDataBlobInternalENS_15SnapshotCreator20FunctionCodeHandlingEPKcPNS_7IsolateE + 36 0001:000CDF70 (snapshot.obj)
03C53458 000152DC MKSNAPSH 0001:000052DC main + 59C 0001:00004D40 (ldconv_mksnapshot_o_6ed4649f312c1d7938.obj)
03C5FDD8 00010027 MKSNAPSH 0001:00000027 crt0.s#90 __text + 27 0001:00000000 (D:\Temp\ccLyDNo2.s)
03C5FE04 1E17F621 LIBCX0 0001:0000F621 ___init_app + 11 0001:0000F610 (main.obj)
03C5FFE0 1DF13BFB LIBCN0 0001:00033BFB appinit.s#16 ___init_app + B 0001:00033BF0 (appinit.obj)
In your case it's something completely different.
Disregard the last crash, it was due to some code I had commented out while catching the previous bug. Now, with the debugging changes removed, it seems that I get essentially the same crash as @psmedley:
Assertion failed: _UM_LUMP_STATUS (olump) == _UMS_FREE, file libc/src/emx/src/lib/malloc/ialloc.c, line 116
Killed by SIGABRT
pid=0x1149 ppid=0x1148 tid=0x0001 slot=0x009b pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE
Creating 1149_01.TRP
Moved 1149_01.TRP to C:\var\log\app\64a49535-1149_01-MKSNAPSHOT-exceptq.txt
EBP Address Module Obj:Offset Nearest Public Symbol
-------- --------- -------- ------------- -----------------------
Trap -> 1DFDC726 LIBCN0 0001:000FC726 b_panic.c#538 ___libc_Back_panicV + 1C2D 0001:000FAAF9 (b_panic.obj)
03C62A58 1DFDCEF6 LIBCN0 0001:000FCEF6 b_panic.c#170 ___libc_Back_panic + 26 0001:000FCED0 (b_panic.obj)
03C62A78 1DF27BE5 LIBCN0 0001:00047BE5 signals.c#2015 ___libc_back_ghevWait + 10745 0003:000374A0 (signals.obj)
03C62A98 1DF2A2C3 LIBCN0 0001:0004A2C3 signals.c#1800 ___libc_back_signalAccept + 682 0001:00049C41 (signals.obj)
03C62BA8 1DF2A5DA LIBCN0 0001:0004A5DA signals.c#954 ___libc_Back_signalRaise + 129 0001:0004A4B1 (signals.obj)
03C62C48 1DF2B6B6 LIBCN0 0001:0004B6B6 b_signalSendPid.c#82 ___libc_Back_signalSendPid + 46 0001:0004B670 (b_signalSendPid.obj)
03C62C68 1DF59932 LIBCN0 0001:00079932 kill.c#76 __std_kill + 22 0001:00079910 (kill.obj)
03C62C88 1DF6302A LIBCN0 0001:0008302A raise.c#56 __std_raise + 1A 0001:00083010 (raise.obj)
03C62CA8 1DF4F57C LIBCN0 0001:0006F57C abort.c#62 __std_abort + C4 0001:0006F4B8 (abort.obj)
03C62CE8 1DF3A09A LIBCN0 0001:0005A09A assert.c#18 __assert + 42 0001:0005A058 (assert.obj)
03C62D18 1DEEDF14 LIBCN0 0001:0000DF14 ialloc.c#116 __um_lump_alloc - 468 0001:0000E37C (libc\src\emx\src\lib\malloc\ialloc.c)
03C62D78 1DEEE5A4 LIBCN0 0001:0000E5A4 ialloc.c#296 __um_alloc_no_lock + 178 0001:0000E42C (libc\src\emx\src\lib\malloc\ialloc.c)
03C62DB8 1DEEDA45 LIBCN0 0001:0000DA45 umalloc.c#30 __umalloc + 81 0001:0000D9C4 (libc\src\emx\src\lib\malloc\umalloc.c)
03C62DE8 1DEF89F7 LIBCN0 0001:000189F7 malloc.c#25 __std_malloc + 27 0001:000189D0 (libc\src\emx\src\lib\malloc\malloc.c)
03C62E08 00472D8B MKSNAPSH 0001:00462D8B __ZN2v88internal18MarkingVisitorBaseINS0_18MainMarkingVisitorINS0_12MarkingStateEEES3_E15VisitMapPointerENS0_10HeapObjectE$w$WNex9GxJTe8qHWjz2 + 1FB 0001:00462B90 (mark-compact.obj)
03C62E48 004A369D MKSNAPSH 0001:0049369D __ZN2v88internal20MarkCompactCollector22ProcessMarkingWorklistEjNS1_29MarkingWorklistProcessingModeE + 207D 0001:00491620 (mark-compact.obj)
03C62EF8 004A4C3A MKSNAPSH 0001:00494C3A __ZN2v88internal20MarkCompactCollector17ProcessEphemeronsEv + 8A 0001:00494BB0 (mark-compact.obj)
03C62F48 004A4FBF MKSNAPSH 0001:00494FBF __ZN2v88internal20MarkCompactCollector34MarkTransitiveClosureUntilFixpointEv + 12F 0001:00494E90 (mark-compact.obj)
03C62FE8 004AA5C2 MKSNAPSH 0001:0049A5C2 __ZN2v88internal20MarkCompactCollector21MarkTransitiveClosureEv + 32 0001:0049A590 (mark-compact.obj)
03C63028 004AB0B9 MKSNAPSH 0001:0049B0B9 __ZN2v88internal20MarkCompactCollector15MarkLiveObjectsEv + 849 0001:0049A870 (mark-compact.obj)
03C63138 004B3C12 MKSNAPSH 0001:004A3C12 __ZN2v88internal20MarkCompactCollector14CollectGarbageEv + 12 0001:004A3C00 (mark-compact.obj)
03C63158 000D30AD MKSNAPSH 0001:000C30AD __ZN2v88internal4Heap11MarkCompactEv + 16D 0001:000C2F40 (heap.obj)
03C631B8 000D4878 MKSNAPSH 0001:000C4878 __ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE + A38 0001:000C3E40 (heap.obj)
03C632B8 000D532F MKSNAPSH 0001:000C532F __ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE + 27F 0001:000C50B0 (heap.obj)
03C633D8 000D7A0E MKSNAPSH 0001:000C7A0E __ZN2v88internal4Heap26CollectAllAvailableGarbageENS0_23GarbageCollectionReasonE + 7E 0001:000C7990 (heap.obj)
03C63498 0008BD72 MKSNAPSH 0001:0007BD72 __ZN2v815SnapshotCreator10CreateBlobENS0_20FunctionCodeHandlingE + 262 0001:0007BB10 (api.obj)
03C63558 000DDEF1 MKSNAPSH 0001:000CDEF1 __ZN2v88internal30CreateSnapshotDataBlobInternalENS_15SnapshotCreator20FunctionCodeHandlingEPKcPNS_7IsolateE + D1 0001:000CDE20 (snapshot.obj)
03C635B8 000152DC MKSNAPSH 0001:000052DC main + 59C 0001:00004D40 (ldconv_mksnapshot_o_112864a46af2136718.obj)
03C6FF38 00010027 MKSNAPSH 0001:00000027 crt0.s#90 __text + 27 0001:00000000 (D:\Temp\ccLyDNo2.s)
03C6FF64 1E17F621 LIBCX0 0001:0000F621 ___init_app + 11 0001:0000F610 (main.obj)
03C6FFE0 1DF13BFB LIBCN0 0001:00033BFB appinit.s#16 ___init_app + B 0001:00033BF0 (appinit.obj)
Probably not the cause of the issue, but platform-os2.cc needs:
case OS::MemoryPermission::kNoAccessWillJitLater:
added at https://github.com/bitwiseworks/node-os2/blob/main/deps/v8/src/base/platform/platform-os2.cc#L25 to bring it into line with other OSes, i.e. https://github.com/bitwiseworks/node-os2/blob/main/deps/v8/src/base/platform/platform-os2.cc#L25
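A rough sketch of what such a switch could look like follows. Everything in it is an assumption for illustration: the enum is a local stand-in for V8's OS::MemoryPermission, the kProt* constants are placeholder bit values and not the real OS/2 PAG_* flags, and treating kNoAccessWillJitLater the same as kNoAccess is just one plausible choice that the OS/2 port may need to handle differently.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for V8's OS::MemoryPermission (assumed set of values).
enum class MemoryPermission {
  kNoAccess,
  kNoAccessWillJitLater,
  kRead,
  kReadWrite,
  kReadWriteExecute,
  kReadExecute
};

// Placeholder protection bits; a real port would use the platform's flags.
constexpr std::uint32_t kProtNone = 0;
constexpr std::uint32_t kProtRead = 1;
constexpr std::uint32_t kProtWrite = 2;
constexpr std::uint32_t kProtExec = 4;

// Map a permission to protection bits, handling every enumerator.
constexpr std::uint32_t GetProtection(MemoryPermission access) {
  switch (access) {
    case MemoryPermission::kNoAccess:
    case MemoryPermission::kNoAccessWillJitLater:  // the case missing above
      return kProtNone;
    case MemoryPermission::kRead:
      return kProtRead;
    case MemoryPermission::kReadWrite:
      return kProtRead | kProtWrite;
    case MemoryPermission::kReadWriteExecute:
      return kProtRead | kProtWrite | kProtExec;
    case MemoryPermission::kReadExecute:
      return kProtRead | kProtExec;
  }
  return kProtNone;  // not reached for valid enumerators
}
```

Listing every enumerator without a default label lets the compiler warn (for example with -Wswitch) when a new permission value is added upstream and the port has not caught up.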
This issue is a spin-off of https://github.com/bitwiseworks/qtwebengine-chromium-os2/issues/62.
This task is actually mostly done (like 90%), but there are some crashes to fix.