bitwiseworks / node-os2

Port of NodeJS to OS/2

Build v19 on OS/2 #1

Open dmik opened 1 year ago

dmik commented 1 year ago

This issue is a spin-off of https://github.com/bitwiseworks/qtwebengine-chromium-os2/issues/62.

This task is actually mostly done (like 90%), but there are some crashes to fix.

dmik commented 1 year ago

The following crash happens during the build process in some tool called MKSNAPSHOT.EXE:

LD_LIBRARY_PATH=D:/Coding/node/main/out/Release/lib.host:D:/Coding/node/main/out/Release/lib.target:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; cd ../tools/v8_gypfiles; mkdir -p D:/Coding/node/main/out/Release/obj.target/v8_snapshot/geni; "D:/Coding/node/main/out/Release/mksnapshot" --turbo_instruction_scheduling "--target_os=os2" "--target_arch=ia32" --startup_src "D:/Coding/node/main/out/Release/obj.target/v8_snapshot/geni/snapshot.cc" --embedded_variant Default --embedded_src "D:/Coding/node/main/out/Release/obj.target/v8_snapshot/geni/embedded.S" --no-native-code-counters

Killed by SIGSEGV
pid=0x7610 ppid=0x760d tid=0x0001 slot=0x0080 pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE
MKSNAPSH 2:000f3510
cs:eip=0000:03c63510      ss:esp=0000:00000010      ebp=03c60053
 ds=0000      es=0000      fs=03c62e80      gs=0000     efl=74000022
eax=03c62f88 ebx=1ffc9d7c ecx=03c62f9c edx=03c62fc0 edi=20030000 esi=03c6fe1c
Creating 7610_01.TRP
Moved 7610_01.TRP to C:\var\log\app\64431a2c-7610_01-MKSNAPSHOT-exceptq.txt
make: *** [tools/v8_gypfiles/v8_snapshot.target.mk:17: daf6e403889991d83bdc4e0ad8be81d51fd3c622.intermediate] Segmentation fault
make: *** Deleting file 'daf6e403889991d83bdc4e0ad8be81d51fd3c622.intermediate'
make: Leaving directory 'D:/Coding/node/main/out'
psmedley commented 1 year ago

Interesting, mksnapshot from QtWebEngine 6.5.1 (with my current patchset) gives a SIGABRT and the attached trp 648ea753-11d4_01-MKSNAPSHOT-exceptq.txt

dmik commented 1 year ago

@psmedley I see, though this looks like something entirely different (some LIBC heap memory corruption). My crash is this one:

______________________________________________________________________

 Exception C0000005 - Access Violation
______________________________________________________________________

 Process:  D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE (05/11/2023 03:34:17 70,835,511)
 PID:      3F2 (1010)
 TID:      01 (1)
 Priority: 200

 Filename: D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE (05/11/2023 03:34:17 70,835,511)
 Address:  005B:000925C5 (0001:000825C5)
 Cause:    Attempted to write to 03B360A0
           (read-only  memory at 0002:000060A0 in MKSNAPSH)

______________________________________________________________________

 Failing Instruction
______________________________________________________________________

 000925B0  JNZ  0x925e0                 (75 2e)
 000925B2  MOV  DWORD [ESP], 0x3b3d7ac  (c70424 acd7b303)
 000925B9  CALL 0x16c9730               (e8 72716301)
 000925BE  MOV  DWORD [ESP], 0x3b3d7ac  (c70424 acd7b303)
 000925C5 >MOV  BYTE [0x3b360a0], 0x0   (c605 a060b303 00)
 000925CC  CALL 0x16c9740               (e8 6f716301)
 000925D1  ADD  ESP, 0x24               (83c4 24)
 000925D4  POP  EBX                     (5b)

______________________________________________________________________

 Registers
______________________________________________________________________

 EAX : 00000000   EBX  : 03C53440   ECX : 00000001   EDX  : 00000000
 ESI : 03C53510   EDI  : 00000000
 ESP : 03C53430   EBP  : 03C53458   EIP : 000925C5   EFLG : 00010202
 CS  : 005B       CSLIM: FFFFFFFF   SS  : 0053       SSLIM: FFFFFFFF

 EAX : not a valid address
 EBX : read/write memory on this thread's stack
 ECX : not a valid address
 EDX : not a valid address
 ESI : read/write memory on this thread's stack
 EDI : not a valid address

______________________________________________________________________

 Stack Info for Thread 01
______________________________________________________________________

   Size       Base        ESP         Max         Top
 00100000   03C60000 -> 03C53430 -> 03C51000 -> 03B60000

______________________________________________________________________

 Call Stack
______________________________________________________________________

   EBP     Address    Module     Obj:Offset    Nearest Public Symbol
 --------  ---------  --------  -------------  -----------------------
 Trap  ->  000925C5   MKSNAPSH  0001:000825C5   __ZN2v88internal30DisableEmbeddedBlobRefcountingEv + 25 0001:000825A0 (isolate.obj)

 03C53458  0001525A   MKSNAPSH  0001:0000525A   main + 51A 0001:00004D40 (ldconv_mksnapshot_o_6291645c38141ad570.obj)

 03C5FDD8  00010027   MKSNAPSH  0001:00000027  crt0.s#90 __text + 27 0001:00000000 (D:\Temp\ccLyDNo2.s)

 03C5FE04  1E17F621   LIBCX0    0001:0000F621   ___init_app + 11 0001:0000F610 (main.obj)

 03C5FFE0  1DF13B3B   LIBCN0    0001:00033B3B  appinit.s#16 ___init_app + B 0001:00033B30 (appinit.obj)
dmik commented 1 year ago

I made it possible to rebuild MKSNAPSHOT.EXE quickly, and with some more debugging I see that it crashes while trying to write to a global file-level variable (located in the data segment of this EXE) because this memory is somehow read-only. So far, I don't understand why this segment is read-only: it is marked as read/write in the EXE header. The only explanation that comes to mind is that some code dynamically makes the respective page read-only after the EXE is started, but it's unclear where and why that happens. It seems I will have to debug the DosSetMem calls in V8.

dmik commented 1 year ago

Actually, I was right. Some more debugging shows that DosSetMem is called exactly once on a single page (belonging to the DATA32 segment of the EXE!) via v8::base::OS::SetPermissions(OS::MemoryPermission::kRead). Why V8 does this is not clear yet, so more debugging is necessary. It's all slow though, because if I change platform-os2.cc, for some reason the whole v8 library gets rebuilt (it takes 4 hours here in -j1 mode, and -j4 sometimes overheats the CPU).

dmik commented 1 year ago

Ok, now I know what's going on. V8 has a FlagValues struct (accessed via v8_flags). They declare this struct with strict page-size alignment (4096 bytes) using alignas in flags.h:

struct alignas(kMinimumOSPageSize) FlagValues {
...

They do it so that they can safely set this exact variable's memory protection mode to read-only (which can only be done at page granularity) with base::OS::SetDataReadOnly(&v8_flags, sizeof(v8_flags)); (which calls v8::base::OS::SetPermissions(OS::MemoryPermission::kRead) under the hood).
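To illustrate what the alignas declaration is meant to guarantee, here is a minimal sketch. DemoFlags, kPageBytes and IsPageAligned are made-up names for this example and are not part of V8; the 4096-byte page size is taken from the comment above. On a toolchain that honours alignas, the global instance starts on a page boundary and its size is a whole number of pages, so a runtime check like this one could be used to detect the kind of misalignment discussed in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed page size (matches the 4096 bytes mentioned in the thread).
constexpr std::size_t kPageBytes = 4096;

// Stand-in for a page-aligned flags struct. This is NOT V8's real FlagValues.
struct alignas(kPageBytes) DemoFlags {
  bool frozen = false;
  int some_limit = 0;
};

// alignas raises the alignment requirement and, as a consequence, rounds
// sizeof up to a multiple of that alignment.
static_assert(alignof(DemoFlags) == kPageBytes, "page alignment requested");
static_assert(sizeof(DemoFlags) % kPageBytes == 0, "size is whole pages");

// Hypothetical helper: true if the pointer sits exactly on a page boundary.
inline bool IsPageAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kPageBytes == 0;
}

// A global instance, analogous in spirit to v8_flags.
static DemoFlags demo_flags;
```

Calling IsPageAligned(&demo_flags) early in main would presumably return false on a toolchain that silently drops the alignment request.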

However, as https://github.com/bitwiseworks/gcc-os2/issues/11 suggests, our GCC (AS) lacks support for alignas, basically because a.out and omf object file formats lack per-variable alignment support.

So, as a result, the v8_flags variable ends up misaligned (debugging shows it starts at offset 0x0060 within a page rather than on a page boundary). Consequently, when DosSetMem is called on that variable's address to set it to R/O, it actually changes two pages: the one where this variable starts and the next one. And since the next page happens to hold other variables which are not intended to be read-only, the process crashes when the code tries to assign values to them.
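The page arithmetic behind this can be sketched as follows. AffectedPages and PageSpan are hypothetical names for this illustration, the address 0x03B36060 is only an example with the 0x0060 page offset mentioned above, and the struct size of exactly one page is an assumption (the real sizeof(v8_flags) is not stated in this thread).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed page size (4096 bytes, as in the thread).
constexpr std::uintptr_t kPageSize = 4096;

struct PageSpan {
  std::uintptr_t first_page;  // address of the first affected page
  std::size_t page_count;     // number of pages whose protection changes
};

// Hypothetical helper: which pages does a page-granular protection call
// touch when asked to protect the byte range [addr, addr + size)?
inline PageSpan AffectedPages(std::uintptr_t addr, std::size_t size) {
  const std::uintptr_t first = addr & ~(kPageSize - 1);
  const std::uintptr_t last = (addr + size - 1) & ~(kPageSize - 1);
  return {first, static_cast<std::size_t>((last - first) / kPageSize) + 1};
}
```

With these assumptions, AffectedPages(0x03B36060, 4096) reports two pages starting at 0x03B36000, while the same size at the aligned address 0x03B36000 touches only one. In the misaligned case, everything else that lives on those two pages becomes read-only as well.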

Making alignas work appears to be a huge task. We need some other solution. As a last resort, I will simply make v8::base::OS::SetPermissions(OS::MemoryPermission::kRead) a no-op (I see that there are some Chromium platforms that do that as well, e.g. Fuchsia, though the reason is different there).

psmedley commented 1 year ago

FYI I already have v8::base::OS::SetPermissions(OS::MemoryPermission::kRead) as a no-op in Qt 6.5 - might explain why I get a different TRP

dmik commented 1 year ago

Yes, I would expect that. I came up with a solution to work around alignas limitations by padding the FlagValues struct with 4096 bytes at the beginning and at the end and then aligning the address of the struct to the next page boundary before passing it to base::OS::SetDataReadOnly. This works pretty well.
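A rough sketch of the address arithmetic behind this kind of workaround, not the actual patch: assume the object is laid out as one page of padding, then the payload, then another page of padding. The protectable range is then the start rounded up to a page boundary and the end rounded down, so it covers the whole payload but never leaves the padded object. ProtectableRange, AlignUp, AlignDown and Range are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed page size (4096 bytes, as in the thread).
constexpr std::uintptr_t kPage = 4096;

struct Range {
  std::uintptr_t begin;  // first protected byte (page aligned)
  std::uintptr_t end;    // one past the last protected byte (page aligned)
};

inline std::uintptr_t AlignUp(std::uintptr_t a) {
  return (a + kPage - 1) & ~(kPage - 1);
}

inline std::uintptr_t AlignDown(std::uintptr_t a) {
  return a & ~(kPage - 1);
}

// Hypothetical helper. Assumed object layout:
//   [kPage bytes padding][payload_size bytes payload][kPage bytes padding]
// Returns the largest page-aligned range lying wholly inside the object.
inline Range ProtectableRange(std::uintptr_t obj_addr,
                              std::size_t payload_size) {
  const std::uintptr_t obj_end = obj_addr + kPage + payload_size + kPage;
  return {AlignUp(obj_addr), AlignDown(obj_end)};
}
```

For an object at the example address 0x03B36060 with a 4096-byte payload, the payload occupies 0x03B37060 to 0x03B38060 and the range comes out as 0x03B37000 to 0x03B39000. Only padding bytes share the protected pages with the payload, at the cost of two extra pages of memory.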

However, MKSNAPSHOT still doesn't work. It crashes here with SIGTRAP due to an OOM condition, printing this to the console:

<--- Last few GCs --->

<--- JS stacktrace --->

#
# Fatal process OOM in Zone
#

Killed by SIGTRAP
pid=0x6f04 ppid=0x6f03 tid=0x0001 slot=0x009e pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE
cs:eip=03c50d04:00000000      ss:esp=0000:00000000      ebp=00000053
 ds=0000      es=0000      fs=0000      gs=0000     efl=00000000
eax=03c4e4d8 ebx=1ffc9d7c ecx=03c4e4ec edx=03c4e510 edi=00000000 esi=03c5ff7c
Creating 6F04_01.TRP
Moved 6f04_01.TRP to C:\var\log\app\649f5b9a-6f04_01-MKSNAPSHOT-exceptq.txt

and the following stack trace:

   EBP     Address    Module     Obj:Offset    Nearest Public Symbol
 --------  ---------  --------  -------------  -----------------------
 Trap  ->  00032DC0   MKSNAPSH  0001:00022DC0   __ZN2v84base2OS5AbortEv + 10 0001:00022DB0 (platform-posix.obj)

 03C4E4E8  0005C7A3   MKSNAPSH  0001:0004C7A3   __ZN2v85Utils16ReportOOMFailureEPNS_8internal7IsolateEPKcRKNS_10OOMDetailsE + 93 0001:0004C710 (api.obj)

 03C4E508  0005CA91   MKSNAPSH  0001:0004CA91   __ZN2v88internal2V823FatalProcessOutOfMemoryEPNS0_7IsolateEPKcRKNS_10OOMDetailsE + 251 0001:0004C840 (api.obj)

 03C52D68  016FB19C   MKSNAPSH  0001:016EB19C   __ZN2v88internal4Zone9NewExpandEj + 10C 0001:016EB090 (zone.obj)

 03C52D98  0177F2AF   MKSNAPSH  0001:0176F2AF   __ZN2v88internal8compiler7Linkage21GetStubCallDescriptorEPNS0_4ZoneERKNS0_23CallInterfaceDescriptorEiNS_4base5FlagsINS1_14CallDescriptor4FlagEiEENS9_INS1_8Operator8PropertyEhEENS0_12StubCallModeE + 34F 0001:0176EF60 (linkage.obj)

 03C52DF8  0070BB43   MKSNAPSH  0001:006FBB43   __ZN2v88internal8compiler18CodeAssemblerStateC1EPNS0_7IsolateEPNS0_4ZoneERKNS0_23CallInterfaceDescriptorENS0_8CodeKindEPKcNS0_7BuiltinE + 53 0001:006FBAF0 (code-assembler.obj)

 03C52E48  019FB43E   MKSNAPSH  0001:019EB43E   __ZN2v88internal13AssemblerBase21AbortedCodeGenerationEv$w$ReSlbtdjNe83mEIq0 + 5FE 0001:019EAE40 (setup-builtins-internal.obj)

 03C52F68  01A0272A   MKSNAPSH  0001:019F272A   __ZN2v88internal20SetupIsolateDelegate21SetupBuiltinsInternalEPNS0_7IsolateE + EA 0001:019F2640 (setup-builtins-internal.obj)

 03C52FA8  000A81C3   MKSNAPSH  0001:000981C3   __ZN2v88internal7Isolate4InitEPNS0_12SnapshotDataES3_S3_b + 1563 0001:00096C60 (isolate.obj)

 03C53388  000A83F1   MKSNAPSH  0001:000983F1   __ZN2v88internal7Isolate19InitWithoutSnapshotEv + 31 0001:000983C0 (isolate.obj)

 03C533C8  0005CC99   MKSNAPSH  0001:0004CC99   __ZN2v815SnapshotCreatorC1EPNS_7IsolateEPKiPNS_11StartupDataE + 89 0001:0004CC10 (api.obj)

 03C533F8  000DDFA6   MKSNAPSH  0001:000CDFA6   __ZN2v88internal30CreateSnapshotDataBlobInternalENS_15SnapshotCreator20FunctionCodeHandlingEPKcPNS_7IsolateE + 36 0001:000CDF70 (snapshot.obj)

 03C53458  000152DC   MKSNAPSH  0001:000052DC   main + 59C 0001:00004D40 (ldconv_mksnapshot_o_6ed4649f312c1d7938.obj)

 03C5FDD8  00010027   MKSNAPSH  0001:00000027  crt0.s#90 __text + 27 0001:00000000 (D:\Temp\ccLyDNo2.s)

 03C5FE04  1E17F621   LIBCX0    0001:0000F621   ___init_app + 11 0001:0000F610 (main.obj)

 03C5FFE0  1DF13BFB   LIBCN0    0001:00033BFB  appinit.s#16 ___init_app + B 0001:00033BF0 (appinit.obj)

In your case it's something completely different.

dmik commented 1 year ago

Disregard the last crash, it was caused by some code I had commented out while hunting the previous bug. Now, with the debugging changes removed, it seems that I get essentially the same crash as @psmedley:

Assertion failed: _UM_LUMP_STATUS (olump) == _UMS_FREE, file libc/src/emx/src/lib/malloc/ialloc.c, line 116

Killed by SIGABRT
pid=0x1149 ppid=0x1148 tid=0x0001 slot=0x009b pri=0x0200 mc=0x0001 ps=0x0010
D:\CODING\NODE\MAIN\OUT\RELEASE\MKSNAPSHOT.EXE
Creating 1149_01.TRP
Moved 1149_01.TRP to C:\var\log\app\64a49535-1149_01-MKSNAPSHOT-exceptq.txt
   EBP     Address    Module     Obj:Offset    Nearest Public Symbol
 --------  ---------  --------  -------------  -----------------------
 Trap  ->  1DFDC726   LIBCN0    0001:000FC726  b_panic.c#538 ___libc_Back_panicV + 1C2D 0001:000FAAF9 (b_panic.obj)
 03C62A58  1DFDCEF6   LIBCN0    0001:000FCEF6  b_panic.c#170 ___libc_Back_panic + 26 0001:000FCED0 (b_panic.obj)
 03C62A78  1DF27BE5   LIBCN0    0001:00047BE5  signals.c#2015 ___libc_back_ghevWait + 10745 0003:000374A0 (signals.obj)
 03C62A98  1DF2A2C3   LIBCN0    0001:0004A2C3  signals.c#1800 ___libc_back_signalAccept + 682 0001:00049C41 (signals.obj)
 03C62BA8  1DF2A5DA   LIBCN0    0001:0004A5DA  signals.c#954 ___libc_Back_signalRaise + 129 0001:0004A4B1 (signals.obj)
 03C62C48  1DF2B6B6   LIBCN0    0001:0004B6B6  b_signalSendPid.c#82 ___libc_Back_signalSendPid + 46 0001:0004B670 (b_signalSendPid.obj)
 03C62C68  1DF59932   LIBCN0    0001:00079932  kill.c#76 __std_kill + 22 0001:00079910 (kill.obj)
 03C62C88  1DF6302A   LIBCN0    0001:0008302A  raise.c#56 __std_raise + 1A 0001:00083010 (raise.obj)
 03C62CA8  1DF4F57C   LIBCN0    0001:0006F57C  abort.c#62 __std_abort + C4 0001:0006F4B8 (abort.obj)
 03C62CE8  1DF3A09A   LIBCN0    0001:0005A09A  assert.c#18 __assert + 42 0001:0005A058 (assert.obj)
 03C62D18  1DEEDF14   LIBCN0    0001:0000DF14  ialloc.c#116 __um_lump_alloc - 468 0001:0000E37C (libc\src\emx\src\lib\malloc\ialloc.c)
 03C62D78  1DEEE5A4   LIBCN0    0001:0000E5A4  ialloc.c#296 __um_alloc_no_lock + 178 0001:0000E42C (libc\src\emx\src\lib\malloc\ialloc.c)
 03C62DB8  1DEEDA45   LIBCN0    0001:0000DA45  umalloc.c#30 __umalloc + 81 0001:0000D9C4 (libc\src\emx\src\lib\malloc\umalloc.c)
 03C62DE8  1DEF89F7   LIBCN0    0001:000189F7  malloc.c#25 __std_malloc + 27 0001:000189D0 (libc\src\emx\src\lib\malloc\malloc.c)
 03C62E08  00472D8B   MKSNAPSH  0001:00462D8B   __ZN2v88internal18MarkingVisitorBaseINS0_18MainMarkingVisitorINS0_12MarkingStateEEES3_E15VisitMapPointerENS0_10HeapObjectE$w$WNex9GxJTe8qHWjz2 + 1FB 0001:00462B90 (mark-compact.obj)
 03C62E48  004A369D   MKSNAPSH  0001:0049369D   __ZN2v88internal20MarkCompactCollector22ProcessMarkingWorklistEjNS1_29MarkingWorklistProcessingModeE + 207D 0001:00491620 (mark-compact.obj)
 03C62EF8  004A4C3A   MKSNAPSH  0001:00494C3A   __ZN2v88internal20MarkCompactCollector17ProcessEphemeronsEv + 8A 0001:00494BB0 (mark-compact.obj)
 03C62F48  004A4FBF   MKSNAPSH  0001:00494FBF   __ZN2v88internal20MarkCompactCollector34MarkTransitiveClosureUntilFixpointEv + 12F 0001:00494E90 (mark-compact.obj)
 03C62FE8  004AA5C2   MKSNAPSH  0001:0049A5C2   __ZN2v88internal20MarkCompactCollector21MarkTransitiveClosureEv + 32 0001:0049A590 (mark-compact.obj)
 03C63028  004AB0B9   MKSNAPSH  0001:0049B0B9   __ZN2v88internal20MarkCompactCollector15MarkLiveObjectsEv + 849 0001:0049A870 (mark-compact.obj)
 03C63138  004B3C12   MKSNAPSH  0001:004A3C12   __ZN2v88internal20MarkCompactCollector14CollectGarbageEv + 12 0001:004A3C00 (mark-compact.obj)
 03C63158  000D30AD   MKSNAPSH  0001:000C30AD   __ZN2v88internal4Heap11MarkCompactEv + 16D 0001:000C2F40 (heap.obj)
 03C631B8  000D4878   MKSNAPSH  0001:000C4878   __ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE + A38 0001:000C3E40 (heap.obj)
 03C632B8  000D532F   MKSNAPSH  0001:000C532F   __ZN2v88internal4Heap14CollectGarbageENS0_15AllocationSpaceENS0_23GarbageCollectionReasonENS_15GCCallbackFlagsE + 27F 0001:000C50B0 (heap.obj)
 03C633D8  000D7A0E   MKSNAPSH  0001:000C7A0E   __ZN2v88internal4Heap26CollectAllAvailableGarbageENS0_23GarbageCollectionReasonE + 7E 0001:000C7990 (heap.obj)
 03C63498  0008BD72   MKSNAPSH  0001:0007BD72   __ZN2v815SnapshotCreator10CreateBlobENS0_20FunctionCodeHandlingE + 262 0001:0007BB10 (api.obj)
 03C63558  000DDEF1   MKSNAPSH  0001:000CDEF1   __ZN2v88internal30CreateSnapshotDataBlobInternalENS_15SnapshotCreator20FunctionCodeHandlingEPKcPNS_7IsolateE + D1 0001:000CDE20 (snapshot.obj)
 03C635B8  000152DC   MKSNAPSH  0001:000052DC   main + 59C 0001:00004D40 (ldconv_mksnapshot_o_112864a46af2136718.obj)
 03C6FF38  00010027   MKSNAPSH  0001:00000027  crt0.s#90 __text + 27 0001:00000000 (D:\Temp\ccLyDNo2.s)
 03C6FF64  1E17F621   LIBCX0    0001:0000F621   ___init_app + 11 0001:0000F610 (main.obj)
 03C6FFE0  1DF13BFB   LIBCN0    0001:00033BFB  appinit.s#16 ___init_app + B 0001:00033BF0 (appinit.obj)
psmedley commented 1 month ago

Probably not the cause of the issue, but platform-os2.cc needs case OS::MemoryPermission::kNoAccessWillJitLater: added at https://github.com/bitwiseworks/node-os2/blob/main/deps/v8/src/base/platform/platform-os2.cc#L25 to bring it into line with other OSes.
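For illustration only, the change being suggested might look roughly like the sketch below. The enum is a local stand-in modelled on V8's OS::MemoryPermission (only kRead and kNoAccessWillJitLater are named in this thread), the kPag* constants are assumed values modelled on the OS/2 PAG_READ, PAG_WRITE and PAG_EXECUTE flags, and ToProtectionBits is a hypothetical helper name. The actual code in platform-os2.cc may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for V8's OS::MemoryPermission (assumed enumerator set).
enum class MemoryPermission {
  kNoAccess,
  kRead,
  kReadWrite,
  kReadWriteExecute,
  kReadExecute,
  kNoAccessWillJitLater
};

// Assumed protection bits, modelled on the OS/2 PAG_* flags.
constexpr std::uint32_t kPagRead = 0x1;
constexpr std::uint32_t kPagWrite = 0x2;
constexpr std::uint32_t kPagExecute = 0x4;

// Hypothetical helper: map a permission to protection bits. The point is
// that kNoAccessWillJitLater is handled explicitly, here the same way as
// kNoAccess, instead of falling through to an unhandled case.
inline std::uint32_t ToProtectionBits(MemoryPermission access) {
  switch (access) {
    case MemoryPermission::kNoAccess:
    case MemoryPermission::kNoAccessWillJitLater:
      return 0;
    case MemoryPermission::kRead:
      return kPagRead;
    case MemoryPermission::kReadWrite:
      return kPagRead | kPagWrite;
    case MemoryPermission::kReadWriteExecute:
      return kPagRead | kPagWrite | kPagExecute;
    case MemoryPermission::kReadExecute:
      return kPagRead | kPagExecute;
  }
  return 0;  // not reached for valid enumerators
}
```

Treating kNoAccessWillJitLater like kNoAccess is an assumption of this sketch; how "no access" is best expressed through the OS/2 memory API is a separate question.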