Open shabiel opened 6 years ago
Not sure if it helps but maybe https://github.com/YottaDB/YottaDB/issues/275 is related.
Interesting. I want to compile the latest source code for YottaDB on my Linux machine and see if we have the same issue.
Shouldn't be a surprise, but I at least confirmed it's not an issue on Linux.
@nars1, any advice on debugging this? The multiple forks make it difficult. The way I debugged gtmshrsec on Cygwin was to put in sleeps and then run and attach to the process while it's sleeping.
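For anyone who wants to try the sleep-and-attach approach, here is a minimal sketch in C. The helper name wait_for_debugger and the debugger_attached flag are made up for illustration; they are not part of YottaDB or gtmshrsec. The idea is simply to print the pid and pause long enough to attach.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag: set it to 1 from the debugger to stop waiting. */
volatile int debugger_attached = 0;

/*
 * Hypothetical helper: announce our pid, then sleep in one-second steps
 * until the flag is set or max_seconds have elapsed.
 * Returns the number of seconds actually waited.
 */
int wait_for_debugger(unsigned max_seconds)
{
    unsigned waited = 0;

    fprintf(stderr, "pid %ld waiting for debugger\n", (long)getpid());
    while (!debugger_attached && waited < max_seconds)
    {
        sleep(1);
        waited++;
    }
    return (int)waited;
}
```

Call it at the top of the code path you suspect (for example right after a fork, in the child), attach to the printed pid, and then either let the timeout run out or set debugger_attached to 1 from the debugger to continue.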
Found the crash.
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff687cd4aa __kill + 10
1 libyottadb.dylib 0x0000000105b68714 gtm_dump_core + 1332 (gtm_dump_core.c:69)
2 libyottadb.dylib 0x0000000105b6d981 gtm_fork_n_core + 2241
3 libyottadb.dylib 0x0000000105ae9ebb ch_cond_core + 475
4 libyottadb.dylib 0x0000000105ea9d45 rts_error_va + 3333
5 libyottadb.dylib 0x0000000105eaa307 rts_error_csa + 359
6 libyottadb.dylib 0x0000000105e455d0 middle_child + 1168 (ojstartchild.c:187)
7 libyottadb.dylib 0x0000000105eaa0b9 rts_error_va + 4217 (rts_error.c:160)
8 libyottadb.dylib 0x0000000105eaa307 rts_error_csa + 359
9 libyottadb.dylib 0x0000000105e3fd88 ojstartchild + 19000 (ojstartchild.c:612)
10 libyottadb.dylib 0x0000000105e64c17 op_job + 4279 (op_job.c:190)
11 ??? 0x000000010b9575b0 0 + 4489311664
Crashes here:
SEND(setup_fds[0], &params, SIZEOF(params), 0, rc);
if (rc < 0)
SETUP_DATA_FAIL();
Previous SENDs are apparently successful.
Okay. After an hour of debugging, it turns out it's crashing at a different send each time, which suggests that the grandchild process is crashing right at startup and the sends that succeed only succeed by accident.
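To illustrate why a send can fail once the receiving process is gone, here is a small stand-alone sketch. It is not YottaDB code: the function name send_to_dead_peer is invented, and it assumes the setup channel behaves like an AF_UNIX stream socketpair, which I have not verified against ojstartchild.c. The child exits immediately to stand in for a crashed grandchild.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: send one byte to a peer that has already exited.
 * Returns 0 if the send succeeded, otherwise the errno it failed with.
 */
int send_to_dead_peer(void)
{
    int fds[2];
    char byte = 'x';
    pid_t pid;
    ssize_t rc;
    int result;

    /* Ignore SIGPIPE (process-wide) so a failed send returns an error
     * instead of killing this process. */
    signal(SIGPIPE, SIG_IGN);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return errno;

    pid = fork();
    if (pid < 0)
    {
        result = errno;
        close(fds[0]);
        close(fds[1]);
        return result;
    }
    if (pid == 0)
        _exit(1);           /* child dies at once, like a crashing grandchild */

    close(fds[1]);          /* parent keeps only its own end */
    waitpid(pid, NULL, 0);  /* after this, the peer end is closed everywhere */

    /* While the peer end is still open, a send is just queued in the socket
     * buffer and reports success even if nobody ever reads it. Here the
     * peer is already gone, so the send fails. */
    rc = send(fds[0], &byte, sizeof(byte), 0);
    result = (rc < 0) ? errno : 0;

    close(fds[0]);
    return result;
}
```

Without the waitpid, whether a given send fails depends on how far the child got before the parent reached that send, which would match seeing the failure at a different send on each run.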
I think I finally found the problem. I am single-stepping at the instruction level with si so that I can catch it at the right time.
(lldb) process attach -n mumps -w
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGTRAP
frame #0: 0x00007fff686dc666 libsystem_c.dylib`fork + 18
libsystem_c.dylib`fork:
-> 0x7fff686dc666 <+18>: retq
0x7fff686dc667 <+19>: testl %ebx, %ebx
0x7fff686dc669 <+21>: je 0x7fff686dc67d ; <+41>
0x7fff686dc66b <+23>: cmpl $-0x1, %ebx
Target 0: (mumps) stopped.
Executable module set to "/Users/sam/Documents/repos/YottaDB/build/./mumps".
Architecture set to: x86_64-apple-macosx.
(lldb) si
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x00007fff686dc666 libsystem_c.dylib`fork + 18
libsystem_c.dylib`fork:
-> 0x7fff686dc666 <+18>: retq
0x7fff686dc667 <+19>: testl %ebx, %ebx
0x7fff686dc669 <+21>: je 0x7fff686dc67d ; <+41>
0x7fff686dc66b <+23>: cmpl $-0x1, %ebx
Target 0: (mumps) stopped.
(lldb)
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
libyottadb.dylib`intrpt_ok_state:
-> 0x10e43d0a0 <+0>: sbbb %al, (%rax)
0x10e43d0a2 <+2>: addb %al, (%rax)
libyottadb.dylib`mumps_status:
0x10e43d0a4 <+0>: addl %eax, (%rax)
0x10e43d0a6 <+2>: addb %al, (%rax)
Target 0: (mumps) stopped.
(lldb) si
Process 61571 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10e43d0a0)
frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
libyottadb.dylib`intrpt_ok_state:
-> 0x10e43d0a0 <+0>: sbbb %al, (%rax)
0x10e43d0a2 <+2>: addb %al, (%rax)
libyottadb.dylib`mumps_status:
0x10e43d0a4 <+0>: addl %eax, (%rax)
0x10e43d0a6 <+2>: addb %al, (%rax)
Target 0: (mumps) stopped.
More output from the same stack. I am actually puzzled by this. None of it makes sense.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x10e43d0a0)
frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
frame #1: 0x000000010e43c1f0 libyottadb.dylib`xfer_name + 2336
* frame #2: 0x000000010db8779e libyottadb.dylib`ojstartchild(jparms=0x00007ffee2adfd00, argcnt=1, non_exit_return=0x00007ffee2adfdbc, pipe_fds=0x00007ffee2adfe68) at ojstartchild.c:389
frame #3: 0x000000010dbb0c17 libyottadb.dylib`op_job(argcnt=1) at op_job.c:190
frame #4: 0x000000010ed361a2
(lldb) f 0
frame #0: 0x000000010e43d0a0 libyottadb.dylib`intrpt_ok_state
libyottadb.dylib`intrpt_ok_state:
-> 0x10e43d0a0 <+0>: sbbb %al, (%rax)
0x10e43d0a2 <+2>: addb %al, (%rax)
libyottadb.dylib`mumps_status:
0x10e43d0a4 <+0>: addl %eax, (%rax)
0x10e43d0a6 <+2>: addb %al, (%rax)
(lldb) p prev_intrpt_state
error: use of undeclared identifier 'prev_intrpt_state'
(lldb) f 1
frame #1: 0x000000010e43c1f0 libyottadb.dylib`xfer_name + 2336
libyottadb.dylib`xfer_table:
0x10e43c1f0 <+0>: addb %dh, %al
0x10e43c1f2 <+2>: pushq %rdi
0x10e43c1f3 <+3>: orl $0x1, %eax
0x10e43c1f8 <+8>: xorb %dh, 0x12(%rbp)
(lldb) p prev_intrpt_state
error: use of undeclared identifier 'prev_intrpt_state'
(lldb) f 2
frame #2: 0x000000010db8779e libyottadb.dylib`ojstartchild(jparms=0x00007ffee2adfd00, argcnt=1, non_exit_return=0x00007ffee2adfdbc, pipe_fds=0x00007ffee2adfe68) at ojstartchild.c:389
386 rts_error_csa(CSA_ARG(NULL) VARLSTCNT(6) ERR_YDBDISTUNVERIF, 4, STRLEN(ydb_dist), ydb_dist,
387 gtmImageNames[image_type].imageNameLen, gtmImageNames[image_type].imageName);
388 FFLUSH(NULL);
-> 389 FORK_RETRY(child_pid);
390 if (child_pid == 0)
391 {
392 /* DEBUG */
(lldb) p prev_intrpt_state
(intrpt_state_t) $7 = INTRPT_OK_TO_INTERRUPT
One last thing, before I go to bed... I have had enough of this...
$rax is 0; $al is 0. So the error happens when dereferencing $rax.
@shabiel: Regarding using gdb to debug these multi-process scenarios, the following commands are very useful. Setting them to one of the two possible values listed in each bullet below gives you the flexibility to get gdb to follow the child or the parent after a fork/exec, as well as control whether the other one is suspended or detached (executes concurrently). Hope this helps.
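The gdb settings that fit this description are most likely follow-fork-mode and detach-on-fork. That is an assumption based on the wording above, not a quote from the comment. As a sketch for a stock gdb session:

```
# Which process gdb keeps debugging after a fork: parent (default) or child
set follow-fork-mode parent
set follow-fork-mode child

# What happens to the other process:
#   on  (default): it is detached and runs concurrently
#   off: it stays suspended under gdb's control
set detach-on-fork on
set detach-on-fork off
```

With detach-on-fork off, both processes remain visible through "info inferiors", and "inferior N" switches between them.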
For example, the M-Web-Server won't work.
It previously worked on the last port to Darwin, in V6.2-002A.
Confirmed on two different Macs.
I will debug as time permits.