actonlang / acton

The Acton Programming Language
https://www.acton-lang.org/
BSD 3-Clause "New" or "Revised" License

DB client segfault #573

Open plajjan opened 2 years ago

plajjan commented 2 years ago
(lldb)
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007ff809d3f20a libsystem_kernel.dylib`__ulock_wait + 10
    frame #1: 0x00007ff809d7bc8d libsystem_pthread.dylib`_pthread_join + 362
    frame #2: 0x0000000105fb61e4 ddb_test_server`main(argc=11, argv=0x00007ff7b9f868b0) at rts.c:1940:9
    frame #3: 0x000000011004e4fe dyld`start + 462
  thread #2, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d45e4a libsystem_kernel.dylib`__select + 10
    frame #1: 0x0000000106018cc1 ddb_test_server`comm_thread_loop(args=0x00007fd267f04290) at client_api.c:208:16
    frame #2: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #3: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #3, stop reason = signal SIGSTOP
    frame #0: 0x0000000105fa1ee4 ddb_test_server`$eventloop(arg=0x0000000000000000) at env.c:1393:43
    frame #1: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #2: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #4, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40506 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff809d7aa89 libsystem_pthread.dylib`_pthread_cond_wait + 1224
    frame #2: 0x0000000105fa66e6 ddb_test_server`main_loop(idx=0x0000000000000001) at rts.c:1335:13
    frame #3: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #5, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40506 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff809d7aa89 libsystem_pthread.dylib`_pthread_cond_wait + 1224
    frame #2: 0x0000000105fa66e6 ddb_test_server`main_loop(idx=0x0000000000000002) at rts.c:1335:13
    frame #3: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #6, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40876 libsystem_kernel.dylib`write + 10
    frame #1: 0x000000010601a1f3 ddb_test_server`send_packet(buf=0x00006000015b87b0, len=38, sockfd=10) at client_api.c:615:13
    frame #2: 0x000000010601a547 ddb_test_server`send_packet_wait_replies_async(out_buf=0x00006000015b87b0, out_len=38, nonce=52209622475688, mc=0x000070000ca75de0, db=0x00007fd267f04290) at client_api.c:704:9
    frame #3: 0x000000010601a5e3 ddb_test_server`send_packet_wait_replies_sync(out_buf=0x00006000015b87b0, out_len=38, nonce=52209622475688, mc=0x000070000ca75de0, db=0x00007fd267f04290) at client_api.c:722:12
    frame #4: 0x000000010601ed2e ddb_test_server`remote_new_txn(db=0x00007fd267f04290) at client_api.c:2051:13
    frame #5: 0x0000000105fa5d8c ddb_test_server`main_loop(idx=0x0000000000000003) at rts.c:1228:42
    frame #6: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #7: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #7, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40506 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff809d7aa89 libsystem_pthread.dylib`_pthread_cond_wait + 1224
    frame #2: 0x000000010601a39f ddb_test_server`wait_on_msg_callback(mc=0x00007fd268a041a0, db=0x00007fd267f04290) at client_api.c:678:8
    frame #3: 0x000000010601a60b ddb_test_server`send_packet_wait_replies_sync(out_buf=0x00006000035b4300, out_len=125, nonce=122578366645442, mc=0x000070000caf8d38, db=0x00007fd267f04290) at client_api.c:728:9
    frame #4: 0x000000010601d45e ddb_test_server`remote_enqueue_in_txn(column_values=0x00006000038b0680, no_cols=1, blob=0x0000000000000000, blob_size=0, table_key=0x0000000000000002, queue_id=0xffffffffffffffe0, txnid=0x00006000015b0450, db=0x00007fd267f04290) at client_api.c:1599:12
    frame #5: 0x0000000105fa39b2 ddb_test_server`FLUSH_outgoing(self=0x00006000035bc000, txnid=0x00006000015b0450) at rts.c:795:23
    frame #6: 0x0000000105fa5dc6 ddb_test_server`main_loop(idx=0x0000000000000004) at rts.c:1231:25
    frame #7: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #8: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
(lldb)

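Looks like threads 6 and 7 might be stepping on each other? Both are inside send_packet_wait_replies_sync() on the same db handle (db=0x00007fd267f04290): thread 6 is blocked in write() on sockfd 10, while thread 7 is waiting for replies in wait_on_msg_callback().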
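If the worker threads do share one connection (thread 6 writes to sockfd 10 through the same db handle that thread 7 is using), one thing worth ruling out is two threads writing packets to the socket concurrently. Below is a minimal sketch of serializing whole packets on a shared socket with a per-connection mutex. `conn_t`, `conn_init` and `send_packet_locked` are hypothetical names for illustration only; they are not the actual client_api.c API, and this addresses only interleaved writes, not how replies are matched back to requests.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical connection handle; the real db struct in client_api.c differs. */
typedef struct {
    int sockfd;
    pthread_mutex_t write_lock; /* held for a whole packet on sockfd */
} conn_t;

int conn_init(conn_t *c, int sockfd) {
    c->sockfd = sockfd;
    return pthread_mutex_init(&c->write_lock, NULL);
}

/* Write all len bytes, retrying on partial writes and EINTR.
 * The lock covers the entire packet, so bytes from two threads
 * can never interleave on the socket. Returns 0 on success, -1 on error. */
int send_packet_locked(conn_t *c, const void *buf, size_t len) {
    const char *p = buf;
    int rc = 0;
    pthread_mutex_lock(&c->write_lock);
    while (len > 0) {
        ssize_t n = write(c->sockfd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before anything was written; retry */
            rc = -1;
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&c->write_lock);
    return rc;
}
```

Each request path would call something like this instead of a bare write(), so a packet is always sent contiguously even when several main_loop threads issue requests at once.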

plajjan commented 2 years ago

Got another one

(lldb) bt all
warning: could not execute support code to read Objective-C class data in the process. This may reduce the quality of type information available.
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007ff809d3f20a libsystem_kernel.dylib`__ulock_wait + 10
    frame #1: 0x00007ff809d7bc8d libsystem_pthread.dylib`_pthread_join + 362
    frame #2: 0x000000010d1cf1e4 ddb_test_server`main(argc=11, argv=0x00007ff7b2d6d8b0) at rts.c:1940:9
    frame #3: 0x0000000117f354fe dyld`start + 462
  thread #2, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d45e4a libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010d231cc1 ddb_test_server`comm_thread_loop(args=0x00007f8f2df04290) at client_api.c:208:16
    frame #2: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #3: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #3, stop reason = signal SIGSTOP
    frame #0: 0x000000010d1baee4 ddb_test_server`$eventloop(arg=0x0000000000000000) at env.c:1393:43
    frame #1: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #2: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #4, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40506 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff809d7aa89 libsystem_pthread.dylib`_pthread_cond_wait + 1224
    frame #2: 0x000000010d23339f ddb_test_server`wait_on_msg_callback(mc=0x00007f8f2df04920, db=0x00007f8f2df04290) at client_api.c:678:8
    frame #3: 0x000000010d23360b ddb_test_server`send_packet_wait_replies_sync(out_buf=0x0000600002e02220, out_len=38, nonce=8242042247969, mc=0x0000700003d9ede0, db=0x00007f8f2df04290) at client_api.c:728:9
    frame #4: 0x000000010d237d2e ddb_test_server`remote_new_txn(db=0x00007f8f2df04290) at client_api.c:2051:13
    frame #5: 0x000000010d1bed8c ddb_test_server`main_loop(idx=0x0000000000000001) at rts.c:1228:42
    frame #6: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #7: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #5, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40506 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff809d7aa89 libsystem_pthread.dylib`_pthread_cond_wait + 1224
    frame #2: 0x000000010d23339f ddb_test_server`wait_on_msg_callback(mc=0x00007f8f2ea042f0, db=0x00007f8f2df04290) at client_api.c:678:8
    frame #3: 0x000000010d23360b ddb_test_server`send_packet_wait_replies_sync(out_buf=0x0000600000300a50, out_len=65, nonce=69028714395173, mc=0x0000700003e21d98, db=0x00007f8f2df04290) at client_api.c:728:9
    frame #4: 0x000000010d236869 ddb_test_server`remote_read_queue_in_txn(consumer_id=0xfffffffffffffff2, shard_id=0x0000000000000000, app_id=0x0000000000000000, table_key=0x0000000000000002, queue_id=0xfffffffffffffff2, max_entries=1, entries_read=0x0000700003e21efc, new_read_head=0x0000700003e21ef0, start_row=0x0000700003e21f08, end_row=0x0000700003e21f00, txnid=0x0000000000000000, db=0x00007f8f2df04290) at client_api.c:1663:12
    frame #5: 0x000000010d1bee7b ddb_test_server`main_loop(idx=0x0000000000000002) at rts.c:1238:36
    frame #6: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #7: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #6, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40506 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff809d7aa89 libsystem_pthread.dylib`_pthread_cond_wait + 1224
    frame #2: 0x000000010d1bf6e6 ddb_test_server`main_loop(idx=0x0000000000000003) at rts.c:1335:13
    frame #3: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
  thread #7, stop reason = signal SIGSTOP
    frame #0: 0x00007ff809d40506 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007ff809d7aa89 libsystem_pthread.dylib`_pthread_cond_wait + 1224
    frame #2: 0x000000010d1bf6e6 ddb_test_server`main_loop(idx=0x0000000000000004) at rts.c:1335:13
    frame #3: 0x00007ff809d7a514 libsystem_pthread.dylib`_pthread_start + 125
    frame #4: 0x00007ff809d7602f libsystem_pthread.dylib`thread_start + 15
(lldb)
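In this second dump, threads 4 and 5 are both parked in pthread_cond_wait() inside wait_on_msg_callback() (client_api.c:678), each waiting for the reply to its own nonce, while thread 2 (comm_thread_loop) sits in select(). For comparison with the real code, here is a sketch of the standard condition-variable pattern for a per-request wait: the completer sets a predicate under the lock before signalling, and the waiter re-checks that predicate in a loop. `msg_wait_t` and its functions are hypothetical names; nothing here claims this is how client_api.c is written or that it contains this bug.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-request wait slot keyed by nonce. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int64_t nonce;
    int done;    /* predicate: set by the comm thread when the reply arrives */
    void *reply;
} msg_wait_t;

void msg_wait_init(msg_wait_t *mw, int64_t nonce) {
    pthread_mutex_init(&mw->lock, NULL);
    pthread_cond_init(&mw->cond, NULL);
    mw->nonce = nonce;
    mw->done = 0;
    mw->reply = NULL;
}

/* Called by the comm thread. Setting the predicate under the lock before
 * signalling means a waiter cannot miss the wakeup. */
void msg_wait_complete(msg_wait_t *mw, void *reply) {
    pthread_mutex_lock(&mw->lock);
    mw->reply = reply;
    mw->done = 1;
    pthread_cond_signal(&mw->cond);
    pthread_mutex_unlock(&mw->lock);
}

/* Called by the worker. The while loop handles spurious wakeups and the
 * case where the reply arrived before the worker started waiting. */
void *msg_wait_block(msg_wait_t *mw) {
    pthread_mutex_lock(&mw->lock);
    while (!mw->done)
        pthread_cond_wait(&mw->cond, &mw->lock);
    void *r = mw->reply;
    pthread_mutex_unlock(&mw->lock);
    return r;
}
```

The slot also has to stay alive until both sides are done with it; freeing or reusing it while the other thread still holds a pointer is another classic way for this shape of code to segfault.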
plajjan commented 2 years ago

It's really hard to debug based on these printouts alone. Sharing the core dumps from my laptop is also tedious, since they're 3.5GB in size. I think the best way forward is to send the Mac laptop to @aagapi so he can do the testing locally. Our goal is to run maaaany test runs with no segmentation faults at all.

I guess we can keep this issue open in the meantime as a placeholder for "run many tests and ensure there are no failures", and act on whatever information we dig out of that.
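For the "run many tests" part, a small driver that runs the test binary repeatedly and tallies how each run ended makes it easy to tell a clean streak from a rare crash. This is a minimal sketch using POSIX fork/execvp/waitpid; `run_many` and `struct run_stats` are hypothetical names, and the command line to pass is whatever the test normally uses (left unspecified here).

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct run_stats {
    int runs;      /* runs that completed and were reaped */
    int failures;  /* non-zero exit or killed by any signal */
    int segfaults; /* subset of failures killed by SIGSEGV */
};

/* Run argv[0] with argv `runs` times, one after another, and tally outcomes. */
struct run_stats run_many(char *const argv[], int runs) {
    struct run_stats s = {0, 0, 0};
    for (int i = 0; i < runs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* only reached if exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            break;
        }
        s.runs++;
        if (WIFSIGNALED(status)) {
            s.failures++;
            if (WTERMSIG(status) == SIGSEGV) {
                s.segfaults++;
                fprintf(stderr, "run %d: SIGSEGV (pid %d)\n", i, (int)pid);
            }
        } else if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
            s.failures++;
        }
    }
    return s;
}
```

Calling it with an argv such as `{"./ddb_test_server", ..., NULL}` and a large run count gives a single pass/fail summary, and the logged run index and pid help match a crash to its core dump.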