nickva closed this issue 5 months ago
Just a wild guess, but since we see this in OTP 25 but not in OTP 24, I wonder if it might be related to https://github.com/erlang/otp/pull/5195 (Further optimize off-heap traversal during minor GC)?
Just a bump to highlight that Nick updated the first comment with a trace showing a corrupted linked list: an entry whose next pointer points at an earlier item in the same list, thus forming an infinite loop.
Not sure if it's helpful or not, but another core dump has a similar cycle, just a bit smaller, with only two items:
(gdb) bt
#0 erts_deref_node_entry (term=139983200562610, np=0x7f511fb78c00) at beam/sys.h:1036
#1 erts_cleanup_offheap_list (first=<optimized out>) at beam/erl_message.c:187
#2 erts_cleanup_offheap (offheap=offheap@entry=0x7f508fc7d9f0) at beam/erl_message.c:196
#3 0x000055f9623c332d in delete_process (p=p@entry=0x7f508fc7d7d8) at beam/erl_process.c:13193
#4 0x000055f9623ddf25 in erts_continue_exit_process (p=0x7f508fc7d7d8) at beam/erl_process.c:14500
(gdb) print *offheap
$77 = {first = 0x7f5060f1ae88, overhead = 0}
(gdb) print *offheap->first
$78 = {thing_word = 312, size = 94529605763992, next = 0x7f5060f1ae68}
(gdb) print *offheap->first->next
$79 = {thing_word = 240, size = 94529605763992, next = 0x7f5060f1ae40}
(gdb) print *offheap->first->next->next
$80 = {thing_word = 312, size = 94529605469800, next = 0x7f5060f1ae20}
(gdb) print *offheap->first->next->next->next
$81 = {thing_word = 240, size = 94529605469800, next = 0x7f5060f1adf8}
(gdb) print *offheap->first->next->next->next->next
$82 = {thing_word = 312, size = 94529606018432, next = 0x7f5060f1add8}
(gdb) print *offheap->first->next->next->next->next->next
$83 = {thing_word = 240, size = 94529606018432, next = 0x7f5060f1adb0}
(gdb) print *offheap->first->next->next->next->next->next->next
$84 = {thing_word = 312, size = 139986401201152, next = 0x7f5060f1ad90}
(gdb) print *offheap->first->next->next->next->next->next->next->next
$85 = {thing_word = 240, size = 139986401201152, next = 0x7f5060f1adb0}
(gdb) print *offheap->first->next->next->next->next->next->next->next->next
$86 = {thing_word = 312, size = 139986401201152, next = 0x7f5060f1ad90}
(gdb) print *offheap->first->next->next->next->next->next->next->next->next->next
$87 = {thing_word = 240, size = 139986401201152, next = 0x7f5060f1adb0}
(gdb) print *offheap->first->next->next->next->next->next->next->next->next->next->next
$88 = {thing_word = 312, size = 139986401201152, next = 0x7f5060f1ad90}
Dumping the offheap terms in the list, I found these:
(gdb) etp-offheapdump offheap.first
% Offheap dump:
[#Ref<19779:redacted@db9.redactedhost.com/1706073945.731125376.3375104002.100048>,
<19779:redacted@db9.redactedhost.com/1706073945.14014.1778>,
#Ref<19781:redacted@db8.redactedhost.com/1704807922.2345371943.3878420481.204846>,
<19781:redacted@db8.redactedhost.com/1704807922.2701.5018>,
#Ref<19782:redacted@db11.redactedhost.com/1706072932.934599531.1688731651.75609>,
<19782:redacted@db11.redactedhost.com/1706072932.8336.1336>,
#Ref<19780:redacted@db7.redactedhost.com/1705590953.2343983787.450101250.188743>,
<19780:redacted@db7.redactedhost.com/1705590953.11607.7439>,
#Ref<19780:redacted@db7.redactedhost.com/1705590953.2343983787.450101250.188743>,
<19780:redacted@db7.redactedhost.com/1705590953.11607.7439>,
#Ref<19780:redacted@db7.redactedhost.com/1705590953.2343983787.450101250.188743>,
<19780:redacted@db7.redactedhost.com/1705590953.11607.7439>,
#Ref<19780:redacted@db7.redactedhost.com/1705590953.2343983787.450101250.188743>,
<19780:redacted@db7.redactedhost.com/1705590953.11607.7439>,
#Ref<19780:redacted@db7.redactedhost.com/1705590953.2343983787.450101250.188743>,
<19780:redacted@db7.redactedhost.com/1705590953.11607.7439>,
#Ref<19780:redacted@db7.redactedhost.com/1705590953.2343983787.450101250.188743>,
<19780:redacted@db7.redactedhost.com/1705590953.11607.7439>,
#Ref<19780:redacted@db7.redactedhost.com/1705590953.2343983787.450101250.188743>,
<19780:redacted@db7.redactedhost.com/1705590953.11607.7439>].
(gdb) etpf-offheapdump offheap.first
warning: Expression is not an assignment (and might have no effect)
% Offheap dump:
[<etpf-boxed 0x7f5060f1ae88>,
<etpf-boxed 0x7f5060f1ae68>,
<etpf-boxed 0x7f5060f1ae40>,
<etpf-boxed 0x7f5060f1ae20>,
<etpf-boxed 0x7f5060f1adf8>,
<etpf-boxed 0x7f5060f1add8>,
<etpf-boxed 0x7f5060f1adb0>,
<etpf-boxed 0x7f5060f1ad90>,
<etpf-boxed 0x7f5060f1adb0>,
<etpf-boxed 0x7f5060f1ad90>,
<etpf-boxed 0x7f5060f1adb0>,
<etpf-boxed 0x7f5060f1ad90>,
<etpf-boxed 0x7f5060f1adb0>,
<etpf-boxed 0x7f5060f1ad90>,
<etpf-boxed 0x7f5060f1adb0>,
<etpf-boxed 0x7f5060f1ad90>,
<etpf-boxed 0x7f5060f1adb0>,
<etpf-boxed 0x7f5060f1ad90>,
<etpf-boxed 0x7f5060f1adb0>,
<etpf-boxed 0x7f5060f1ad90>].
Another interesting thing I noticed is that all the processes which attempt to clean the off-heap list all run one particular piece of code in Apache CouchDB: https://github.com/apache/couchdb/blob/main/src/fabric/src/fabric_db_update_listener.erl#L39.
(gdb) etp-stacktrace *(Process*)(0x7fce173f1f98)
%%% WARNING: The process is currently running, so c_p->stop will not be correct
%%% Consider using -emu or -jit variant instead
% Stacktrace (24)
I: #Cp<0x7fd233270c88>.
6: #Cp<rexi_utils:process_mailbox/6+0xa0> @ <etpf-cons 0x7fd23b1f07f0>:55.
12: #Cp<fabric_db_update_listener:receive_results/3+0xf4> @ <etpf-cons 0x7fd23b1df748>:146.
16: #Cp<fabric_db_update_listener:go/5+0x208> @ <etpf-cons 0x7fd23b1df748>:56.
23: #Cp<terminate process normally>.
All the other process backtraces also point to this process:
(gdb) etp-stacktrace *(Process*)(0x7fa724703448)
%%% WARNING: The process is currently running, so c_p->stop will not be correct
%%% Consider using -emu or -jit variant instead
% Stacktrace (24)
I: #Cp<0x7fab1abb0c88>.
6: #Cp<rexi_utils:process_mailbox/6+0xa0> @ "src/rexi_utils.erl":55.
12: #Cp<fabric_db_update_listener:receive_results/3+0xf4> @ "src/fabric_db_update_listener.erl":146.
16: #Cp<fabric_db_update_listener:go/5+0x208> @ "src/fabric_db_update_listener.erl":56.
23: #Cp<terminate process normally>.
(gdb) etp-stacktrace *(Process*)0x7f14e1612650
%%% WARNING: The process is currently running, so c_p->stop will not be correct
%%% Consider using -emu or -jit variant instead
% Stacktrace (24)
I: #Cp<0x7f18c4fc8c88>.
6: #Cp<rexi_utils:process_mailbox/6+0xa0> @ "src/rexi_utils.erl":55.
12: #Cp<fabric_db_update_listener:receive_results/3+0xf4> @ "src/fabric_db_update_listener.erl":146.
16: #Cp<fabric_db_update_listener:go/5+0x208> @ "src/fabric_db_update_listener.erl":56.
23: #Cp<terminate process normally>.
(gdb) etp-stacktrace *(Process*)0x7fa724703448
%%% WARNING: The process is currently running, so c_p->stop will not be correct
%%% Consider using -emu or -jit variant instead
% Stacktrace (24)
I: #Cp<0x7fab1abb0c88>.
6: #Cp<rexi_utils:process_mailbox/6+0xa0> @ "src/rexi_utils.erl":55.
12: #Cp<fabric_db_update_listener:receive_results/3+0xf4> @ "src/fabric_db_update_listener.erl":146.
16: #Cp<fabric_db_update_listener:go/5+0x208> @ "src/fabric_db_update_listener.erl":56.
23: #Cp<terminate process normally>.
(gdb) etp-stacktrace *(Process*)0x7f508fc7d7d8
%%% WARNING: The process is currently running, so c_p->stop will not be correct
%%% Consider using -emu or -jit variant instead
% Stacktrace (24)
I: #Cp<0x7f55026bcc88>.
6: #Cp<rexi_utils:process_mailbox/6+0xa0> @ "src/rexi_utils.erl":55.
12: #Cp<fabric_db_update_listener:receive_results/3+0xf4> @ "src/fabric_db_update_listener.erl":146.
16: #Cp<fabric_db_update_listener:go/5+0x208> @ "src/fabric_db_update_listener.erl":56.
23: #Cp<terminate process normally>.
(gdb) etp-stacktrace *(Process*)0x55af8cc75c78
%%% WARNING: The process is currently running, so c_p->stop will not be correct
%%% Consider using -emu or -jit variant instead
% Stacktrace (3)
I: #Cp<0x7ff30907cc88>.
0: #Cp<fabric_db_update_listener:go/5+0x23c> @ "src/fabric_db_update_listener.erl":58.
2: #Cp
On the surface there isn't anything special about this process. It may be very short-lived, or it may run for a long time. It does send messages across the dist channel, and there is possibly a race condition where it is sent a message to exit while the main request process it's linked to also crashes.
So far I haven't been able to reproduce it on my laptop (Intel Mac).
Some debugging info about the schedulers:
(gdb) etp-schedulers
--- Scheduler 1 ---
IX: 0
CPU Binding: unbound
Aux work Flags: delayed-aw-wakeup delayed-dealloc delayed-dealloc-thr-prgr later-op canceled-timers misc
Sleep Info Flags:
Pointer: (ErtsSchedulerData*)0x7f5505235540
- Run Queue -
Length: total=0, max=0, high=0, normal=0, low=0, port=0
Misc Jobs: no
Misc Flags: non-empty exec
Pointer: (ErtsRunQueue*)0x7f5505211740
--- Scheduler 2 ---
IX: 1
CPU Binding: unbound
Aux work Flags: delayed-dealloc-thr-prgr later-op canceled-timers-thr-prgr misc-thr-prgr
Sleep Info Flags:
Pointer: (ErtsSchedulerData*)0x7f550523f840
- Run Queue -
Length: total=3, max=0, high=0, normal=3, low=0, port=0
Misc Jobs: no
Queue Mask: normal
Misc Flags: out-of-work halftime-out-of-work non-empty exec
Pointer: (ErtsRunQueue*)0x7f5505211900
--- Scheduler 3 ---
IX: 2
CPU Binding: unbound
Aux work Flags: delayed-dealloc-thr-prgr later-op canceled-timers-thr-prgr misc-thr-prgr
Sleep Info Flags: sleeping tse waiting
Pointer: (ErtsSchedulerData*)0x7f5505249b40
- Run Queue -
Length: total=0, max=0, high=0, normal=0, low=0, port=0
Misc Jobs: no
Misc Flags: out-of-work halftime-out-of-work
Pointer: (ErtsRunQueue*)0x7f5505211ac0
First of all, could you show what loaded NIFs or linked-in drivers you have? For example, in your core dump do:
(gdb) source erts/etc/unix/etp-commands
:
(gdb) p *((struct tainted_module_t*)first_taint.counter)
$5 = {next = 0x7fcfef2a52e8, module_atom = 433035}
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next
$6 = {next = 0x0, module_atom = 612427}
:
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next
Cannot access memory at address 0x0
and then print the module_atom's with
(gdb) etp 433035
crypto
(gdb) etp 612427
asn1rt_nif
:
Thanks for taking a look, @sverker.
Here are the NIFs/drivers we have:
(gdb) etp 225227
exxhash.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next
$2 = {next = 0x7fab2175c8d8, module_atom = 254923}
(gdb) etp 254923
ucol.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next
$3 = {next = 0x7fa766986078, module_atom = 230475}
(gdb) etp 230475
couch_ejson_compare.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next
$4 = {next = 0x7fab1f6a3240, module_atom = 203147}
(gdb) etp 203147
asn1rt_nif.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next->next
$5 = {next = 0x7fab1f5101c0, module_atom = 247947}
(gdb) etp 247947
jiffy.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next->next->next
$6 = {next = 0x7fab21b2f8b0, module_atom = 221643}
(gdb) etp 221643
b64url.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next->next->next->next
$7 = {next = 0x7fa76692dad0, module_atom = 254859}
(gdb) etp 254859
snappy.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next->next->next->next->next
$8 = {next = 0x7fa76691a168, module_atom = 226315}
(gdb) etp 226315
hqueue.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next->next->next->next->next->next
$9 = {next = 0x7fa7677edaf0, module_atom = 224075}
(gdb) etp 224075
khash.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next->next->next->next->next->next->next
$10 = {next = 0x0, module_atom = 206731}
(gdb) etp 206731
crypto.
(gdb) p *((struct tainted_module_t*)first_taint.counter)->next->next->next->next->next->next->next->next->next->next
Cannot access memory at address 0x0
I am also now able to reproduce the issue on a smaller, isolated test cluster. I compiled a debug emulator and ran it with the load that usually reproduces the issue. The rapid memory leak started; however, it didn't blow up to 60GB and I haven't gotten to the off_heap cycle there. Instead, memory (RSS) rose to about 40GB and then the emulator stopped accepting connections. I managed to remsh in and inspect the running processes. Most were stuck in erlang:bif_return_trap/2 and some in erts_internal:dsend_continue_trap/1.
s:show_current_call_counts().
[{39085,{erlang,bif_return_trap,2}},
{30748,{erts_internal,dsend_continue_trap,1}},
{736,{gen_server,loop,7}},
...
I took a core dump and the thread stacks look like this (skipping the waiting or yielding ones, and the initial ?? stack start entries):
(gdb) thread apply all bt
Thread 54 (Thread 0x7fee29c75700 (LWP 3888440)):
#0 0x000055b5fb9a0ecf in erts_mbuf_size (p=0x7febaf163660) at beam/erl_message.c:1663
#1 0x000055b5fb8526c4 in debug_call_light_bif (c_p=0x7febaf163660, reg=0x7fee29c70d00, I=0x7febacd6f914, vbf=0x55b5fb978b54 <setelement_3>) at beam/jit/x86/instr_bif.cpp:361
#2 0x00007fee309e745a in ?? ()
#3 0x0000000000000000 in ?? ()
Thread 50 (Thread 0x7fee29e81700 (LWP 3888436)):
#0 0x00007fee783334e7 in sched_yield () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055b5fb94760e in erts_thr_yield () at beam/erl_threads.h:2540
#2 0x000055b5fb7f91e2 in sched_spin_wait (ssi=0x7fee33144080, spincount=10000) at beam/erl_process.c:3063
#3 0x000055b5fb7fa3b8 in scheduler_wait (fcalls=0x7fee29e7cb60, esdp=0x7fee33326500, rq=0x7fee3313b980) at beam/erl_process.c:3559
#4 0x000055b5fb80ab79 in erts_schedule (esdp=0x7fee33326500, p=0x7fe64844a908, calls=11) at beam/erl_process.c:9795
#5 0x000055b5fb83613b in erts_debug_schedule (esdp=0x0, c_p=0x7fe64844a908, calls=11) at beam/jit/x86/process_main.cpp:43
Thread 47 (Thread 0x7fee2a00a700 (LWP 3888433)):
#0 erts_lc_require_lock_flg (lck=0x7fee2a005c10, options=0, file=0x55b5fbc1ab20 "beam/jit/x86/instr_bif.cpp", line=371) at beam/erl_lock_check.c:1110
#1 0x000055b5fba601af in erts_lc_require_lock (lck=0x7fee2a005c10, file=0x55b5fbc1ab20 "beam/jit/x86/instr_bif.cpp", line=371) at beam/erl_lock_check.c:1316
#2 0x000055b5fb9bf13c in erts_proc_lc_require_lock (p=0x7fe7c0e571b8, locks=1, file=0x55b5fbc1ab20 "beam/jit/x86/instr_bif.cpp", line=371) at beam/erl_process_lock.c:1320
#3 0x000055b5fb85277d in debug_call_light_bif (c_p=0x7fe7c0e571b8, reg=0x7fee2a005d00, I=0x7febacf0e22c, vbf=<optimized out>) at beam/jit/x86/instr_bif.cpp:371
Thread 45 (Thread 0x7fee2a110700 (LWP 3888431)):
#0 ethr_native_atomic32_read (var=0x55b5fbd651c8 <no_empty_run_queues>) at ../include/internal/x86_64/../i386/atomic.h:178
#1 0x000055b5fbbc55f3 in ethr_atomic32_read_acqb__ (var=0x55b5fbd651c8 <no_empty_run_queues>) at ../include/internal/ethr_atomics.h:7790
#2 0x000055b5fbbcb296 in ethr_atomic32_read_acqb (var=0x55b5fbd651c8 <no_empty_run_queues>) at common/ethr_atomics.c:3319
#3 0x000055b5fb7fcb8b in try_steal_task (rq=0x7fee3313af80) at beam/erl_process.c:4591
#4 0x000055b5fb80aadf in erts_schedule (esdp=0x7fee332f34c0, p=0x7fe8f3bd19b8, calls=11) at beam/erl_process.c:9767
#5 0x000055b5fb83613b in erts_debug_schedule (esdp=0x0, c_p=0x7fe8f3bd19b8, calls=11) at beam/jit/x86/process_main.cpp:43
Thread 44 (Thread 0x7fee2a193700 (LWP 3888430)):
#0 0x000055b5fb9a0ef5 in erts_mbuf_size (p=0x7feba40a0338) at beam/erl_message.c:1664
#1 0x000055b5fb8526c4 in debug_call_light_bif (c_p=0x7feba40a0338, reg=0x7fee2a18ed00, I=0x7fee30a90cd8, vbf=0x55b5fb929068 <erts_internal_counters_add_3>) at beam/jit/x86/instr_bif.cpp:361
Thread 43 (Thread 0x7fee2a216700 (LWP 3888429)):
#0 0x00007fee78768f09 in __pthread_mutex_unlock_usercnt (mutex=0x7fee33139588, decr=<optimized out>) at pthread_mutex_unlock.c:58
#1 0x000055b5fbbcc7d1 in ethr_mutex_unlock__ (mtx=0x7fee33139588) at ../include/internal/ethr_mutex.h:663
#2 0x000055b5fbbce0d7 in ethr_mutex_unlock (mtx=0x7fee33139588) at common/ethr_mutex.c:1794
#3 0x000055b5fb94656d in erts_mtx_unlock (mtx=0x7fee33139588) at beam/erl_threads.h:1772
#4 0x000055b5fb94cb5c in erts_runq_unlock (rq=0x7fee33139580) at beam/erl_process.h:2700
#5 0x000055b5fb7fc7a2 in try_steal_task_from_victim (rq=0x7fee3313ab80, rq_lockedp=0x7fee2a211b08, vrq=0x7fee33139580, flags=170917892) at beam/erl_process.c:4496
#6 0x000055b5fb7fc9c0 in check_possible_steal_victim (rq=0x7fee3313ab80, rq_lockedp=0x7fee2a211b08, vix=27) at beam/erl_process.c:4543
#7 0x000055b5fb7fcb76 in try_steal_task (rq=0x7fee3313ab80) at beam/erl_process.c:4598
#8 0x000055b5fb80aadf in erts_schedule (esdp=0x7fee332dee40, p=0x7fe95eac5430, calls=11) at beam/erl_process.c:9767
#9 0x000055b5fb83613b in erts_debug_schedule (esdp=0x0, c_p=0x7fe95eac5430, calls=11) at beam/jit/x86/process_main.cpp:43
Thread 40 (Thread 0x7fee2a39f700 (LWP 3888426)):
#0 erts_proc_sig_send_demonitor (sender=0x7fe628ddd980, from=140633274307680, system=21941, mon=0x7fee2a39a950) at beam/erl_proc_sig_queue.c:2629
#1 0x000055b5fb96fd58 in demonitor (c_p=0x7fe628ddd938, ref=140603480013186, multip=0x7fee2a39abf8) at beam/bif.c:411
#2 0x000055b5fb97022c in demonitor_2 (A__p=0x7fe628ddd938, BIF__ARGS=0x7fee2a39ad00, A__I=0x7fee30ab47bc) at beam/bif.c:538
#3 0x000055b5fb8526f7 in debug_call_light_bif (c_p=0x7fe628ddd938, reg=0x7fee2a39ad00, I=0x7fee30ab47bc, vbf=0x55b5fb970156 <demonitor_2>) at beam/jit/x86/instr_bif.cpp:363
Thread 36 (Thread 0x7fee2a5ab700 (LWP 3888422)):
#0 0x000055b5fb9a0f00 in erts_mbuf_size (p=0x7febaf1607c0) at beam/erl_message.c:1663
#1 0x000055b5fb852719 in debug_call_light_bif (c_p=0x7febaf1607c0, reg=0x7fee2a5a6d00, I=0x7febacd53804, vbf=<optimized out>) at beam/jit/x86/instr_bif.cpp:365
Thread 35 (Thread 0x7fee2a62e700 (LWP 3888421)):
#0 erts_lc_unlock_flg (lck=0x55b5fbda37e0 <time_sup+288>, options=2) at beam/erl_lock_check.c:1227
#1 0x000055b5fb9469ca in erts_rwmtx_runlock (rwmtx=0x55b5fbda3780 <time_sup+192>) at beam/erl_threads.h:2004
#2 0x000055b5fb9cbf4b in read_corrected_time (os_drift_corrected=1) at beam/erl_time_sup.c:313
#3 0x000055b5fb9cbf75 in get_os_drift_corrected_time () at beam/erl_time_sup.c:321
#4 0x000055b5fb9cf06a in erts_get_monotonic_time (esdp=0x7fee3328d440) at beam/erl_time_sup.c:1787
#5 0x000055b5fba9b304 in parse_timeout_pos (esdp=0x7fee3328d440, arg=175, conv_arg=0x7fee2a629c28, abs=0, tposp=0x7fee2a629c20, stimep=0x7fee2a629c1c, msp=0x0) at beam/erl_hl_timer.c:2384
#6 0x000055b5fba9be0b in erts_set_proc_timer_term (c_p=0x7fe64870cc18, etmo=175) at beam/erl_hl_timer.c:2664
#7 0x000055b5fb82fb90 in beam_jit_wait_timeout (c_p=0x7fe64870cc18, timeout_value=<optimized out>, next=0x7febaceac114) at beam/jit/beam_jit_common.cpp:1173
Thread 34 (Thread 0x7fee2a6b1700 (LWP 3888420)):
#0 ethr_atomic32_read_acqb (var=0x7fee3313e278) at common/ethr_atomics.c:3325
#1 0x000055b5fb7fc984 in check_possible_steal_victim (rq=0x7fee33139980, rq_lockedp=0x7fee2a6acb08, vix=65) at beam/erl_process.c:4541
#2 0x000055b5fb7fcafa in try_steal_task (rq=0x7fee33139980) at beam/erl_process.c:4577
#3 0x000055b5fb80aadf in erts_schedule (esdp=0x7fee33283100, p=0x7fe63332dd80, calls=11) at beam/erl_process.c:9767
#4 0x000055b5fb83613b in erts_debug_schedule (esdp=0x0, c_p=0x7fe63332dd80, calls=11) at beam/jit/x86/process_main.cpp:43
Thread 33 (Thread 0x7fee2a734700 (LWP 3888419)):
#0 new_locked_lock (thr=0x7feda8002b70, lck=0x7fee331397b8, options=0, file=0x55b5fbc70e52 "beam/erl_process.h", line=2688) at beam/erl_lock_check.c:416
#1 0x000055b5fba5fb2b in erts_lc_lock_flg_x (lck=0x7fee331397b8, options=0, file=0x55b5fbc70e52 "beam/erl_process.h", line=2688) at beam/erl_lock_check.c:1188
#2 0x000055b5fba60141 in erts_lc_lock_x (lck=0x7fee331397b8, file=0x55b5fbc70e52 "beam/erl_process.h", line=2688) at beam/erl_lock_check.c:1300
#3 0x000055b5fb946536 in erts_mtx_lock_x (mtx=0x7fee33139788, file=0x55b5fbc70e52 "beam/erl_process.h", line=2688) at beam/erl_threads.h:1743
#4 0x000055b5fb94cb13 in erts_runq_lock (rq=0x7fee33139780) at beam/erl_process.h:2688
#5 0x000055b5fb80a91c in erts_schedule (esdp=0x7fee33278dc0, p=0x7fe61a049460, calls=11) at beam/erl_process.c:9727
#6 0x000055b5fb83613b in erts_debug_schedule (esdp=0x0, c_p=0x7fe61a049460, calls=11) at beam/jit/x86/process_main.cpp:43
Thread 32 (Thread 0x7fee2a7b7700 (LWP 3888418)):
#0 erts_tsd_get (key=3) at beam/erl_threads.h:2424
#1 0x000055b5fba5dc3d in get_my_locked_locks () at beam/erl_lock_check.c:391
#2 0x000055b5fba5f139 in erts_lc_check_exact (have=0x7fee2a7b2b80, have_len=1) at beam/erl_lock_check.c:928
#3 0x000055b5fb9bf6bb in erts_proc_lc_chk_only_proc (p=0x7fe9550ba1c8, locks=1) at beam/erl_process_lock.c:1476
#4 0x000055b5fb9bf3bd in erts_proc_lc_chk_only_proc_main (p=0x7fe9550ba1c8) at beam/erl_process_lock.c:1425
#5 0x000055b5fb8360f8 in erts_debug_schedule (esdp=0x0, c_p=0x7fe9550ba1c8, calls=11) at beam/jit/x86/process_main.cpp:40
Thread 29 (Thread 0x7fee2a940700 (LWP 3888415)):
#0 0x000055b5fba976e6 in create_tw_timer (esdp=0x7fee332500c0, timeout_pos=3689744, short_time=1, type=ERTS_TMR_PROC, rcvrp=0x7fe63dc7df38, rcvr=2693133549904579, msg=24, refn=0x0, callback=0x0, arg=0x0) at beam/erl_hl_timer.c:892
#1 0x000055b5fba9bd68 in set_proc_timer_common (c_p=0x7fe63dc7df38, esdp=0x7fee332500c0, tmo=10, timeout_pos=3689744, short_time=1) at beam/erl_hl_timer.c:2646
#2 0x000055b5fba9be4f in erts_set_proc_timer_term (c_p=0x7fe63dc7df38, etmo=175) at beam/erl_hl_timer.c:2672
#3 0x000055b5fb82fb90 in beam_jit_wait_timeout (c_p=0x7fe63dc7df38, timeout_value=<optimized out>, next=0x7febaceac114) at beam/jit/beam_jit_common.cpp:1173
Thread 28 (Thread 0x7fee2a9c3700 (LWP 3888414)):
#0 0x000055b5fba6000b in erts_lc_unlock_flg (lck=0x7fee2a9be9a0, options=0) at beam/erl_lock_check.c:1259
#1 0x000055b5fba60161 in erts_lc_unlock (lck=0x7fee2a9be9a0) at beam/erl_lock_check.c:1306
#2 0x000055b5fb9befd8 in erts_proc_lc_unlock (p=0x7fe563802848, locks=2) at beam/erl_process_lock.c:1259
#3 0x000055b5fb94d564 in erts_proc_unlock__ (p=0x7fe563802848, pix_lck=0x0, locks=2) at beam/erl_process_lock.h:750
#4 0x000055b5fb94d89f in erts_proc_unlock (p=0x7fe563802848, locks=2) at beam/erl_process_lock.h:967
#5 0x000055b5fba970d0 in proc_timeout_common (proc=0x7fe563802848, tmr=0x7fe2020eee70) at beam/erl_hl_timer.c:679
#6 0x000055b5fba97498 in tw_proc_timeout (vtwtp=0x7fe2020eee70) at beam/erl_hl_timer.c:816
#7 0x000055b5fb9ca452 in timeout_timer (p=0x7fe2020eee88) at beam/time.c:762
#8 0x000055b5fb9cad05 in erts_bump_timers (tiw=0x7fee2ba9e800, curr_time=3689733098827) at beam/time.c:962
#9 0x000055b5fb80a650 in erts_schedule (esdp=0x7fee33245d80, p=0x7fe9d2605c40, calls=11) at beam/erl_process.c:9669
#10 0x000055b5fb83613b in erts_debug_schedule (esdp=0x0, c_p=0x7fe9d2605c40, calls=11) at beam/jit/x86/process_main.cpp:43
Thread 27 (Thread 0x7fee2aa46700 (LWP 3888413)):
#0 0x000055b5fb8d108e in fix_cpool_alloc (allctr=0x7fee33d84ec0, type=25944, size=96) at beam/erl_alloc_util.c:1385
#1 0x000055b5fb8e6b95 in do_erts_alcu_alloc (type=25944, allctr=0x7fee33d84ec0, size=96) at beam/erl_alloc_util.c:5980
#2 0x000055b5fb8e7088 in erts_alcu_alloc_thr_pref (type=25944, extra=0x55b5fbd64aa0 <erts_allctr_thr_spec+192>, size=96) at beam/erl_alloc_util.c:6100
#3 0x000055b5fb8cfd11 in debug_alloc (type=25944, extra=0x55b5fbd722a0 <real_allctrs+256>, size=72) at beam/erl_alloc.c:4036
#4 0x000055b5fb8c3ee7 in erts_alloc (type=25944, size=72) at beam/erl_alloc.h:245
#5 0x000055b5fba90879 in tw_timer_alloc () at beam/erl_hl_timer.c:228
#6 0x000055b5fba975ef in create_tw_timer (esdp=0x7fee3323ba40, timeout_pos=3689744, short_time=1, type=ERTS_TMR_PROC, rcvrp=0x7fe8f0a2dfc8, rcvr=3500127783804595, msg=24, refn=0x0, callback=0x0, arg=0x0) at beam/erl_hl_timer.c:864
#7 0x000055b5fba9bd68 in set_proc_timer_common (c_p=0x7fe8f0a2dfc8, esdp=0x7fee3323ba40, tmo=10, timeout_pos=3689744, short_time=1) at beam/erl_hl_timer.c:2646
#8 0x000055b5fba9be4f in erts_set_proc_timer_term (c_p=0x7fe8f0a2dfc8, etmo=175) at beam/erl_hl_timer.c:2672
#9 0x000055b5fb82fb90 in beam_jit_wait_timeout (c_p=0x7fe8f0a2dfc8, timeout_value=<optimized out>, next=0x7febaceac114) at beam/jit/beam_jit_common.cpp:1173
Thread 18 (Thread 0x7fee2aee1700 (LWP 3888404)):
#0 0x000055b5fb954ba2 in erts_msgq_set_save_next (c_p=0x7febae6a55d0) at beam/erl_proc_sig_queue.h:2232
Thread 9 (Thread 0x7fee2b37c700 (LWP 3888395)):
#0 0x000055b5fba601d1 in erts_lc_unrequire_lock (lck=0x7fee2b377c10) at beam/erl_lock_check.c:1322
#1 0x000055b5fb9bf2c3 in erts_proc_lc_unrequire_lock (p=0x7fea0ba58400, locks=1) at beam/erl_process_lock.c:1377
#2 0x000055b5fb87ab9f in erts_lc_proc_sig_receive_helper (c_p=0x7fea0ba58400, fcalls=3993, neg_o_reds=0, msgpp=0x7fee2b377c90, get_outp=0x7fee2b377c98) at beam/jit/x86/instr_msg.cpp:134
After about an hour I checked, and most of the bif_return_trap/2 and dsend_continue_trap/1 calls were gone; memory went down to 20GB, and the nodes with the debug SMP emulator are still staying up.
I don't know if the circular offheap lists are part of the root cause or just a secondary symptom. However, I made a branch at https://github.com/sverker/otp/tree/sverker/check-circular-offheap/25.3.2.8 that checks for circular offheap lists at strategic places. It's based on OTP-25.3.2.8 and can be compiled (with the check) both as opt and debug VM.
Thank you, I'll compile it and give it a try.
I tried it and the issue happened (memory blew up to 43GB) but the assert didn't trigger.
Wonder if there are more places to insert the check?
@nickva Did you run optimized or debug VM?
I ran the optimized version of the VM.
I pushed a commit to https://github.com/sverker/otp/tree/sverker/check-circular-offheap/25.3.2.8 that also checks for circular offheap lists when processes are exiting.
Thanks, Sverker. I had started down the same path here: https://github.com/erlang/otp/compare/master...nickva:otp:sverker/check-circular-offheap/25.3.2.8
I noticed it would be nice to have line numbers, so I altered the assert macro a bit. And I probably added checks in way too many places... I also added the few places you had added, and re-ran the tests.
So far I got two assert crashes (with that commit above) in the same place:
2024-02-01_22:09:14.71118 beam/erl_process.c:13357:erts_proc_exit_handle_dist_monitor()
Assertion failed: erts_check_circular_offheap(c_p)
...
2024-02-01_22:09:19.73615 beam/erl_process.c:13357:erts_proc_exit_handle_dist_monitor()
Assertion failed: erts_check_circular_offheap(c_p)
That's coming from the erts_proc_exit_handle_dist_monitor function, in the ERTS_DSIG_PREP_CONNECTED case.
So, it looks like the circular list happens between the last known successful check at https://github.com/nickva/otp/blob/882fefd136d1b4ab0f0dac24c79b5c6be2999252/erts/emulator/beam/erl_process.c#L14406, where erts_proc_exit_handle_dist_monitor is called for each node in a tree, and https://github.com/nickva/otp/blob/sverker/check-circular-offheap/25.3.2.8/erts/emulator/beam/erl_process.c#L13357
I've stared at the code and tried to reproduce myself without success so far. Could you sprinkle some more checks in erts_proc_exit_handle_dist_monitor to get a more precise location?
I added a few more checks. Now it fails at erl_process.c:13358: https://github.com/nickva/otp/blob/71237615b62ef876a82cbbe8757884efb8fd8ac2/erts/emulator/beam/erl_process.c#L13358
After watcher = copy_struct(watcher, watcher_sz, &hp, factory.off_heap);
2024-02-02_19:56:43.27406 beam/erl_process.c:13358:erts_proc_exit_handle_dist_monitor()
Assertion failed: erts_check_circular_offheap(c_p)
Trying to dig a bit deeper I added a sleep call to attach the debugger right after when the condition happens: https://github.com/nickva/otp/commit/f9000c020978d99eb50cd99e55acecaac780fbff
watcher = copy_struct(watcher, watcher_sz, &hp, factory.off_heap);
if (!erts_check_circular_offheap(c_p)) {
    fflush(stdout);
    fflush(stderr);
    fprintf(stderr, "XXXXX %s:%s:%d \n", __FILE__, __func__, __LINE__);
    fflush(stderr);
    fflush(stdout);
    wait_debug();
}
(gdb) bt
....
#3 0x0000563321e8d0ca in wait_debug () at beam/erl_process.c:12966
#4 erts_proc_exit_handle_dist_monitor (mon=0x563323842548, vctxt=0x5633234a1a88, reds=4000) at beam/erl_process.c:13369
#5 0x000056332208a681 in rbt_wrap_foreach_delete (reds=4000, vctxt=<synthetic pointer>, ml=0x563323842548) at beam/erl_monitor_link.c:403
#6 mon_lnk_rbt_foreach_unordered__ (destroying=1, op=<optimized out>, reds=4000, ystate=<optimized out>, yielding=1, arg=<synthetic pointer>, root=0x5633234a1a20) at beam/erl_rbtree.h:1420
#7 mon_lnk_rbt_foreach_destroy_yielding (reds=94777639183688, ystate=<optimized out>, arg=<synthetic pointer>, op=<optimized out>, root=0x5633234a1a20) at beam/erl_rbtree.h:1737
#8 ml_rbt_foreach_delete_yielding (limit=94777639183688, vyspp=0x5633234a1ad0, arg=0x5633234a1a48, func=0x563321e8cb60 <erts_proc_exit_handle_dist_monitor>, root=root@entry=0x5633234a1a60)
at beam/erl_monitor_link.c:438
#9 erts_monitor_tree_foreach_delete_yielding (root=root@entry=0x5633234a1aa0, func=func@entry=0x563321e8cb60 <erts_proc_exit_handle_dist_monitor>, arg=arg@entry=0x5633234a1a88,
vyspp=vyspp@entry=0x5633234a1ad0, limit=limit@entry=4000) at beam/erl_monitor_link.c:752
#10 0x0000563321eacb94 in erts_continue_exit_process (p=0x7fed09124978) at beam/erl_process.c:14426
Moved up to the exit dist handle frame:
(gdb) up
#4 erts_proc_exit_handle_dist_monitor (mon=0x563323842548, vctxt=0x5633234a1a88, reds=4000) at beam/erl_process.c:13369
13369 wait_debug();
Printing a few values:
(gdb) print *mon
$2 = {node = {signal = {next = 0x0, specific = {next = 0x563323831590, attachment = 0x563323831590}, tag = 0},
tree = {parent = 0, right = 0x563323831590, left = 0x0}, list = {next = 0x0,
prev = 0x563323831590}}, other = {item = 94777639183810, ptr = 0x5633238425c2}, offset = 40,
key_offset = 40, flags = 17, type = 3}
(This one has the names replaced)
(gdb) etp watcher
<19720:xxxnodenamexxxx@xxxourhostnamexxx.net/1706803863.13060.184>.
(gdb) etp watched
<0.6040.181>
(gdb) print factory
$5 = {mode = FACTORY_HALLOC, p = 0x7fed09124978, hp_start = 0x7fed1931bbd0, original_htop = 0x7fed1931bbd0,
hp = 0x7fed1931bbf0, hp_end = 0x7fed1931c138, message = 0x0, heap_frags = 0x0,
heap_frags_saved = 0x0, heap_frags_saved_used = 0, off_heap = 0x7fed09124b90,
off_heap_saved = {first = 0x7fed1931bbf0, overhead = 0}, alloc_type = 0}
(gdb) etp reason
{shutdown,should_close}.
(gdb) print mdp
$8 = (ErtsMonitorData *) 0x563323842520
(gdb) print *mdp
$9 = {origin = {node = {signal = {next = 0x7fed4823d630, specific = {next = 0x563323b8bdf8, attachment = 0x563323b8bdf8}, tag = 0}, tree = {parent = 140657094284848, right = 0x563323b8bdf8, left = 0x0},
list = {next = 0x7fed4823d630, prev = 0x563323b8bdf8}}, other = {item = 1333454296414595, ptr = 0x4bcc505a97983}, offset = 0, key_offset = 80, flags = 18, type = 3}, u = {target = {node = {signal = {
next = 0x0, specific = {next = 0x563323831590, attachment = 0x563323831590}, tag = 0}, tree = {parent = 0, right = 0x563323831590, left = 0x0}, list = {next = 0x0, prev = 0x563323831590}},
other = {item = 94777639183810, ptr = 0x5633238425c2}, offset = 40, key_offset = 40, flags = 17, type = 3}, ref_heap = {0, 94777639114128, 0, 94777639183810}}, ref = 94777639183770, refc = {
counter = 2}}
The process:
(gdb) print *c_p
$10 = {common = {id = 1333454296414595, refc = {atmc = {counter = 1}, sint = 1}, tracer = 59, trace_flags = 0, timer = {counter = 0}, u = {alive = {started_interval = 49115, reg = 0x0,
links = 0x5633247c9b28, lt_monitors = 0x7fed20365468, monitors = 0x563323831590}, release = {later = 49115, func = 0x0, data = 0x5633247c9b28, next = 0x7fed20365468}}}, htop = 0x7fed1931c138,
stop = 0x7fed1931c150, frame_pointer = 0x0, fcalls = 4000, freason = 5, fvalue = 140656441778818, heap = 0x7fed19313f48, hend = 0x7fed1931c210, abandoned_heap = 0x0, heap_sz = 4185, min_heap_size = 233,
min_vheap_size = 46422, max_heap_size = 3, arity = 0, arg_reg = 0x7fed09124a48, max_arg_reg = 6, def_arg_reg = {140658188098876, 542219, 140656003446713, 367, 479, 4000}, i = 0x7ff126994c88,
catches = -1, rcount = 0, schedule_count = 0, reds = 2072, flags = 2048, group_leader = 36455682425427, ftrace = 59, next = 0x0, uniq = 0, sig_qs = {first = 0x7fed41c13748, last = 0x7fed41c13748,
save = 0x7fed09124ac0, cont = 0x0, cont_last = 0x7fed09124ad8, nmsigs = {next = 0x0, last = 0x0}, recv_mrk_blk = 0x0, len = 1, flags = 2}, bif_timers = 0x0, dictionary = 0x0, seq_trace_clock = 0,
seq_trace_lastcnt = 0, seq_trace_token = 59, u = {real_proc = 0x5633234a1a60, terminate = 0x5633234a1a60, initial = {module = 94777635379808, function = 542219, arity = 5}}, current = 0x0,
parent = 1289714349451763, static_flags = 0, high_water = 0x7fed19314830, old_hend = 0x7fed19313f40, old_htop = 0x7fed1930bca8, old_heap = 0x7fed19306ba0, gen_gcs = 1, max_gen_gcs = 65535, off_heap = {
first = 0x7fed1931bbd0, overhead = 0}, wrt_bins = 0x0, mbuf = 0x0, live_hf_end = 0xfffffffffffffff8, msg_frag = 0x7fed213f6230, mbuf_sz = 7, psd = {counter = 0}, bin_vheap_sz = 46422,
bin_old_vheap_sz = 46422, bin_old_vheap = 0, sys_task_qs = 0x0, dirty_sys_tasks = 0x0, state = {counter = 44074}, xstate = {counter = 0}, sig_inq_contention_counter = 0, sig_inq = {first = 0x0,
last = 0x7fed09124c08, len = 0, nmsigs = {next = 0x0, last = 0x0}}, sig_inq_buffers = {counter = 0}, trace_msg_q = 0x0, lock = {flags = {counter = 1}, queue = {0x0, 0x0, 0x0, 0x0, 0x0}},
scheduler_data = 0x7ff12955ed40, run_queue = {counter = 140673756865856}}
(gdb) etp-process-info-x c_p
Pid: <0.6040.181>
State: running | active | exiting | free | prq-prio-normal | usr-prio-normal | act-prio-normal
Flags: disable-gc
Current function: unknown
I: #Cp<0x7ff126994c88>
Heap size: 4185
Old-heap size: 6772
Mbuf size: 7
Msgq len: 1 (inner=1, outer=0)
Parent: <0.4767.181>
Pointer: (Process*)0x7fed09124978
Msgq Flags: on-heap
--- Inner signal queue (message queue) ---
[{#Ref<0.552685909.192675855.75491>,done} @from= <0.4767.181> % <== SAVE]
Message signals: 1
Non-message signals: 0
--- Middle signal queue ---
[]
Message signals: 0
Non-message signals: 0
--- Outer queue ---
[]
Message signals: 0
Non-message signals: 0
Invalid type combination in equality test.
(gdb) etp-stacktrace c_p
%%% WARNING: The process is currently running, so c_p->stop will not be correct
%%% Consider using -emu or -jit variant instead
% Stacktrace (24)
I: #Cp<0x7ff126994c88>.
6: #Cp<rexi_utils:process_mailbox/6+0xa0> @ "src/rexi_utils.erl":55.
12: #Cp<fabric_db_update_listener:receive_results/3+0xf4> @ "src/fabric_db_update_listener.erl":146.
16: #Cp<fabric_db_update_listener:go/5+0x208> @ "src/fabric_db_update_listener.erl":56.
23: #Cp<terminate process normally>.
The ctx struct:
(gdb) print ctx
$11 = {connect = 617671280, no_suspend = 0, no_trap = 0, ignore_busy = 0, ctl = 140673681517520, msg = 94777612353448, from = 1333454296414595, ctl_heap = {94777613614237, 463, 4390227507, 94777639131210,
0, 140673681517168, 210453397456, 140655883978210}, return_term = 75, dep = 0x7fed4823ac00, node = 140673681517216, cid = 549755814151, connection_id = 15707689, deref_dep = 0,
phase = ERTS_DSIG_SEND_PHASE_INIT, reds = 128000, data_size = 140673681517344, dhdr_ext_size = 140673681517344, dhdrp = 0x7ff124d0eba0 " ��$�\177", extp = 0x7ff124d0eb20 "\001", acmp = 0x7ff1000005e2,
obuf = 0x563321fb0e01 <demonitor+897>, alloced_fragments = 140673681517348, fragments = 140673681517352, vlen = 140673681517216, dflags = 55966662589, c_p = 0x7fed09124978, u = {sc = {
dflags = 140673757343384, level = 693497512, vlen = 140673681517352, iovec = 617671456, fragment_size = 140673681517348, last_result = 140673681517312, extra_size = 94777613004019, result = 1,
obj = 3682346, wstack = {wstart = 0x7fff4b6c995a <clock_gettime+90>, wsp = 0x7ff12955ed40, wend = 0x7ff124d0eb40, wdefault = 0x0, alloc_type = 0}}, ec = {dflags = 140673757343384,
hopefull_flags = 140673757343400, hopefull_flagsp = 0x7ff124d0eb28 "|�\026\"3V", level = 617671456, ep = 0x7ff124d0eb24 "", obj = 140673681517312, wstack = {
wstart = 0x563321f4acf3 <erts_alloc_handle_delayed_dealloc+67>, wsp = 0x1, wend = 0x38302a, wdefault = 0x7fff4b6c995a <clock_gettime+90>, alloc_type = 693497152}, map_array = 0x7ff124d0eb40,
next_map_element = 0x0, ycf_yield_state = 0x0, result_bin = 0x9bcb07, cptr = 0x1 <error: Cannot access memory at address 0x1>, vlen = 140674917687425, size = 1,
payload_ixp = 0x56332216a87c <ethr_rwmutex_runlock+44> "H\211D$\bI\211�H\205�\017\204�", hopefull_ixp = 0x0, iov = 0x7ff12a2184c0, binv = 0x1a887, termv = 0x7ff124d0eb80, iovec = 0,
fragment_size = 94777613724998, frag_ix = 140673757343040, fragment_eiovs = 0x0}}}
(gdb) print *c_p.off_heap->first
$17 = {thing_word = 240, size = 140658033120096, next = 0x7fed1931bbf0}
(gdb) print *c_p.off_heap->first->next
$18 = {thing_word = 312, size = 94777642703016, next = 0x7fed1931bbd0}
(gdb) print *c_p.off_heap->first->next->next
$19 = {thing_word = 240, size = 140658033120096, next = 0x7fed1931bbf0}
(gdb) print *c_p.off_heap->first->next->next->next
$20 = {thing_word = 312, size = 94777642703016, next = 0x7fed1931bbd0}
This one is circular too; it repeats up to some depth.
(gdb) etp-offheapdump c_p->off_heap.first
% Offheap dump:
[<19720:xxxnodenamexxxe@node2host.net/1706803863.13060.184>,
#Ref<19721:xxxnodenamexxx@node3host.net/1705328727.2966507853.193462279.243060>,
<19720:xxxnodenamexxx@node2host.net/1706803863.13060.184>,
...
Sorry for all the mess, that's a lot of stuff. I didn't know what would be useful, so I opted to dump everything I had.
@nickva First of all, thanks for your quick and competent responses.
It looks like someone had already built something on the heap and linked it into the offheap list, but did not advance the heap top (c_p->htop). "Our" copy_struct() then builds a term (an external pid) over the existing term(s) in the offheap list, and that is what causes the circular list.
I've pushed yet another (one-liner) commit to https://github.com/sverker/otp/tree/sverker/check-circular-offheap/25.3.2.8 that also checks that no term in the offheap list is above the heap top.
You should probably combine that commit with your existing branch with all the additional checks.
Thank you, @sverker
I gave it a try in a new commit https://github.com/nickva/otp/commit/c8982760cee69a466c6c85f4ef225dc447f8b95f
But I noticed an assert trigger in beam_jit_call_nif as soon as it started, so I added an extra log statement there.
$./bin/cerl
XXXXX beam/jit/beam_jit_common.cpp:beam_jit_call_nif:558 htop:0x7f81b8935080 hend:0x7f81b89353d0 first:0x7f81b8935058
beam/jit/beam_jit_common.cpp:562:beam_jit_call_nif() Assertion failed: (erts_check_circular_offheap(c_p)==0)
Aborted
$ ./bin/cerl
XXXXX beam/jit/beam_jit_common.cpp:beam_jit_call_nif:558 htop:0x7f52327e1080 hend:0x7f52327e13d0 first:0x7f52327e1058
beam/jit/beam_jit_common.cpp:562:beam_jit_call_nif() Assertion failed: (erts_check_circular_offheap(c_p)==0)
Aborted
It seems the off_heap.first pointer there is below htop. But I wonder if that's expected in that context?
You switched the condition around. fast is NOT allowed to be between htop and hend.
Should be
if (ErtsInBetween(fast, p->htop, p->hend)) return 1;
Oops, sorry. That makes sense. I'll fix the check and retest.
From heap to htop are terms already built on the heap.
From htop to stop is free space, available for the heap and/or stack to grow toward each other.
From stop to hend is the current stack usage.
When htop and stop meet, a garbage collection is (eventually) triggered.
Thank you for explaining @sverker
I updated the commit https://github.com/nickva/otp/commit/e0cd2e7953dcb06a8c0f6707c98bd3aae8487b63
Now the assert fails in erts_continue_exit_process
#2 0x000056273d27c164 in erl_assert_error (expr=expr@entry=0x56273d335030 "(erts_check_circular_offheap(p)==0)",
func=func@entry=0x56273d335f70 <__func__.5> "erts_continue_exit_process",
file=file@entry=0x56273d3353f5 "beam/erl_process.c", line=line@entry=14051) at sys/unix/sys.c:955
#3 0x000056273d03f489 in erts_continue_exit_process (p=0x7f2cdba91de8) at beam/erl_process.c:14051
#4 0x00007f30bf85ccac in ?? ()
#5 0x0000000000000000 in ?? ()
(gdb) p *p
$6 = {common = {id = 419841643314099, refc = {atmc = {counter = 1}, sint = 1}, tracer = 59, trace_flags = 0, timer = {
counter = 0}, u = {alive = {started_interval = 7348, reg = 0x0, links = 0x56273ef85158,
lt_monitors = 0x7f2cd5b0c080, monitors = 0x56273e0531b0}, release = {later = 7348, func = 0x0,
data = 0x56273ef85158, next = 0x7f2cd5b0c080}}}, htop = 0x7f2c8cf98990, stop = 0x7f2c8cf9f330,
frame_pointer = 0x0, fcalls = 4000, freason = 5, fvalue = 139828775977202, heap = 0x7f2c8cf97128,
hend = 0x7f2c8cf9f3f0, abandoned_heap = 0x0, heap_sz = 4185, min_heap_size = 233, min_vheap_size = 46422,
max_heap_size = 3, arity = 0, arg_reg = 0x7f2cdba91eb8, max_arg_reg = 6, def_arg_reg = {139830879985852, 542219,
139829596488353, 0, 0, 4000}, i = 0x7f30bf85cc88, catches = -1, rcount = 0, schedule_count = 0, reds = 2894,
flags = 2048, group_leader = 99265284191379, ftrace = 59, next = 0x7f2cdba814e0, uniq = 0, sig_qs = {
first = 0x7f30c0b903c0, last = 0x7f30c0b903c0, save = 0x7f2cdba91f30, cont = 0x0, cont_last = 0x7f2cdba91f48,
nmsigs = {next = 0x0, last = 0x0}, recv_mrk_blk = 0x0, len = 1, flags = 2}, bif_timers = 0x0, dictionary = 0x0,
seq_trace_clock = 0, seq_trace_lastcnt = 0, seq_trace_token = 59, u = {real_proc = 0x56273da741c0,
terminate = 0x56273da741c0, initial = {module = 94726538084800, function = 542219, arity = 5}}, current = 0x0,
parent = 397542173102883, static_flags = 0, high_water = 0x7f2c8cf972a0, old_hend = 0x7f2c8d3a2408,
old_htop = 0x7f2c8d39a760, old_heap = 0x7f2c8d395068, gen_gcs = 2, max_gen_gcs = 65535, off_heap = {
first = 0x7f2c8cf989f8, overhead = 0}, wrt_bins = 0x0, mbuf = 0x0, live_hf_end = 0xfffffffffffffff8,
msg_frag = 0x7f2c6c8580a0, mbuf_sz = 7, psd = {counter = 0}, bin_vheap_sz = 46422, bin_old_vheap_sz = 46422,
bin_old_vheap = 0, sys_task_qs = 0x0, dirty_sys_tasks = 0x0, state = {counter = 44074}, xstate = {counter = 0},
sig_inq_contention_counter = 0, sig_inq = {first = 0x0, last = 0x7f2cdba92078, len = 0, nmsigs = {next = 0x0,
last = 0x0}}, sig_inq_buffers = {counter = 0}, trace_msg_q = 0x0, lock = {flags = {counter = 1}, queue = {0x0,
0x0, 0x0, 0x0, 0x0}}, scheduler_data = 0x7f30c2482840, run_queue = {counter = 139847393817856}}
(gdb) p p->htop
$1 = (Eterm *) 0x7f2c8cf98990
(gdb) p p->hend
$2 = (Eterm *) 0x7f2c8cf9f3f0
(gdb) p p->stop
$3 = (Eterm *) 0x7f2c8cf9f330
(gdb) p p->off_heap.first
(gdb) p *(struct continue_exit_state *)p->u.terminate
$19 = {phase = ERTS_CONTINUE_EXIT_DIST_MONITORS, links = 0x0, monitors = 0x0, lt_monitors = 0x0,
reason = 139828775977202, pectxt = {c_p = 0x7f2cdba91de8, reason = 139828775977202, dist_links = 0x0,
dist_monitors = 0x0, pend_spawn_monitors = 0x0, wait_pend_spawn_monitor = 0x0, dist_state = 59, yield = 0},
dep = 0x0, yield_state = 0x0, block_rla_ref = 4275109905}
(gdb) etp 139828775977202
{shutdown,should_close}.
(gdb) etp-offheapdump p->off_heap.first
warning: Expression is not an assignment (and might have no effect)
% Offheap dump:
[#Ref<19782:xxxredactedxxx@node3xxxredacted/1707158683.4025426303.3793485845.208590>,
<19782:xxredactexxx@node3xxxredacted/1707158683.13038.0>,
#Ref<18541:xxxredactedxxx@node1xxxredacted/1707156029.3537577912.3793485825.133817>,
<18541:xxxredactedxxxe@node1xxxredactedxxxx/1707156029.12742.0>,
#Fun<0x7f30c44dda30,0x7f2c8cf988d0,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98868,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98800,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98798,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98730,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf986c8,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98660,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf985f8,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98590,0x3>,
#Fun<0x7f30c44dda30,0x7f2c8cf98528,0xeffff0003>,
#Fun<0x7f30c44dda30,0x7f2c8cf984c0,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98458,0xc00000003>,
#Fun<0x7f30c44dda30,0x7f2c8cf983f0,0x7f2c8cf90003>,
#Fun<0x7f30c44dda30,0x7f2c8cf98388,0x3>,
#Fun<0x7f30c44dda30,0x7f2c8cf98320,0x9ffff0003>,
#Fun<0x7f30c44dda30,0x7f2c8cf982b8,0x7f2c8cf90003>].
I managed to get the rr-project debugger working and captured a few traces with the assertions, but I haven't yet figured out how to navigate properly.
$ rr replay
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
...
(rr) c
...
(rr) beam/erl_process.c:14051:erts_continue_exit_process() Assertion failed: (erts_check_circular_offheap(p)==0)
(rr) thread apply all bt
Thread 5 (Thread 300027.300039 (2_scheduler)):
#0 size_object_x (obj=<optimized out>, obj@entry=139826393929410, litopt=litopt@entry=0x7f2c745cfc00) at beam/copy.c:133
#1 0x000055f910714fbc in erts_send_message (sender=sender@entry=0x7f2bf79c66b8, receiver=receiver@entry=0x7f2c65ecb6e8, receiver_locks=receiver_locks@entry=0x7f2c745cfc6c, message=message@entry=139826393929410) at beam/erl_message.c:844
#2 0x000055f9106f100b in do_send (p=0x7f2bf79c66b8, to=<optimized out>, msg=139826393929410, return_term=<optimized out>, refp=<optimized out>, dist_ctx=<optimized out>, connect=1, suspend=1) at beam/bif.c:2407
#3 0x000055f9106f4ca1 in send_2 () at beam/bif.c:2510
#4 0x00007f2c75d3b413 in ?? ()
Thread 4 (Thread 300027.300041 (4_scheduler)):
#0 ethr_rwmutex_rwlock (rwmtx=0x7f2c65c613a0) at common/ethr_mutex.c:3032
#1 0x000055f91076694a in erts_rwmtx_rwlock (rwmtx=0x7f2c65c613a0) at beam/erl_threads.h:2070
#2 db_lock (kind=LCK_WRITE_REC, tb=0x7f2c65c61370) at beam/erl_db.c:689
#3 db_get_table_aux (p=0x7f2c65edf210, id=139826424005458, kind=LCK_WRITE_REC, name_already_locked=<optimized out>, freason_p=0x7f2c67cfacc8, what=4) at beam/erl_db.c:834
#4 0x000055f910769c7a in db_get_table (freason_p=0x7f2c67cfacc8, kind=LCK_WRITE_REC, what=4, id=<optimized out>, p=0x7f2c65edf210) at beam/erl_db.c:875
#5 ets_insert_2 (A__p=0x7f2c65edf210, BIF__ARGS=<optimized out>, A__I=<optimized out>) at beam/erl_db.c:2173
#6 0x00007f2c75d3b413 in ?? ()
#7 0x0000000000000000 in ?? ()
Thread 3 (Thread 300027.300040 (3_scheduler)):
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f2cb911c537 in __GI_abort () at abort.c:79
#2 0x000055f910829264 in erl_assert_error (expr=expr@entry=0x55f9108e2030 "(erts_check_circular_offheap(p)==0)", func=func@entry=0x55f9108e2f70 <__func__.5> "erts_continue_exit_process", file=file@entry=0x55f9108e23f5 "beam/erl_process.c", line=line@entry=14051) at sys/unix/sys.c:955
#3 0x000055f9105ec4d9 in erts_continue_exit_process (p=0x7f2bf7921a20) at beam/erl_process.c:14051
#4 0x00007f2c75d3ccac in ?? ()
Thread 2 (Thread 300027.300038 (1_scheduler)):
#0 0x000055f9107afbba in sweep (src_size=0, src=0x0, ohsz=139828640236216, oh=0x0, type=ErtsSweepNewHeap, n_htop=<optimized out>, n_hp=0x7f2be042ed48) at beam/erl_gc.c:2266
#1 sweep_new_heap (n_hp=<optimized out>, n_htop=<optimized out>, old_heap=old_heap@entry=0x7f2be042d568 "��B�+\177", old_heap_size=old_heap_size@entry=4544) at beam/erl_gc.c:2308
#2 0x000055f9107b2475 in do_minor (p=p@entry=0x7f2c1a4abee8, live_hf_end=live_hf_end@entry=0x0, mature=mature@entry=0x7f2be0a39220 "2\222��+\177", mature_size=mature_size@entry=1832, new_sz=1598, objv=objv@entry=0x7f2c749d2cb8, nobj=1) at beam/erl_gc.c:1753
#3 0x000055f9107b4425 in minor_collection (recl=<synthetic pointer>, ygen_usage=<optimized out>, nobj=1, objv=0x7f2c749d2cb8, need=4, live_hf_end=<optimized out>, p=0x7f2c1a4abee8) at beam/erl_gc.c:1443
#4 garbage_collect (p=p@entry=0x7f2c1a4abee8, live_hf_end=<optimized out>, live_hf_end@entry=0x0, need=need@entry=0, objv=objv@entry=0x7f2c749d2cb8, nobj=nobj@entry=1, fcalls=<optimized out>, max_young_gen_usage=0) at beam/erl_gc.c:754
#5 0x000055f9107b5221 in erts_gc_after_bif_call_lhf (p=0x7f2c1a4abee8, live_hf_end=0x0, result=139826429356601, regs=0x7f2c749d2d80, arity=2) at beam/erl_gc.c:460
#6 0x00007f2c75d3b4d6 in ?? ()
(rr) p *p
$1 = {common = {id = 3953044963989123, refc = {atmc = {counter = 1}, sint = 1}, tracer = 59, trace_flags = 0, timer = {counter = 0}, u = {alive = {
started_interval = 10646, reg = 0x0, links = 0x7f2bf77050a8, lt_monitors = 0x7f2bf79ee0b0, monitors = 0x7f2c1a759fd0}, release = {
later = 10646, func = 0x0, data = 0x7f2bf77050a8, next = 0x7f2bf79ee0b0}}}, htop = 0x7f2be224db80, stop = 0x7f2be22537d8,
frame_pointer = 0x55000000000, fcalls = 4000, freason = 5, fvalue = 139826354201178, heap = 0x7f2be224b5d0, hend = 0x7f2be2253898,
abandoned_heap = 0x0, heap_sz = 4185, min_heap_size = 233, min_vheap_size = 46422, max_heap_size = 3, arity = 0, arg_reg = 0x7f2bf7921af0,
max_arg_reg = 6, def_arg_reg = {139828282519100, 542155, 139826453474521, 5242880, 139826814718120, 4000}, i = 0x7f2c75d3cc88, catches = -1,
rcount = 0, schedule_count = 0, reds = 3085, flags = 2048, group_leader = 13159779801075, ftrace = 59, next = 0x0, uniq = 0, sig_qs = {
first = 0x7f2c1a293ae0, last = 0x7f2c1a293ae0, save = 0x7f2bf7921b68, cont = 0x0, cont_last = 0x7f2bf7921b80, nmsigs = {next = 0x0,
last = 0x0}, recv_mrk_blk = 0x0, len = 1, flags = 2}, bif_timers = 0x0, dictionary = 0x0, seq_trace_clock = 0, seq_trace_lastcnt = 0,
seq_trace_token = 59, u = {real_proc = 0x55f91272c1c0, terminate = 0x55f91272c1c0, initial = {module = 94528244728256, function = 542155,
arity = 5}}, current = 0x0, parent = 3860926505381619, static_flags = 0, high_water = 0x7f2be224be38, old_hend = 0x7f2be21f8de0,
old_htop = 0x7f2be21f1168, old_heap = 0x7f2be21eba40, gen_gcs = 2, max_gen_gcs = 65535, off_heap = {first = 0x7f2be224dba0, overhead = 0},
wrt_bins = 0x0, mbuf = 0x0, live_hf_end = 0xfffffffffffffff8, msg_frag = 0x7f2bdc2c2208, mbuf_sz = 7, psd = {counter = 0}, bin_vheap_sz = 46422,
bin_old_vheap_sz = 46422, bin_old_vheap = 0, sys_task_qs = 0x0, dirty_sys_tasks = 0x0, state = {counter = 44074}, xstate = {counter = 0},
sig_inq_contention_counter = 0, sig_inq = {first = 0x0, last = 0x7f2bf7921cb0, len = 0, nmsigs = {next = 0x0, last = 0x0}}, sig_inq_buffers = {
counter = 0}, trace_msg_q = 0x0, lock = {flags = {counter = 1}, queue = {0x0, 0x0, 0x0, 0x0, 0x0}}, scheduler_data = 0x7f2c77c53340,
run_queue = {counter = 139828964613824}}
(rr) p *p.off_heap->first
$5 = {thing_word = 312, size = 139828640281240, next = 0x7f2be224db80}
(rr) p *p.off_heap->first->next
$6 = {thing_word = 240, size = 139828640281240, next = 0x7f2be224dac8}
(rr) p *p.off_heap->first->next->next
$7 = {thing_word = 212, size = 139828633827800, next = 0x7f2be224da30}
(rr) p *p.off_heap->first->next->next->next
$8 = {thing_word = 212, size = 139828633827800, next = 0x7f2be224d9c8}
(rr) p *p.off_heap->first->next->next->next->next
$9 = {thing_word = 212, size = 139828633827800, next = 0x7f2be224d960}
(rr) p p->off_heap.first
$11 = (struct erl_off_heap_header *) 0x7f2be224dba0
(rr) p p->htop
$12 = (Eterm *) 0x7f2be224db80
(rr) p p->hend
$13 = (Eterm *) 0x7f2be2253898
(rr) watch *((struct erl_off_heap_header *) 0x7f2be224dba0)
Hardware watchpoint 1: *((struct erl_off_heap_header *) 0x7f2be224dba0)
(rr) reverse-continue
(rr) up
#3 0x000055f9105ec4d9 in erts_continue_exit_process (p=0x7f2bf7921a20) at beam/erl_process.c:14051
14051 beam/erl_process.c: No such file or directory.
(rr) reverse-continue
Continuing.
[Switching to Thread 300027.300041]
Thread 4 hit Hardware watchpoint 1: *((struct erl_off_heap_header *) 0x7f2be224dba0)
Old value = {thing_word = 312, size = 139828640281240, next = 0x7f2be224db80}
New value = {thing_word = 312, size = 139828640281240, next = 0x0}
0x000055f9106dc6ab in copy_struct_x (obj=<optimized out>, sz=sz@entry=5, hpp=hpp@entry=0x7f2c67cfa8b8, off_heap=<optimized out>,
bsz=bsz@entry=0x0, litopt=litopt@entry=0x0) at beam/copy.c:912
912 beam/copy.c: No such file or directory.
(rr) bt
#0 0x000055f9106dc6ab in copy_struct_x (obj=<optimized out>, sz=sz@entry=5, hpp=hpp@entry=0x7f2c67cfa8b8, off_heap=<optimized out>,
bsz=bsz@entry=0x0, litopt=litopt@entry=0x0) at beam/copy.c:912
#1 0x000055f9105cbeb9 in erts_proc_exit_handle_dist_monitor (mon=0x7f2c1a759fd0, vctxt=0x7f2c67cfac08, reds=3843) at beam/erl_process.c:13375
#2 0x000055f9107c9ad1 in rbt_wrap_foreach_delete (reds=3843, vctxt=<synthetic pointer>, ml=0x7f2c1a759fd0) at beam/erl_monitor_link.c:403
#3 mon_lnk_rbt_foreach_unordered__ (destroying=1, op=<optimized out>, reds=3843, ystate=<optimized out>, yielding=1, arg=<synthetic pointer>,
root=0x7f2c67cfab60) at beam/erl_rbtree.h:1420
#4 mon_lnk_rbt_foreach_destroy_yielding (reds=139827399204816, ystate=<optimized out>, arg=<synthetic pointer>, op=<optimized out>,
root=0x7f2c67cfab60) at beam/erl_rbtree.h:1737
#5 ml_rbt_foreach_delete_yielding (limit=139827399204816, vyspp=0x7f2c67cfac50, arg=0xf2b,
func=0x55f9105cbb60 <erts_proc_exit_handle_dist_monitor>, root=root@entry=0xf43) at beam/erl_monitor_link.c:438
#6 erts_monitor_tree_foreach_delete_yielding (root=root@entry=0x7f2c67cfac20,
func=func@entry=0x55f9105cbb60 <erts_proc_exit_handle_dist_monitor>, arg=arg@entry=0x7f2c67cfac08, vyspp=vyspp@entry=0x7f2c67cfac50,
limit=limit@entry=3843) at beam/erl_monitor_link.c:752
#7 0x000055f9105ebe4d in erts_continue_exit_process (p=p@entry=0x7f2bf7921a20) at beam/erl_process.c:14426
#8 0x000055f9105ecbdf in erts_do_exit_process (p=p@entry=0x7f2bf7921a20, reason=reason@entry=139826354201178) at beam/erl_process.c:14001
#9 0x000055f91067052b in terminate_proc (Value=139826354201178, c_p=0x7f2bf7921a20) at beam/beam_common.c:720
#10 handle_error (c_p=0x7f2bf7921a20, pc=<optimized out>, reg=<optimized out>, bif_mfa=<optimized out>) at beam/beam_common.c:569
#11 0x00007f2c75d3c4df in ?? ()
Run the replay again until the assert fails with p->off_heap.first larger than p->htop.
Set watchpoints on both of them:
(rr) watch -l p->htop
(rr) watch -l p->off_heap.first
Don't forget -l, which means to watch the memory location and not to re-evaluate the expression.
And then reverse-continue.
From your run it looks like p->off_heap.first was set correctly in an earlier call to the same code in erts_proc_exit_handle_dist_monitor, but then my guess is someone must have stepped p->htop backwards.
You navigate an rr replay as a normal interactive gdb session, but you can also do:
reverse-continue (rc)
reverse-next (rn)
reverse-step (rs)
reverse-stepi (rsi)
reverse-finish
Thank you @sverker. The -l option did the trick.
I got a bit further, I see the watchpoint trigger but the backtrace looks unreadable:
(rr) display p->off_heap.first
1: p->off_heap.first = (struct erl_off_heap_header *) 0x7f2be224dba0
(rr) display p->htop
2: p->htop = (Eterm *) 0x7f2be224db80
(rr) reverse-continue
Continuing.
[Switching to Thread 300027.300041]
Thread 4 hit Hardware watchpoint 1: -location p->htop
Old value = (Eterm *) 0x7f2be224db80
New value = (Eterm *) 0x7f2be224dbc8
0x00007f2c75d3c6cf in ?? ()
(rr) bt
#0 0x00007f2c75d3c6cf in ?? ()
#1 0x000000000000014f in ?? ()
#2 0x000000000001a70b in ?? ()
...
#2980 0x0000000000000000 in ?? ()
I wonder if it's because of the jit, or does it look like something scribbled over the memory?
Doing a bunch of reverse-stepi in the same thread, I ended up at the bottom of handle_error: https://github.com/nickva/otp/blob/e0cd2e7953dcb06a8c0f6707c98bd3aae8487b63/erts/emulator/beam/beam_common.c#L572
(rr) reverse-stepi
0x00007f2c75d3c6c1 in ?? ()
...
(rr) reverse-stepi
0x000055f91067053b in handle_error (c_p=<optimized out>, pc=<optimized out>, reg=<optimized out>, bif_mfa=<optimized out>) at beam/beam_common.c:572
572 }
Threads info at that time:
(rr) info threads
Id Target Id Frame
1 Thread 300027.300027 (mmap_hardlink_3) 0x0000000070000002 in ?? ()
2 Thread 300027.300038 (1_scheduler) 0x000055f910673113 in erts_gc_update_map_exact (p=0x7f2c1a1a7c68, reg=0x7f2c749d2d80, live=4, n=1, new_p=0x7f2c76170cc0) at beam/erl_vm.h:323
3 Thread 300027.300040 (3_scheduler) move_boxed (orig=0x7f2bdbc85c18, hpp=0x7f2c741cc950, hdr=128, ptr=0x7f2bdbd62020) at beam/erl_gc.h:91
* 4 Thread 300027.300041 (4_scheduler) 0x000055f91067053b in handle_error (c_p=<optimized out>, pc=<optimized out>, reg=<optimized out>, bif_mfa=<optimized out>) at beam/beam_common.c:572
5 Thread 300027.300039 (2_scheduler) 0x000055f9106dc29d in copy_struct_x (obj=139828986502586, obj@entry=139826457443850, sz=sz@entry=120, hpp=hpp@entry=0x7f2c745cfbd8, off_heap=<optimized out>, bsz=bsz@entry=0x0, litopt=litopt@entry=0x7f2c745cfbe0) at beam/copy.c:747
6 Thread 300027.300033 (sys_sig_dispatc) 0x000000007000000e in ?? ()
7 Thread 300027.300034 (sys_msg_dispatc) 0x0000000070000002 in ?? ()
8 Thread 300027.300035 (async_1) 0x0000000070000002 in ?? ()
9 Thread 300027.300042 (1_dirty_cpu_sch) 0x00007f2cb968b457 in sched_yield () at ./src/preload/overrides.c:166
10 Thread 300027.300043 (1_dirty_io_sche) 0x00007f2cb968b457 in sched_yield () at ./src/preload/overrides.c:166
11 Thread 300027.300044 (2_dirty_io_sche) 0x000000007000000e in ?? ()
12 Thread 300027.300045 (3_dirty_io_sche) 0x00007f2cb968b457 in sched_yield () at ./src/preload/overrides.c:166
13 Thread 300027.300046 (4_dirty_io_sche) 0x00007f2cb968b457 in sched_yield () at ./src/preload/overrides.c:166
14 Thread 300027.300047 (1_aux) 0x0000000070000002 in ?? ()
15 Thread 300027.300048 (0_poller) 0x0000000070000002 in ?? ()
After a few more steps backwards:
(rr) reverse-step
0x000055f910670534 572 }
(rr) reverse-step
0x000055f910670531 572 }
(rr) reverse-step
0x000055f9105ebd56 in erts_continue_exit_process (p=p@entry=0x7f2bf7921a20) at beam/erl_process.c:14599
14599 }
1: p->off_heap.first = (struct erl_off_heap_header *) 0x7f2be224dba0
2: p->htop = (Eterm *) 0x7f2be224dbc8
I tried running with --disable-jit but got an assert failure in a different place. In this case first == htop. I wonder if that's a false-positive case?
Just in case it's not, here are some more details:
In this case it seems to trigger when establishing a TLS connection. It's reproducible from just a plain erl prompt; it doesn't need to run in our production/test environment.
~/sverker-otp$ ./bin/cerl -rr
rr: Saving execution to trace directory `..../.local/share/rr/beam.smp-0'.
Erlang/OTP 25 [erts-13.2.2.5] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:1]
Eshell V13.2.2.5 (abort with ^G)
1> application:ensure_all_started(ssl), inets:start(), httpc:request("https://www.erlang.org").
=WARNING REPORT==== 6-Feb-2024::23:49:33.082876 ===
Description: "Server authenticity is not verified since certificate path validation is not enabled"
Reason: "The option {verify, verify_peer} and one of the options 'cacertfile' or 'cacerts' are required to enable this."
x86_64-pc-linux-gnu/opt/emu/beam_hot.h:3952:process_main() Assertion failed: (erts_check_circular_offheap(c_p)==0)
Aborted
$ rr replay
...
(rr) c
Continuing.
Thread 2 received signal SIGABRT, Aborted.
[Switching to Thread 692184.692189]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) up
...
(rr) up
#3 0x0000555e837746ff in process_main (esdp=0x2, esdp@entry=0x7fa1c73757c0) at x86_64-pc-linux-gnu/opt/emu/beam_hot.h:3952
3952 ERTS_CHK_MBUF_SZ(c_p);
(rr) watch -l c_p->htop
Hardware watchpoint 1: -location c_p->htop
(rr) watch -l c_p->off_heap.first
Hardware watchpoint 2: -location c_p->off_heap.first
(rr) reverse-continue
Continuing.
Thread 2 received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) reverse-continue
Continuing.
Thread 2 hit Hardware watchpoint 2: -location c_p->off_heap.first
Old value = (struct erl_off_heap_header *) 0x7fa1a661ed60
New value = (struct erl_off_heap_header *) 0x7fa1a661e920
process_main (esdp=0x47, esdp@entry=0x7fa1c73757c0) at x86_64-pc-linux-gnu/opt/emu/beam_hot.h:2204
2204 MSO(c_p).first = (struct erl_off_heap_header*) pb;
(rr) p c_p->htop
$1 = (Eterm *) 0x7fa1a661ed60
(rr) p c_p->off_heap.first
$2 = (struct erl_off_heap_header *) 0x7fa1a661e920
(rr) p c_p->hend
$3 = (Eterm *) 0x7fa1a6622208
(rr) next
Thread 2 hit Hardware watchpoint 2: -location c_p->off_heap.first
Old value = (struct erl_off_heap_header *) 0x7fa1a661e920
New value = (struct erl_off_heap_header *) 0x7fa1a661ed60
process_main (esdp=0x47, esdp@entry=0x7fa1c73757c0) at x86_64-pc-linux-gnu/opt/emu/beam_hot.h:2205
2205 pb->val = bptr;
(rr) p c_p->htop
$4 = (Eterm *) 0x7fa1a661ed60
(rr) p c_p->off_heap.first
$5 = (struct erl_off_heap_header *) 0x7fa1a661ed60
(rr) etp-stacktrace-emu c_p
% Stacktrace (68)
I: #Cp<ssl_handshake:encode_extensions/2+0x1a30> @ "ssl_handshake.erl":717.
3: #Cp<tls_handshake:enc_handshake/2+0x230> @ "tls_handshake.erl":419.
12: #Cp<tls_handshake:encode_handshake/2+0x68> @ "tls_handshake.erl":280.
13: #Cp<tls_gen_connection:encode_handshake/4+0x78> @ "tls_gen_connection.erl":600.
17: #Cp<tls_gen_connection:queue_handshake/2+0x1a0> @ "tls_gen_connection.erl":210.
23: #Cp<ssl_gen_statem:initial_hello/3+0xc40> @ "ssl_gen_statem.erl":500.
41: #Cp<gen_statem:loop_state_callback/11+0x1b8> @ "gen_statem.erl":1426.
55: #Cp<tls_connection:init/1+0x4b8> @ "tls_connection.erl":160.
63: #Cp<proc_lib:init_p_do_apply/3+0x78> @ "proc_lib.erl":240.
67: #Cp<terminate process normally>.
(rr) etp-process-info-x c_p
Pid: <0.127.0>
State: running | active | prq-prio-normal | usr-prio-normal | act-prio-normal
Flags: trap-exit
Current function: crypto:rsa_opts_algorithms/0
I: #Cp<code_server:call/1+0x130> @ "code_server.erl":139
Heap size: 2586
Old-heap size: 1598
Mbuf size: 0
Msgq len: 0 (inner=0, outer=0)
Parent: <0.125.0>
Pointer: (Process*)0x7fa1c80f4eb0
Msgq Flags: on-heap
--- Inner signal queue (message queue) ---
[
% <== SAVE]
Message signals: 0
Non-message signals: 0
--- Middle signal queue ---
[]
Message signals: 0
Non-message signals: 0
--- Outer queue ---
[]
Message signals: 0
--Type <RET> for more, q to quit, c to continue without paging--c
Non-message signals: 0
--- Link
Type: Port
Other: <0.125.0>
Flags: ORIGIN IN_TABLE
Pointer: (ErtsLink*)0x7fa1c80f51f8
--- Link
Type: Dist Proc
Other: #Port<0.6>
Flags: ORIGIN IN_TABLE
Pointer: (ErtsLink*)0x7fa1c80f5278
--- Monitor
Type: Proc
Other: <0.120.0>
Flags: ORIGIN IN_TABLE
Pointer: (ErtsMonitorData*)0x7fa1c80eb8a8
--- Monitor
Type: Proc
Other: <0.120.0>
Flags: TARGET IN_TABLE
Pointer: (ErtsMonitorData*)0x7fa1c80ebc38
The surrounding code, in x86_64-pc-linux-gnu/opt/emu/beam_hot.h:2205 in OpCase(i_bs_create_bin_jIWdW), looks like:
/*
* Now allocate the ProcBin on the heap.
*/
pb = (ProcBin *) HTOP;
HTOP += PROC_BIN_SIZE;
pb->thing_word = HEADER_PROC_BIN;
pb->size = num_bytes;
pb->next = MSO(c_p).first;
-> MSO(c_p).first = (struct erl_off_heap_header*) pb;
pb->val = bptr;
pb->bytes = (byte*) bptr->orig_bytes;
pb->flags = 0;
OH_OVERHEAD(&(MSO(c_p)), pb->size / sizeof(Eterm));
new_binary = make_binary(pb);
Yes, that's a false positive in beam_hot.h. It uses HTOP, which is a cached value of c_p->htop, while executing beam code.
The macros SWAPIN and SWAPOUT switch modes between using c_p->htop and using HTOP.
I pushed a commit to my branch where I removed calls to ERTS_CHK_MBUF_SZ when we are "switched in" and HTOP is used instead of c_p->htop.
It looks like your first try hit in jit code (which complicates things a bit) that stepped p->htop backwards. But I think it's worth trying without the jit again, with my commit, to see if we can hit the same logic in C code. The jitted instructions are usually the same as the interpreted instructions written in C, just faster.
We might have found a bug that's causing your problem. I pushed a fix commit to my branch.
That's great news. Thank you, @sverker. I'll give it a try, immediately.
So far, running with https://github.com/nickva/otp/commits/sverker/check-circular-offheap/25.3.2.8/ (both the false-positive assertion fixes and the swap-out fix, with the jit build), no assertions have triggered. That seems to be good news!
~I still see high memory usage like before, and it persists even after the load generation stops. Out of the 3 nodes, not all behave this way; some keep their RSS memory usage around 3GB as expected. About half an hour after the load generation stopped, the memory got deallocated and went down to 12GB.~
~However, without the cycle it seems I can now remsh in, and I was trying to see what might be using the memory. The "allocated" memory reported by recon_alloc seems to match the RSS memory reported by the OS. Could it be fragmentation? I wonder if anything changed since OTP 24 to affect allocation like that. It could also be a result of running with the cycle checker enabled. I'll try running with the fix on top of the pristine 25.3.2.8 branch.~
UPDATE: Running with your fix on top of a pristine OTP-25.3.2.8 for a few hours looks great as well. Removing all the cycle checks also seems to have helped with the memory usage mentioned earlier in this comment. What might have been happening is that some NIFs were called often, and there is a cycle check before and after NIF calls, which caused a singleton process in our system that uses the NIF to block and slow down.
@nickva Thanks for your excellent troubleshooting. It's always a bit frustrating to not be able to reproduce, and instead having to resort to "remote controlling" someone else. But this went really well, at least from my side.
I assume you got an initial taste of rr debugging, which I can really recommend for low-level debugging. The combination of watchpoints and reverse execution can be really powerful and in practice save hours of frustration.
I will put this fix in the pipe for next OTP 25 patch release.
@sverker thank you for your help. I really appreciate your time, effort, and patience debugging this issue.
Yeah, it was frustrating not being able to reproduce it at will. I had to resort to developing on a test db node, with emacs tramp mode and various custom build and deployment scripts.
Indeed, rr is quite an amazing tool. This was my first time trying it out and I am definitely a fan! I first heard about it from Lukas when debugging another OTP memory issue. Thanks for the advice and pointers on how to use it properly!
@sverker I had noticed the PR opened to fix it, https://github.com/erlang/otp/pull/8104, has `emit_enter_runtime<Update::eHeap | Update::eStack | Update::eReductions>()`, while the maint branch has a slightly different commit with `emit_enter_runtime<Update::eHeapAlloc | Update::eReductions>()`. I understand the version from maint is the final fix, as `Update::eHeapAlloc` is the combined stack and heap flag, but wanted to double-check with you before deploying it in production.
Both are correct and mean the same thing. `Update::eHeapAlloc` was not present in OTP 25, is all. :-)
That explains it! Thank you for answering.
We have been running the updated version in production for months without issues.
Thank you, @sverker and the rest of the OTP team, for taking the time to help debug and fix the issue!
**Describe the bug**
beam.smp rapidly consumes memory, up to 58GB in 2 minutes, and then gets killed by the oom-killer.

**To Reproduce**
So far it seems to happen in production, especially when fetching large (50MB) json documents.

**Expected behavior**
OOM doesn't happen.

**Affected versions**
OTP 25.3.2.8. Doesn't happen in OTP 24.3.4.13.

**Additional context**
I can make the issue happen more frequently on a cluster by repeatedly fetching a large 50MB json document.
Once the node memory starts increasing there are only a few minutes available to dump the core. I captured ~two~ five core files that way, and a thread backtrace from attaching gdb to the beam.smp:

- Process 1: `(gdb) thread apply all bt` from a core file
- Process 2: `(gdb) thread apply all bt` from a core file
- Process 3: `thread apply all bt` from the attached process, no core file available

The common theme seems to be something related to `erts_cleanup_offheap` when called from `erts_continue_exit_process`? Digging into one of the `erts_cleanup_offheap` calls a bit with gdb, I noticed there is a cycle (!) there instead of a linked list: the `$12` next pointer `0x7fcd8c835d98` starts repeating at `$20`.