brndnmtthws / conky

Light-weight system monitor for X, Wayland (sort of), and other things, too
https://conky.cc
GNU General Public License v3.0
7.25k stars 620 forks

[Bug]: Random segfaults #1791

Open andreaemonti opened 7 months ago

andreaemonti commented 7 months ago

What happened?

After hours of running smoothly, conky crashes, leaving only a segfault message in the system journal.

Sorry that it isn't easily reproducible: it happens at random, hours after launch, which is also why I haven't experimented much.

Errors in the system journal

This is what most of the log entries look like:

> journalctl | grep segfault
...
mar 24 00:40:08 yoga kernel: conky[1891]: segfault at 30 ip 0000000000000030 sp 00007fffd1f7a5d8 error 14 in conky[625591428000+c000] likely on CPU 6 (core 3, socket 0)
mar 24 02:30:08 yoga kernel: conky[1902]: segfault at 30 ip 0000000000000030 sp 00007ffd8f8ab4f8 error 14 in conky[5d130410b000+c000] likely on CPU 2 (core 1, socket 0)
mar 25 11:54:16 yoga kernel: conky[1912]: segfault at 30 ip 0000000000000030 sp 00007ffcecc91e38 error 14 in conky[5846d3b63000+c000] likely on CPU 9 (core 4, socket 0)
mar 25 17:08:37 yoga kernel: conky[1991]: segfault at 30 ip 0000000000000030 sp 00007ffeef79ada8 error 14 in conky[64bcaa9b2000+c000] likely on CPU 1 (core 0, socket 0)
mar 26 12:27:45 yoga kernel: conky[1934]: segfault at 30 ip 0000000000000030 sp 00007ffc81c4c448 error 14 in conky[6409a0570000+c000] likely on CPU 15 (core 7, socket 0)
mar 26 20:18:57 yoga kernel: conky[1878]: segfault at 30 ip 0000000000000030 sp 00007ffc4c391768 error 14 in conky[59c247397000+c000] likely on CPU 6 (core 3, socket 0)
mar 28 11:38:48 yoga kernel: conky[1911]: segfault at 30 ip 0000000000000030 sp 00007ffd93b1dc98 error 14 in conky[5bde0de38000+c000] likely on CPU 10 (core 5, socket 0)

I tried launching conky with the -D debug flag, but I didn't get any more information about the crash. The journal entry, though, looked a bit different:

mar 29 16:42:53 yoga kernel: conky[2433]: segfault at 700c9f7d26a0 ip 0000700c9f7d26a0 sp 00007ffdf980fd48 error 15 in libc.so.6[700c9f7d1000+2000] likely on CPU 10 (core 5, socket 0)
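For anyone reading these journal lines, here is a minimal sketch of how the fields can be decoded. It assumes the usual x86 kernel message format, in which the value after `error` is the page-fault error code printed in hexadecimal; the helper name `decode_segfault` and the wording of the flag descriptions are made up for illustration.

```python
import re

# x86 page-fault error-code bits (they mirror the kernel's X86_PF_* flags).
# Each entry: (mask, meaning when the bit is set, meaning when it is clear).
PF_BITS = [
    (0x01, "protection violation (page present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x10, "instruction fetch", None),
]

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+?)\[)?"
)


def decode_segfault(line):
    """Decode one kernel 'segfault at ...' line; return None if it doesn't match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m["addr"], 16)
    ip = int(m["ip"], 16)
    err = int(m["err"], 16)  # assumption: the kernel prints this code in hex
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if err & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return {
        "addr": addr,
        "ip": ip,
        "object": m["obj"],
        "flags": flags,
        # True when the CPU faulted while fetching code at the faulting address
        "jumped_to_fault_addr": addr == ip,
    }
```

Read this way, `error 14` (0x14) is a user-mode instruction fetch from a page that is not mapped, and the faulting address equals the instruction pointer (0x30). `error 15` (0x15) is a user-mode instruction fetch from a page that is mapped but not executable. Both would be consistent with the process jumping through an invalid function pointer, though that is only an inference from the log format, not something the logs prove.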

Error on launch (probably unrelated)

The only error I get as output on the console when I launch conky is: conky: invalid setting of type 'table'. But I cannot understand why.

Version

conky 1.19.7_pre compiled 2024-02-26 for Linux x86_64

Which OS/distro are you seeing the problem on?

Arch Linux

Conky config

conky.config = {
    out_to_x = false,
    out_to_wayland = true,
    alignment = 'top_right',
    cpu_avg_samples = 2,
    default_shade_color = '444444',
    draw_graph_borders = false,
    draw_shades = false,
    use_xft = true,
    gap_x = 50,
    gap_y = 50,
    minimum_width = 250,
    net_avg_samples = 2,
    no_buffers = true,

    out_to_stderr = false,

    own_window_argb_visual = true,
    own_window_transparent = true,
    own_window_argb_value = 150,

    update_interval = 1,

    font1 = 'Inter:bold:size=12',
    font2 = 'Inter:size=12',
    font3 = 'DejaVu Sans Mono:size=12'
};

conky.text = [[
${color white}
$alignc${font Inter:size=45}${time %H:%M}${font2 :size=18}${time :%S}
$alignc${font Inter:size=16} ${time %d %B}

#SYSTEM
${font1}CPU:${font2}  $cpu% $alignr $acpitemp°C
${cpugraph 40,250 0000ff ff0000 -t}

${font1}RAM:${font2}  $memperc% $alignr $mem
${memgraph 40,250 00ff00 00ff00}
swap: $swapperc%

${font1}SSD I/O:${font2}  $diskio/s $alignr
${diskiograph 20,250 ffffff ffffff}

#NETWORK
${font1}Network: $alignr${font3} ${addr wlan0}
${font2}$alignc ${alignr 65} speed $alignr total
↑   ${alignc -20}${upspeed wlan0}/s ${alignr}${totalup wlan0}
↓   ${alignc -20}${downspeed wlan0}/s ${alignr}${totaldown wlan0}

#STORAGE
${font1}root${font2} $alignr ${fs_used /}/${fs_size /}
${fs_bar /}
${font1}home${font2} $alignr ${fs_used /home}/${fs_size /home}
${fs_bar /home}
]];

Stack trace

No response

Relevant log output

No response

brndnmtthws commented 7 months ago

It would be immensely helpful if you're able to get a stack trace with debug symbols from a core dump. It should be as simple as setting ulimit appropriately (i.e., ulimit -c unlimited) and enabling debug symbols for your system.

andreaemonti commented 7 months ago

I didn't know how to get it, but this should be it (it's the output of thread apply all backtrace full in gdb, launched on the core dump): gdb.txt

Thanks for the help! Let me know if there is any other info I should give.

brndnmtthws commented 7 months ago

It's a little difficult to tell which thread the segfault came from. When you load the core dump into gdb, it should print something like this:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000001234 in some_function () from /lib/somelib.so
(gdb)

Can you provide that output as well?

andreaemonti commented 7 months ago

Sure! It should be the last one in the log file, aka Thread 1.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Core was generated by `conky -c /home/andrea/.conky/old/home'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000030 in ?? ()
[Current thread is 1 (Thread 0x7121c6378a00 (LWP 1891))]
(gdb)

brndnmtthws commented 7 months ago

Okay thanks, that's what I thought. Looks like a race condition around the call to wl_display_dispatch_pending().

Caellian commented 6 months ago

Can you say which WM you're using?

The only error I get as output on the console when I launch conky is: conky: invalid setting of type 'table'. But I cannot understand why.

I can't see why either. Try changing conky.text (remove first half; second half; then half of the broken one; ...), conky.config seems fine to me.

Caellian commented 6 months ago

I can't see why either. Try changing conky.text (remove first half; second half; then half of the broken one; ...), conky.config seems fine to me.

This is actually a (separate) bug: #1806, ignore it for now :)

andreaemonti commented 6 months ago

Can you say which WM you're using?

I don't know whether it's still relevant, but I'm on EndeavourOS with KDE Plasma, so KWin is my WM.

I also tried splitting the conky config into several files to see whether only one of them would crash, but there were no crashes during that testing period, so I went back to the single file shown above.

Today, after a week without crashes, I noticed that it crashed on resume from suspend. I don't know if this helps to narrow down the issue. Previously I only noticed some time later that conky was gone, so I didn't know precisely when it had crashed. But checking the journal for a couple of the earlier segfaults, I saw that one happened while entering suspend and one during poweroff (although I also had a kernel panic then, which I believe was caused by unrelated GPU driver issues). So maybe the crashes are related to power-state changes.
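A rough way to check that correlation across the whole journal, rather than by eye, might look like the sketch below. It rests on several assumptions: that the kernel log was exported with `journalctl -k -o short-iso` so each line starts with an ISO timestamp, that the kernel marks suspend transitions with lines containing "PM: suspend entry" and "PM: suspend exit", and that a 60-second window is a sensible default. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed kernel markers for entering and leaving suspend.
PM_MARKERS = ("PM: suspend entry", "PM: suspend exit")


def parse_ts(line):
    """Parse the leading short-iso timestamp of a journal line, or return None."""
    stamp = line.split(" ", 1)[0]
    try:
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        return None


def segfaults_near_suspend(lines, window_seconds=60):
    """Return (segfault_line, nearby_pm_lines) pairs within the time window."""
    window = timedelta(seconds=window_seconds)
    pm_events, segfaults = [], []
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip lines without a parsable timestamp
        if any(marker in line for marker in PM_MARKERS):
            pm_events.append((ts, line.rstrip()))
        elif "segfault at" in line:
            segfaults.append((ts, line.rstrip()))
    hits = []
    for seg_ts, seg_line in segfaults:
        near = [pm for pm_ts, pm in pm_events if abs(seg_ts - pm_ts) <= window]
        if near:
            hits.append((seg_line, near))
    return hits
```

Saving the journal to a text file and passing the open file object to `segfaults_near_suspend` would list only the segfaults that happened close to a suspend transition. Crashes during poweroff would need an additional marker that this sketch does not attempt to guess.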