kakra opened 3 years ago
Segfaults on 3 different CPU threads
[ +0.075551] fossilize_repla[16964]: segfault at 18 ip 0000558565ecb65e sp 00007ffd34212f60 error 4 in fossilize_replay[558565e63000+23d000] likely on CPU 5 (core 5, socket 0)
[ +0.000015] Code: 85 db 75 d8 49 8b 9f 60 02 00 00 48 85 db 74 2c 0f 1f 40 00 48 8b 73 10 48 85 f6 0f 84 83 02 00 00 49 8b 87 f8 0c 00 00 31 d2 <48> 8b 78 18 ff 15 70 40 21 00 48 8b 1b 48 85 db 75 d8 49 8b 9f 98
[ +0.109306] fossilize_repla[16965]: segfault at 18 ip 0000558565ecb65e sp 00007ffd34212f60 error 4 in fossilize_replay[558565e63000+23d000] likely on CPU 13 (core 5, socket 0)
[ +0.000017] Code: 85 db 75 d8 49 8b 9f 60 02 00 00 48 85 db 74 2c 0f 1f 40 00 48 8b 73 10 48 85 f6 0f 84 83 02 00 00 49 8b 87 f8 0c 00 00 31 d2 <48> 8b 78 18 ff 15 70 40 21 00 48 8b 1b 48 85 db 75 d8 49 8b 9f 98
[ +2.998189] fossilize_repla[17043]: segfault at 18 ip 000055b93cc9265e sp 00007ffdf33deeb0 error 4 in fossilize_replay[55b93cc2a000+23d000] likely on CPU 1 (core 1, socket 0)
[ +0.000019] Code: 85 db 75 d8 49 8b 9f 60 02 00 00 48 85 db 74 2c 0f 1f 40 00 48 8b 73 10 48 85 f6 0f 84 83 02 00 00 49 8b 87 f8 0c 00 00 31 d2 <48> 8b 78 18 ff 15 70 40 21 00 48 8b 1b 48 85 db 75 d8 49 8b 9f 98
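One way to read these Code: lines is to decode the bytes by hand. The kernel brackets the first byte of the faulting instruction (<48>). The bytes 48 8b 78 18 encode mov rdi, [rax+0x18], so "segfault at 18" fits rax having been NULL when that load ran. Reading backwards, the bytes just before (49 8b 87 f8 0c 00 00, then 31 d2) appear to load rax from [r15+0xcf8] and clear edx. The decoder below is a minimal sketch that handles only this one instruction form and is not a general disassembler. The function name is made up for illustration.

```python
# Decodes a single x86-64 instruction form: REX.W + 8B /r (MOV r64, r/m64)
# with mod=01, i.e. a register base plus an 8-bit signed displacement.
# Deliberately tiny: no REX.R/REX.B, no SIB byte, no other opcodes.

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load_disp8(code: bytes) -> str:
    """Return Intel-syntax text for '48 8b <modrm> <disp8>'."""
    rex, opcode, modrm, disp8 = code[0], code[1], code[2], code[3]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("expected REX.W MOV r64, r/m64 (48 8b)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:  # rm=100 would mean a SIB byte follows
        raise ValueError("only mod=01 without SIB is handled")
    disp = disp8 - 0x100 if disp8 >= 0x80 else disp8  # sign-extend disp8
    return f"mov {REG64[reg]}, [{REG64[rm]}{disp:+#x}]"


# Bytes starting at the bracketed position in the Code: lines above.
insn = decode_mov_load_disp8(bytes.fromhex("48 8b 78 18"))
print(insn)  # mov rdi, [rax+0x18]

# If rax held 0, the load touches 0 + 0x18, matching "segfault at 18".
fault_addr = 0x0 + 0x18
print(hex(fault_addr))  # 0x18
```

The same bytes appear in every Code: line of this report, so all three threads appear to have died on the same load.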
I got the following entries after running sudo journalctl -p err -b 0:
https://gist.github.com/Smoukus/beb6099b940edad13ac08067257aec4d
The coredumps happened while I was playing Guild Wars 2 on Steam. The game didn't crash or have any issues; the coredumps just happened during that time.
I get constant fossilize_repla coredumps like this, even though I'm not playing any game.
Process 126010 (fossilize_repla) of user 1000 dumped core.
Stack trace of thread 126010:
#0 0x0000555d89b66f47 n/a (/home/user/.local/share/Steam/ubuntu12_64/fossilize_replay + 0x59f47)
#1 0x89d8db780000555d n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Is this expected?
I get coredumps from 1-9 fossilize_replay processes when launching Deep Rock Galactic. Sometimes the "compiling shader" dialog gets stuck and I have to skip/close it manually. Sometimes it continues, and sometimes fossilize_replay doesn't crash at all and the shaders compile just fine. There doesn't seem to be any pattern.
I'm running an Intel Arc A770 with the latest Mesa drivers.
EDIT: This doesn't affect game performance, but oddly enough the game sometimes hangs my graphics driver. I raised this issue with the Mesa devs. Since the hangs seem to be a driver regression unrelated to fossilize_replay's behaviour, I don't think the two issues are related as of this writing.
kernel: fossilize_repla[35616]: segfault at 18 ip 000055d1650c7ade sp 00007fff1f7fdeb0 error 4 likely on CPU 7 (core 7, socket 0)
kernel: Code: 85 db 75 d8 49 8b 9f 60 02 00 00 48 85 db 74 2c 0f 1f 40 00 48 8b 73 10 48 85 f6 0f 84 83 02 00 00 49 8b 87 f8 0c 00 00 31 d2 <48> 8b 78 18 ff 15 f0 4b 21 00 48 8b 1b 48 85 db 75 d8 49 8b 9f 98
systemd-coredump[35765]: [🡕] Process 35616 (fossilize_repla) of user 1000 dumped core.
Stack trace of thread 35616:
#0 0x000055d1650c7ade n/a (/home/user/.local/share/Steam/ubuntu12_64/fossilize_replay + 0x69ade)
ELF object binary architecture: AMD x86-64
kernel: fossilize_repla[35788]: segfault at 18 ip 000055d1650b7f47 sp 00007fff1f7fddd0 error 4 likely on CPU 1 (core 1, socket 0)
kernel: Code: 39 a5 98 03 00 00 75 76 0f 1f 00 48 8b 44 24 08 49 8d 5c c5 00 48 8b b3 68 0c 00 00 48 85 f6 74 13 49 8b 85 f8 0c 00 00 31 d2 <48> 8b 78 18 ff 15 7f 47 22 00 48 c7 83 68 0c 00 00 00 00 00 00 49
systemd-coredump[35794]: [🡕] Process 35788 (fossilize_repla) of user 1000 dumped core.
Stack trace of thread 35788:
#0 0x000055d1650b7f47 n/a (/home/user/.local/share/Steam/ubuntu12_64/fossilize_replay + 0x59f47)
#1 0x0000000000000001 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
kernel: fossilize_repla[35792]: segfault at 18 ip 000055d1650c75ce sp 00007fff1f7fde70 error 4 in fossilize_replay[55d16505e000+240000] likely on CPU 4 (core 4, socket 0)
kernel: Code: 85 db 75 d8 49 8b 9f 60 02 00 00 48 85 db 74 2c 0f 1f 40 00 48 8b 73 10 48 85 f6 0f 84 83 02 00 00 49 8b 87 f8 0c 00 00 31 d2 <48> 8b 78 18 ff 15 00 51 21 00 48 8b 1b 48 85 db 75 d8 49 8b 9f 98
systemd-coredump[35807]: [🡕] Process 35792 (fossilize_repla) of user 1000 dumped core.
Stack trace of thread 35792:
#0 0x000055d1650c75ce n/a (/home/user/.local/share/Steam/ubuntu12_64/fossilize_replay + 0x695ce)
ELF object binary architecture: AMD x86-64
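The ip values, the kernel's mapping start (the [base+size] after the binary name) and systemd-coredump's "+ offset" notation are related by simple subtraction. Using only numbers copied from the logs in this thread: the first report's three segfaults all land at offset 0x6865e even though PID 17043 was loaded at a different address, and the three coredumps above all imply the same base, 0x55d16505e000, which is the mapping start the kernel printed for PID 35792. This is a sketch for cross-checking logs, not a symbolizer. Offsets are only comparable within one build of fossilize_replay, and the mapping sizes differ between the reports (0x23d000 vs 0x240000), which suggests different builds.

```python
# All addresses are copied from the logs in this thread; nothing reads a live system.


def offset_in_mapping(ip: int, base: int) -> int:
    """Offset of ip from the mapping start the kernel prints as name[base+size]."""
    return ip - base


def base_from_offset(ip: int, offset: int) -> int:
    """Base implied by systemd-coredump's 'binary + offset' frame notation."""
    return ip - offset


# First report: (ip, mapping start) per PID. 17043 got a different load
# address (presumably ASLR), yet the offset is the same for all three.
kernel_faults = {
    16964: (0x558565ECB65E, 0x558565E63000),
    16965: (0x558565ECB65E, 0x558565E63000),
    17043: (0x55B93CC9265E, 0x55B93CC2A000),
}
for pid, (ip, base) in kernel_faults.items():
    print(pid, hex(offset_in_mapping(ip, base)))  # 0x6865e

# Later report: (ip, offset) from the systemd-coredump stack traces.
coredump_frames = {
    35616: (0x55D1650C7ADE, 0x69ADE),
    35788: (0x55D1650B7F47, 0x59F47),
    35792: (0x55D1650C75CE, 0x695CE),
}
for pid, (ip, offset) in coredump_frames.items():
    print(pid, hex(base_from_offset(ip, offset)))  # 0x55d16505e000
```

The +0x59f47 frame for PID 35788 is also the offset in the earlier "Process 126010" trace, so those two crashes may share a crash site if they came from the same build.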
Just realized this issue is 4 years old.
I am also seeing this issue on Arch Linux:
[28741.985411] fossilize_repla[303073]: segfault at 1d00030309 ip 000072864503a78d sp 00005bcf26cf0eb8 error 4 in libc.so.6[728644fd7000+15b000] likely on CPU 5 (core 5, socket 0)
[28741.985449] Code: 83 f8 03 b8 00 00 04 00 48 0f 46 d0 31 c0 48 39 fa 0f 93 c0 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 48 8b 0c 25 10 00 00 00 <8b> 91 08 03 00 00 48 8d b9 08 03 00 00 89 d6 83 ce 02 39 d6 74 1d
[28748.762310] fossilize_repla[303075]: segfault at 55dd00000c35 ip 000072864503a78d sp 00005bcf26cf0eb8 error 4 in libc.so.6[728644fd7000+15b000] likely on CPU 11 (core 13, socket 0)
[28748.762348] Code: 83 f8 03 b8 00 00 04 00 48 0f 46 d0 31 c0 48 39 fa 0f 93 c0 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 48 8b 0c 25 10 00 00 00 <8b> 91 08 03 00 00 48 8d b9 08 03 00 00 89 d6 83 ce 02 39 d6 74 1d
[28750.877639] traps: fossilize_repla[303076] general protection fault ip:72864503a78d sp:5bcf26cf0eb8 error:0 in libc.so.6[728644fd7000+15b000]
FWIW I don't play many games, mostly Dota2 and Warframe. Steam sysinfo - https://gist.github.com/Strykar/07574caeaa8ecd0f3bfae5c077c3f876
Hello @Strykar, "Driver: Mesa llvmpipe (LLVM 16.0.6, 256 bits)" in your system information tells us that Steam was forced to fall back to llvmpipe (Mesa's faster CPU renderer) to run at all. This indicates that something is broken or incomplete in your video driver install. If you're using the NVIDIA proprietary driver and recently changed driver versions, the NVIDIA userspace libraries may not match the NVIDIA kernel module loaded into memory, and the easiest way to clear that condition is to reboot.
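Checking for this condition only means finding the Driver: line in the system information and seeing whether it names a Mesa software renderer. A minimal sketch, assuming each driver line starts with "Driver:" (possibly indented); the helper name is hypothetical and this does not parse the full Steam report format. llvmpipe and softpipe are Mesa's software renderers.

```python
# Hypothetical helper: flag "Driver:" lines in a pasted Steam system-information
# report that name a Mesa software renderer. The report layout is an assumption.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe")


def software_driver_lines(sysinfo_text: str) -> list[str]:
    """Return each stripped 'Driver:' line that mentions a software renderer."""
    hits = []
    for line in sysinfo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Driver:") and any(
            name in stripped for name in SOFTWARE_RENDERERS
        ):
            hits.append(stripped)
    return hits


# The Driver line quoted in this thread; the indentation is assumed.
report = "    Driver: Mesa llvmpipe (LLVM 16.0.6, 256 bits)\n"
print(software_driver_lines(report))
```

An empty list means no software renderer was named on any Driver: line, which is what a working hardware driver install should produce.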
Thanks @kisak-valve, but that is no longer the case today (nvidia-utils was a version behind yesterday).
Even though all the NVIDIA binary drivers and packages are now in order, it still logs:
[ 4936.499799] fossilize_repla[54043]: segfault at 55dd00000c35 ip 0000777788c1f78d sp 00005e2bdde99bf8 error 4 in libc.so.6[777788bbc000+15b000] likely on CPU 5 (core 5, socket 0)
[ 4936.499826] Code: 83 f8 03 b8 00 00 04 00 48 0f 46 d0 31 c0 48 39 fa 0f 93 c0 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 48 8b 0c 25 10 00 00 00 <8b> 91 08 03 00 00 48 8d b9 08 03 00 00 89 d6 83 ce 02 39 d6 74 1d
[ 4936.504824] traps: fossilize_repla[54038] general protection fault ip:777788c1f78d sp:5e2bdde99bf8 error:0 in libc.so.6[777788bbc000+15b000]
[ 4936.750947] fossilize_repla[54034]: segfault at 17a00000482 ip 0000777788c1f78d sp 00005e2bdde99bf8 error 4 in libc.so.6[777788bbc000+15b000] likely on CPU 23 (core 13, socket 0)
[ 4936.750965] Code: 83 f8 03 b8 00 00 04 00 48 0f 46 d0 31 c0 48 39 fa 0f 93 c0 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 48 8b 0c 25 10 00 00 00 <8b> 91 08 03 00 00 48 8d b9 08 03 00 00 89 d6 83 ce 02 39 d6 74 1d
[ 4937.278831] fossilize_repla[54008]: segfault at 2800000328 ip 0000777788c1f78d sp 00005e2bdde99bf8 error 4 in libc.so.6[777788bbc000+15b000] likely on CPU 21 (core 11, socket 0)
[ 4937.278850] Code: 83 f8 03 b8 00 00 04 00 48 0f 46 d0 31 c0 48 39 fa 0f 93 c0 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 48 8b 0c 25 10 00 00 00 <8b> 91 08 03 00 00 48 8d b9 08 03 00 00 89 d6 83 ce 02 39 d6 74 1d
[ 4939.588249] fossilize_repla[54054]: segfault at c00040384 ip 0000777788c1f78d sp 00005e2bdde99bf8 error 4 in libc.so.6[777788bbc000+15b000] likely on CPU 23 (core 13, socket 0)
[ 4939.588269] Code: 83 f8 03 b8 00 00 04 00 48 0f 46 d0 31 c0 48 39 fa 0f 93 c0 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 64 48 8b 0c 25 10 00 00 00 <8b> 91 08 03 00 00 48 8d b9 08 03 00 00 89 d6 83 ce 02 39 d6 74 1d
The system information still shows Driver: Mesa llvmpipe (LLVM 16.0.6, 256 bits).
Steam sysinfo - https://gist.github.com/Strykar/f70308b945cce671ac6863e0b4e54076
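The libc.so.6 crashes can be read the same way. Decoding the Code: bytes by hand (a sketch, not checked against this exact glibc build): f3 0f 1e fa is endbr64, 64 48 8b 0c 25 10 00 00 00 is mov rcx, fs:[0x10], which on x86-64 glibc reads the current thread's self pointer, and the bracketed <8b> 91 08 03 00 00 is mov edx, [rcx+0x308]. Under that reading, each "segfault at ..." address minus 0x308 is the value rcx held. The resulting values (odd alignments, patterns like 0x17a0000017a) don't look like a valid thread pointer. The "general protection fault" lines print no address, which is typical on x86-64 when the computed address is non-canonical. What corrupted the pointer isn't visible from these lines.

```python
# Sketch under the decoding stated above: the faulting load is
# "8b 91 08 03 00 00" = mov edx, [rcx+disp32], so fault address = rcx + disp.
DISP = int.from_bytes(bytes.fromhex("08 03 00 00"), "little")  # 0x308

# "segfault at ..." addresses from the libc.so.6 lines in this thread.
fault_addrs = [
    0x1D00030309, 0x55DD00000C35,  # first report
    0x55DD00000C35, 0x17A00000482, 0x2800000328, 0xC00040384,  # follow-up
]


def implied_rcx(fault_addr: int, disp: int = DISP) -> int:
    """Value rcx must have held for [rcx + disp] to equal fault_addr."""
    return fault_addr - disp


for addr in fault_addrs:
    rcx = implied_rcx(addr)
    # A genuine thread pointer would be at least 8-byte aligned.
    print(f"{addr:#x} -> rcx={rcx:#x} ({'aligned' if rcx % 8 == 0 else 'misaligned'})")
```

The same fault address, 0x55dd00000c35, shows up in both reports, which hints that the bad value comes from data rather than from a randomized address, though that is only an inference from two lines.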
I recently saw this for the first time:
I'm not sure how to decode it; I sometimes see similar dmesg output for Electron apps, too. The coredump isn't very helpful either, because debug info seems to be missing:
Error 4 probably means user space was faulting on a non-present page (PF_USER). I copied the 489830 cache to my local build and am now trying to reproduce it with this command:
It resulted in the following log: https://gist.github.com/kakra/0272aa4ca003836750c18e687d6e1bf3
Retry with a debug build?
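For reference, the "error 4" in these kernel segfault lines is the x86 page-fault error code, a bit field. A small decoder (function and table names are illustrative; the bit meanings follow the x86 architecture and the kernel's X86_PF_* flags, PF_USER and friends in older sources) shows that 4 sets only the user-mode bit: a user-space read of a page that isn't present, which is what a load from NULL+0x18 produces.

```python
# x86 page-fault error code bits (low five only). Each entry gives the
# meaning when the bit is set and, where useful, when it is clear.
PF_BITS = {
    0: ("protection violation", "page not present"),       # X86_PF_PROT
    1: ("write access", "read access"),                    # X86_PF_WRITE
    2: ("user-mode access", "kernel-mode access"),         # X86_PF_USER
    3: ("reserved bit set in a page-table entry", None),   # X86_PF_RSVD
    4: ("instruction fetch", None),                        # X86_PF_INSTR
}


def decode_pf_error(code: int) -> list[str]:
    """Translate the 'error N' value from a segfault line into flag names."""
    flags = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


print(decode_pf_error(4))
# ['page not present', 'read access', 'user-mode access']
```

The error:0 on the "general protection fault" lines belongs to a different exception, so this table doesn't apply to it.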