alvr-org / ALVR

Stream VR games from your PC to your headset via Wi-Fi
MIT License

Linux: crash after creating CEncoder thread #729

Closed mittorn closed 3 years ago

mittorn commented 3 years ago
16:43:49.988434406 [WARN] Handshake: At alvr/server/src/connection.rs:121: At alvr/common/src/sockets/control_socket.rs:78
16:43:50.990929536 [INFO] #{"id":"ClientConnected"}#
16:43:50.991128120 [INFO] Serial Number: 1WMGH000XX0000
16:43:50.991178425 [INFO] Model Number: Miramar
16:43:50.991201969 [INFO] Render Target: 2688 1440
16:43:50.991223890 [INFO] Seconds from Vsync to Photons: 0.005000
16:43:50.991252294 [INFO] Refresh Rate: 60
16:43:50.991517453 [INFO] CEncoder::Run
16:43:50.991583658 [INFO] CEncoder Listening
16:43:51.580494452 [INFO] CEncoder client connected, pid 4533, cmdline /home/mittorn/.local/share/Steam/steamapps/common/SteamVR/bin/linux64/vrcompositor

We are initializing Vulkan in the CEncoder thread.

Fossilize INFO: Overriding serialization path: "/home/mittorn/.local/share/Steam/steamapps/shadercache/250820/fozpipelinesv5/steamapprun_pipeline_cache".
[h264_vaapi @ 0x7fff486b3d80] Driver does not support some wanted packed headers (wanted 0xd, found 0x1).
vrserver: ../mesa-9999/src/amd/vulkan/radv_cmd_buffer.c:6475: radv_handle_image_transition: Assertion `src_family == cmd_buffer->queue_family_index || dst_family == cmd_buffer->queue_family_index' failed.

SteamVR crashes after this line. vulkaninfo: https://pastebin.com/QeUEWh43

mittorn commented 3 years ago

Any solutions? Will it ever work? I tried a different GPU and driver version and still get the same issue. If I disable this assert, I get a gray square instead of the image.

ckiee commented 3 years ago

@mittorn The person who wrote this code (@xytovl) was also using radv for development, so maybe this problem is specific to your GPU? Either way, it sounds like a problem somewhere further down the stack.

mittorn commented 3 years ago

I tried an RX 580 (Polaris10) and a Vega 11 (Raven), with both stable Mesa 20.1 and the git version: same issue. Yesterday I found that the software implementation works (I tried commenting out the VAAPI pipeline), but it is very slow. That is strange, because it also uses the shared-texture receiving code.

mittorn commented 3 years ago

More information on the CEncoder crash:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007ffff7431423 in __GI_abort () at abort.c:79
#2  0x00007ffff7431300 in __assert_fail_base (fmt=0x7ffff76185a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x7fff6c8c7b08 "src_family == cmd_buffer->queue_family_index || dst_family == cmd_buffer->queue_family_index",
    file=0x7fff6c8c7288 "../mesa-9999/src/amd/vulkan/radv_cmd_buffer.c", line=6934, function=<optimized out>) at assert.c:92
#3  0x00007ffff74429c2 in __GI___assert_fail (
    assertion=0x7fff6c8c7b08 "src_family == cmd_buffer->queue_family_index || dst_family == cmd_buffer->queue_family_index",
    file=0x7fff6c8c7288 "../mesa-9999/src/amd/vulkan/radv_cmd_buffer.c", line=6934,
    function=0x7fff6c8c8c90 <__PRETTY_FUNCTION__.96952> "radv_handle_image_transition") at assert.c:101
#4  0x00007fff6c3c9182 in radv_handle_image_transition (cmd_buffer=0x7fff58212020, image=0x7fff580ac0c0,
    src_layout=VK_IMAGE_LAYOUT_UNDEFINED, src_render_loop=false, dst_layout=VK_IMAGE_LAYOUT_GENERAL, dst_render_loop=false,
    src_family=4294967295, dst_family=4294967294, range=0x7fff6e3f4ae0, sample_locs=0x0)
    at ../mesa-9999/src/amd/vulkan/radv_cmd_buffer.c:6934
#5  0x00007fff6c3c9919 in radv_barrier (cmd_buffer=0x7fff58212020, memoryBarrierCount=0, pMemoryBarriers=0x0, bufferMemoryBarrierCount=0,
    pBufferMemoryBarriers=0x0, imageMemoryBarrierCount=1, pImageMemoryBarriers=0x7fff6e3f4ab0, info=0x7fff6e3f49e0)
    at ../mesa-9999/src/amd/vulkan/radv_cmd_buffer.c:7050
#6  0x00007fff6c3c9a3a in radv_CmdPipelineBarrier (commandBuffer=0x7fff58212020, srcStageMask=1, destStageMask=4096, byRegion=0,
    memoryBarrierCount=0, pMemoryBarriers=0x0, bufferMemoryBarrierCount=0, pBufferMemoryBarriers=0x0, imageMemoryBarrierCount=1,
    pImageMemoryBarriers=0x7fff6e3f4ab0) at ../mesa-9999/src/amd/vulkan/radv_cmd_buffer.c:7088
#7  0x00007fff6f1aaa21 in prepare_frame.lto_priv () from /home/mittorn/ALVR/build/alvr_server_linux/lib64/alvr/libavutil.so.56
#8  0x00007fff6f1a2358 in vulkan_map_to_drm.isra () from /home/mittorn/ALVR/build/alvr_server_linux/lib64/alvr/libavutil.so.56
#9  0x00007fff6f1a6856 in vulkan_map_from.lto_priv () from /home/mittorn/ALVR/build/alvr_server_linux/lib64/alvr/libavutil.so.56
#10 0x00007fff6f1b065c in av_hwframe_map () from /home/mittorn/ALVR/build/alvr_server_linux/lib64/alvr/libavutil.so.56
#11 0x00007fffb18df579 in alvr::EncodePipelineVAAPI::EncodePipelineVAAPI(std::vector<alvr::VkFrame, std::allocator<alvr::VkFrame> >&, alvr::VkFrameCtx&) () from /home/mittorn/ALVR/build/alvr_server_linux/lib64/alvr/bin/linux64/driver_alvr_server.so
#12 0x00007fffb18df98c in alvr::EncodePipeline::Create(std::vector<alvr::VkFrame, std::allocator<alvr::VkFrame> >&, alvr::VkFrameCtx&) ()
   from /home/mittorn/ALVR/build/alvr_server_linux/lib64/alvr/bin/linux64/driver_alvr_server.so
#13 0x00007fffb18e0db6 in CEncoder::Run() () from /home/mittorn/ALVR/build/alvr_server_linux/lib64/alvr/bin/linux64/driver_alvr_server.so
#14 0x00007ffff7755c14 in std::execute_native_thread_routine (__p=0x7fff8c010440)
    at /var/tmp/portage/sys-devel/gcc-11.1.0-r1/work/gcc-11.1.0/libstdc++-v3/src/c++11/thread.cc:82
#15 0x00007ffff7655b91 in start_thread (arg=0x7fff6e3f6640) at pthread_create.c:473
#16 0x00007ffff757a76f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The software path does not map frames. Hardware encoding from KMS planes works for me, so it is not a VAAPI issue.

leder11011 commented 3 years ago

I don't know if it's related: I get a kernel crash on Ubuntu 21.04 (Hirsute). I am using the alvr-nightly repo build for alvr_launcher and the Quest 1 client APK.

The bug report on Discord was as follows:

Retested, and it crashed with this Linux kernel:
leder@home-ryzen:~$ uname -r
5.13.2-051302-generic

ckiee commented 3 years ago

@leder11011 Can you get some kernel logs? I think there is an option to get the kernel to kexec a recovery kernel so you can inspect memory.

leder11011 commented 3 years ago

No, I cannot debug the kernel.

------ Original message ------ From: "ckie" To: "alvr-org/ALVR" Cc: "leder11011"; "Mention" Sent: 26.07.2021 01:49:22 Subject: Re: [alvr-org/ALVR] Linux: crash after creating CEncoder thread (#729)


ckiee commented 3 years ago

Then it seems you are quite stuck.

mittorn commented 3 years ago

@leder11011 Which GPU and Mesa version do you have? I updated to git Mesa and the GPU no longer crashes on Polaris10. Do not even try it on a Zen 1 APU; it is completely broken.

leder11011 commented 3 years ago

I did not touch Mesa on a fresh Ubuntu Hirsute install. On the LTS version I compiled Mesa from git, and we debugged until a strange Vulkan error appeared: a VR compositor crash.

mittorn wrote on Mon., 26 July 2021, 23:45:


leder11011 commented 3 years ago

I have built and installed mesa git version:

leder@home-ryzen:~$ glxinfo |grep OpenGL
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon RX 5700 XT (NAVI10, DRM 3.41.0, 5.13.2-051302-generic, LLVM 11.0.1)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 21.3.0-devel (git-cac5711d43)
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6 (Compatibility Profile) Mesa 21.3.0-devel (git-cac5711d43)
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.3.0-devel (git-cac5711d43)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

leder11011 commented 3 years ago

I used the nightly binary for Ubuntu from here: https://github.com/alvr-org/ALVR-nightly

and I cannot execute a self-compiled binary at all. Do you use Ubuntu, too?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.