ValveSoftware / openvr

OpenVR SDK
http://steamvr.com
BSD 3-Clause "New" or "Revised" License

SteamVR Freezes #557

Open fhoenig opened 7 years ago

fhoenig commented 7 years ago

SteamVR freezes (in the presence of CUDA/OpenGL or CUDA-OpenGL interop) in the scene application.

Symptoms:

After our engine process has exited:

I have a suspicion this could be a bug in SteamVR or in the NVIDIA driver for Pascal GPUs. It does not happen on Maxwell or Kepler. We do insert a full device sync after all CUDA work is done and before any OpenGL work starts, and also before WaitGetPoses(). No manual handoff call.

KoenRijpstra commented 7 years ago

@fhoenig I have the exact same issue on my 1080 Ti. Sometimes it happens immediately, sometimes after a couple of minutes. I tried updating all drivers and software, but to no avail. Did you find anything that might help?

fhoenig commented 7 years ago

For what type of application and what API(s)? Are you using CUDA or compute shaders? The issue is not resolved yet, but it all looks like some issue with Pascal and the fact that a DX11 process (the SteamVR compositor) is adding things into your GPU's command buffer.

KoenRijpstra commented 7 years ago

I am using Unity with the ZED camera. The ZED uses CUDA to compute a depth map in real-time. I am going to contact ZED to see if they know something about it. Thanks for your help!

fhoenig commented 7 years ago

Aha! Then it is most certainly an NVIDIA driver issue with Pascal. I'm in touch with NVIDIA and they'd open a ticket, but in our case they could not yet reconstruct the issue themselves. Can you post the exact version of everything here: dxdiag output and the software versions you are using? I'll be able to forward it then.

KoenRijpstra commented 7 years ago

Software used:

Unity 5.6.1f1
ZED SDK 2.1.2
ZED Unity plugin 2.1.2
SteamVR version 1499136050
SteamVR Unity plugin 1.2.2

dxdiag: DxDiag.txt

LoSealL commented 7 years ago

  • Our engine keeps running normally without any errors, but at ~10fps
  • Our internal profiler timeline shows a very long wait on WaitGetPoses

This is because WaitGetPoses doesn't return, and SteamVR's internal timeout is 100ms. But the reason? I don't know... Maybe in the driver implementation the TrackedDevicePoseUpdated is blocked, or vrserver.exe crashes...
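The ~10fps figure follows directly from that timeout, taking the 100 ms value quoted above at face value: if every frame blocks in WaitGetPoses until the timeout expires, the frame rate cannot exceed 1000 / 100 = 10. A minimal sketch of that arithmetic, plus a check an engine profiler could apply to a measured wait, is below. The function names are hypothetical and not part of OpenVR.

```cpp
#include <cassert>

// Upper bound on frames per second when each frame blocks for at least
// `blocked_ms` milliseconds, for example inside a pose wait that only
// returns once a timeout expires.
double max_fps_when_blocked(double blocked_ms) {
    assert(blocked_ms > 0.0);
    return 1000.0 / blocked_ms;
}

// True when a measured wait is at or beyond the timeout, which suggests the
// call returned because of the timeout rather than because a pose arrived.
// `timeout_ms` is an assumption supplied by the caller (100 ms in this thread).
bool looks_like_timeout(double measured_wait_ms, double timeout_ms) {
    return measured_wait_ms >= timeout_ms;
}
```

With a 100 ms timeout, max_fps_when_blocked(100.0) gives 10, which matches the reported symptom. A healthy 90 Hz frame waits roughly 11 ms and would not be flagged by looks_like_timeout.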

fhoenig commented 7 years ago

@Balderick - We are not using Unity. Our engine is written from scratch and uses CUDA, just like Koen does indirectly. NVIDIA devtech is already aware of this problem, but it seems like some sort of race condition or fence problem inside the driver. Therefore it's really hard to reconstruct.

The combination of CUDA and SteamVR is super rare, and I wouldn't hold it against the NVIDIA team to have a bug in some of the interop code.

It happens only on Pascal, which has extended async compute capabilities. Perhaps it's buried somewhere in that code.

F4r3n commented 7 years ago

I had that problem with a GTX 970, with a sample using the OpenVR SDK and CUDA-OpenGL interop. But it happens more often with a 1060 or 1070.

fhoenig commented 7 years ago

Could someone post an executable which exhibits this freeze together with dxdiag output?

maximeLong commented 7 years ago

Just to add a wrinkle to this thread: I'm having the same issues with almost the same exact setup as @KoenRijpstra (Zed and Unity), but only with the Vive HMD. The Rift HMD does not exhibit this behavior at all.

@KoenRijpstra - do you have any word from Zed about this?

maximeLong commented 7 years ago

I'm going to necro this thread: Any word from Nvidia about this issue @fhoenig ?

fhoenig commented 7 years ago

Negative. It'll need an executable that helps them reconstruct the bug.

rosenrodt commented 7 years ago

+1 having this issue too.

Digging a little further, I found it is pretty easy to freeze SteamVR. Just do the following procedure. Your program does not even have to interact with SteamVR in any way. You just:

  1. Run SteamVR with SteamVR Home actively rendering in the Vive HMD
  2. Fire up your CUDA program, which upon initialization creates an OpenGL window as simple as

glfwInit();
glfwCreateWindow(800, 600, "myWindow", nullptr, nullptr);

And bang! SteamVR hangs just like OP said, right when the GL window is launched. If you remove the GL window from your program, then both run concurrently without any problem. It seems the combination of CUDA + OpenGL + SteamVR spells trouble.

The combination of CUDA + DX11 + SteamVR in Unity also freezes SteamVR (no OpenGL here, because the SteamVR plug-in supports only the DX11 renderer).

P-yver commented 7 years ago

I have been stuck on this issue for a while too. I was testing with an app (ZED SDK + OpenVR) which freezes from time to time. Then, after many tests, I came to the simplest code to reproduce the issue. Launch the hellovr_opengl sample: the process uses about 8% of the CPU, the VRCompositor uses 8% too, and the total CPU usage is around 40%. Everything is fine, but as soon as I open a huge process such as Unity or Photoshop (which increases the CPU usage a lot for a short time), the sample crashes. Then I tried in my own app: I disabled many features to use the minimum CPU, and it goes well until I use more CPU. It's like something in the rendering goes wrong if the CPU is too busy somewhere else.

Edit: Running with the Vive connected (CPU: i5-4460 / GPU: 1060 6GB)

fhoenig commented 7 years ago

Hmm. Sounds like actually just an OpenGL problem. Somewhat confirms what I see in our engine. If it crashes, the callstack is always on top of an OpenGL call.

Would you mind attaching a DXDIAG output?

fhoenig commented 7 years ago

@rosenrodt - does your test program also cause the issue if you don't have CUDA in there?

rosenrodt commented 7 years ago

I think this issue is specific to SteamVR Direct Mode. When I switch to Extended Mode my program runs just fine. Direct Mode is supposedly better though.

@P-yver I cannot reproduce your hellovr_opengl example. While running the example, I had 10+ CPU threads and 10 CUDA streams running a while-loop doing trivial calculations on an array. The hellovr_opengl example, despite initial lag spikes, does not freeze at all like what many here have experienced.

@fhoenig My program cannot run without CUDA. Although I can't say for sure, I am under the impression that a few CUDA kernels won't break SteamVR. I have yet to pinpoint which specific call caused the issue because I am getting very inconsistent results from SteamVR.

Note: my workstation setup:

Win10 + i7 + GTX 1060
SteamVR version 1504061330
SteamVR Unity plug-in 1.2.2
GeForce driver version 385.41

fhoenig commented 7 years ago

Definitely timing related. Something deadlocks and it's inter-process.

I could not reconstruct the hellovr_opengl thing either. In our engine, I temporarily added a glFinish() and cuCtxSynchronize() after WaitGetPoses() in the main thread. This seems to have "fixed" it for now, but that could also just be because the timing changed.
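The ordering of that workaround can be sketched as follows. This is only an illustration under stated assumptions: FrameSyncHooks and begin_frame are hypothetical names, the three calls are injected as callbacks so the sketch builds without the OpenVR, OpenGL, or CUDA headers, and the relative order of glFinish() and cuCtxSynchronize() simply follows the order in which they are listed above.

```cpp
#include <cassert>
#include <functional>

// Hooks for the three blocking calls named in the workaround. In a real
// engine they would wrap vr::VRCompositor()->WaitGetPoses(...), glFinish()
// and cuCtxSynchronize(); here they are injected by the caller.
struct FrameSyncHooks {
    std::function<void()> wait_get_poses;
    std::function<void()> gl_finish;
    std::function<void()> cuda_ctx_synchronize;
};

// Start-of-frame sequence on the main thread: wait for poses first, then
// drain all pending OpenGL work, then all pending CUDA work, so that no GPU
// work from the previous frame is still in flight when rendering begins.
void begin_frame(const FrameSyncHooks& hooks) {
    assert(hooks.wait_get_poses && hooks.gl_finish && hooks.cuda_ctx_synchronize);
    hooks.wait_get_poses();
    hooks.gl_finish();
    hooks.cuda_ctx_synchronize();
}
```

Both synchronization calls stall the CPU until the GPU is idle, so this trades throughput for a more predictable ordering. As noted above, it may only be hiding a timing-dependent problem rather than fixing it.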

fhoenig commented 7 years ago

@rosenrodt - Between your Unity application and the glfw test program you get the same exact SteamVR freeze? Meaning HMD image is frozen until SteamVR is restarted?

rosenrodt commented 7 years ago

@fhoenig Exactly. In Unity, though, the D3D11 renderer is used instead. So it could very well be the timing issue, or something going wrong in the command buffer.

aleiby commented 7 years ago

I think this issue is specific to SteamVR Direct Mode. When I switch to Extended Mode my program runs just fine. Direct Mode is supposedly better though.

@rosenrodt If you run in Direct Mode, but disable Async Reprojection (in the desktop SteamVR Performance settings) does that avoid the issue as well?

fhoenig commented 7 years ago

@aleiby - At what date did Async Reprojection come out and what is it? Is it related to async compute?

fhoenig commented 7 years ago

@aleiby - New info. On my Quadro P6000 the same condition actually does something other than freeze SteamVR itself. It freezes inside arbitrary GL calls, many stack levels deep into nvogl64.dll.

Looks like disabling async reprojection in the SteamVR settings solves the problem. At least in a 20-minute session it did not freeze, while with async reprojection ON it happens within a couple of minutes.

rosenrodt commented 7 years ago

I can confirm turning off async reprojection partially works on my GTX 1060. It can run for 30 minutes instead of just a few. If I turn off the more computationally intensive GPU code, it can run for more than 1 hour.

rosenrodt commented 7 years ago

@P-yver following your hellovr_sample issue reproduction example, I can reproduce the same SteamVR freeze

  1. Make sure async reproj on, interleaved reproj on, and you are in Direct Mode
  2. !Important! Initialize m_bVblank as true; this sets SDL_GL_SetSwapInterval(true) and turns on Vsync
  3. Run as another independent process a few simple but arithmetically exhaustive CUDA kernels (mine is just a bunch of arrays doing arithmetic operations) to saturate the GPU
  4. After a few minutes the HMD freezes, exactly as the OP has reported

Interestingly, if I turn off async reprojection it does not seem to crash; it only drops the framerate drastically. The same goes for when Vsync is off: it does not seem to cause problems.

fhoenig commented 6 years ago

Okay, let's get back to this still unresolved issue.

A new application has entered the problem: HTC's official SDK for "see-through AR" using the Vive Pro's front-facing cameras. They are using OpenCV and some CUDA implementations, and all of their SRWorks examples crash when scanning the room mesh.

Image in the HMD is frozen until SteamVR is restarted.