ValveSoftware / openvr

OpenVR SDK
http://steamvr.com
BSD 3-Clause "New" or "Revised" License

What is the CORRECT expected fPredictedSecondsFromNow? #518

Open echuber2 opened 7 years ago

echuber2 commented 7 years ago

Hi, I am extremely confused about how WaitGetPoses makes pose predictions internally, because of conflicting statements from Valve staff and the documentation. In particular, I'm wondering what the correct formula is for calculating a value that could be passed to GetDeviceToAbsoluteTrackingPose to get the identical prediction that WaitGetPoses would make anyway.

In this thread: http://steamcommunity.com/app/358720/discussions/0/351659808493203424/ @aaronleiby suggests a formula that differs from the equation given in: https://github.com/ValveSoftware/openvr/wiki/IVRSystem::GetDeviceToAbsoluteTrackingPose by the addition of an extra FrameDuration measure, which would make for a roughly 11ms discrepancy.

In addition to this, what is the expected typical value? Valve's pipeline, as described in the GDC talks, would suggest about 22ms of prediction is typical. Using the formula above without the extra FrameDuration added, experiments in HelloVR show that the value calculated is about 14ms. If we imagine adding an extra 11ms to this, we get closer to the 22ms figure that the GDC talks would suggest. What is really going on here?

I would really appreciate some concrete information about this, as I have a wealth of experimental data on user comfort from controlled trials now, but I can't make a sound judgment from the data without knowing how the API is really treating a nonzero prediction value passed to GetDeviceTATP. Aaron Leiby's comments make it sound like a hidden 11ms is baked in no matter what value is passed. @JoeLudwig Thank you both for many helpful details in the past.

aleiby commented 7 years ago

The documentation for GetDeviceToAbsoluteTrackingPose assumes that GetTimeSinceLastVsync is called during the frame render period. I think it was likely written before we added the concept of "running start".

Alex gave a presentation at GDC a while back on that topic, which you can view here: http://www.gdcvault.com/play/1021771/Advanced-VR

The 11ms discrepancy comes in based on whether you call GetTimeSinceLastVsync before or after the vsync event that starts the frame (as it will return different values).

If you call GetTimeSinceLastVsync during running start, you will get the time for the previous frame's starting vsync and need to add an additional frame in order to target the proper time that the backlight of the display(s) turns on.

It's sometimes easier to think about this by working backward from that point. For the Vive, which is a globally illuminated display, it takes effectively a full frame (11ms) to scan out the image (i.e. transfer across HDMI and load into the screen). Global means that the entire display gets loaded before the backlight turns on. The backlight is only on for 1-2ms (that's the low persistence part). Scanout time is reported by the hmd using the vr::Prop_SecondsFromVsyncToPhotons_Float property.

Then we allow a full frame interval for rendering (another 11ms). This is reported by the hmd using the vr::Prop_DisplayFrequency_Float property (which you then need to take the inverse of to convert from a frequency to a duration, e.g. 1/90 = 11.1111ms).

This interval is where D3D Present gets called. That queues up the frame being rendered to start scanning out at the next vsync event.

Then to figure out where in that interval you currently are, you use GetTimeSinceLastVsync.

If you are calculating this during Running Start, then you need to remember that you are in the previous frame's interval still, which requires adding an additional frameDuration to get you to the frame interval you care about.

With a 3ms running start, we typically predict out 25ms (11ms for rendering, 11ms for scanout, 3ms for running start). One thing I've found is that consistency between frames is important. It's better to predict a consistent amount further out than to try to wait a variable amount later in the frame (e.g. if the game isn't taking the entire frame time to render).
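
Putting that together, a minimal sketch of the calculation (the wiki formula plus the extra frame interval discussed above; PredictedSecondsToPhotons and bCalledBeforeFrameVsync are illustrative names, not part of the API):

#include <openvr.h>

// Sketch: predicted seconds from "now" until the displays light up.
float PredictedSecondsToPhotons( bool bCalledBeforeFrameVsync )
{
    vr::IVRSystem *pSystem = vr::VRSystem();

    float fSecondsSinceLastVsync = 0.f;
    pSystem->GetTimeSinceLastVsync( &fSecondsSinceLastVsync, nullptr );

    float fDisplayFrequency = pSystem->GetFloatTrackedDeviceProperty(
        vr::k_unTrackedDeviceIndex_Hmd, vr::Prop_DisplayFrequency_Float );
    float fFrameDuration = 1.f / fDisplayFrequency; // ~11.1ms at 90Hz
    float fVsyncToPhotons = pSystem->GetFloatTrackedDeviceProperty(
        vr::k_unTrackedDeviceIndex_Hmd, vr::Prop_SecondsFromVsyncToPhotons_Float ); // ~11ms scanout on Vive

    // Wiki formula: time left in the current frame interval, plus scanout.
    float fPrediction = fFrameDuration - fSecondsSinceLastVsync + fVsyncToPhotons;

    // If GetTimeSinceLastVsync was called before this frame's starting vsync
    // (e.g. during running start), it reports the previous frame's vsync, so
    // add one more frame interval. With ~3ms of running start this lands near 25ms.
    if ( bCalledBeforeFrameVsync )
        fPrediction += fFrameDuration;

    return fPrediction;
}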

The above descriptions don't account for async or interleaved reprojection. With interleaved, we give the app two full frames to render, but lose running start, so prediction times there jump out to 33ms, and then get presented twice (the second time using a pose predicted out 22ms).

With async, we pull back running start a couple ms to give the high priority render context some slack for interrupting existing gpu work, however, we also always apply reprojection to every frame, so the app will render using poses predicted 27ms or so, but the compositor will reproject them using poses predicted closer to 16ms.

echuber2 commented 7 years ago

Thank you very much for the detailed reply! As I gather then, I would need to manually add 11ms to the calculated prediction time duration, depending on whether I am doing the calculations before or after the system crosses the running start threshold? I'm not sure how exactly to use the GPU profiler to tell where my call is being made with respect to that. Our demo software is light and runs solidly at 90fps with no dropped frames, so I had been naively assuming the reprojection wouldn't make a difference for us. However if the reprojection mode changes the value that GetTimeSinceLastVsync reports, it would seem there are several more variables to nail down?

Overall I am not sure which settings would give me the purest ability to specify the complete ms of prediction that are definitely applied. I would also really like to be able to determine an exact offset I can apply to the data already collected to make the value accurate, if the prediction setting I've been entering all along was offset by a hidden amount. Now I'm a bit worried if this is even possible to discern based on profiling our testing setup or if a large margin of error needs to be applied.

aaronleiby commented 7 years ago

need to manually add 11ms to the calculated prediction time duration, depending on whether I am doing the calculations before or after the system crosses the running start threshold?

It would be based on whether you are calling GetTimeSinceLastVsync before or after the starting vsync for that frame (not running start). You can't guarantee when the vsync timing info will be updated, however, which is why a frame counter is also provided in that function to identify when it changes from measuring one frame to the next.
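
As a sketch, that frame counter check could look like this (ullLastFrameCounter is just illustrative bookkeeping on the app side):

// GetTimeSinceLastVsync also reports a frame counter, so you can tell whether
// the vsync timing you just read has rolled over to the new frame yet.
float fSecondsSinceLastVsync = 0.f;
uint64_t ullFrameCounter = 0;
vr::VRSystem()->GetTimeSinceLastVsync( &fSecondsSinceLastVsync, &ullFrameCounter );

static uint64_t ullLastFrameCounter = 0; // illustrative bookkeeping
bool bVsyncInfoIsForNewFrame = ( ullFrameCounter != ullLastFrameCounter );
ullLastFrameCounter = ullFrameCounter;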

IVRCompositor::GetFrameTimeRemaining might be a more reliable way to identify when the current frame is intended to be displayed. It is updated before WaitGetPoses returns, so from the beginning of a frame's running start until the start of the next frame's running start it returns the timing of that frame's end vsync. It reports vsync timing, so you'd need to factor in Prop_SecondsFromVsyncToPhotons_Float to get to the illumination period for that frame.
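
Under those assumptions (a sketch, not official sample code), that works out to something like:

// Sketch: predicted seconds to photons derived from the compositor's
// frame-time-remaining, which counts down to this frame's end vsync.
float fFrameTimeRemaining = vr::VRCompositor()->GetFrameTimeRemaining();
float fVsyncToPhotons = vr::VRSystem()->GetFloatTrackedDeviceProperty(
    vr::k_unTrackedDeviceIndex_Hmd, vr::Prop_SecondsFromVsyncToPhotons_Float );

// At the beginning of running start (~3ms before vsync) this lands near
// 3 + 11.1 + 11 ~= 25ms on a Vive.
float fPredictedSecondsToPhotons = fFrameTimeRemaining + fVsyncToPhotons;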

You may also want to dig into IVRCompositor::GetFrameTiming(s) which provide several stats to help determine when frames are presented, etc. If you already have a bunch of collected data without this additional info, then it might not be possible to work out what you are trying to determine.

More info here: https://github.com/ValveSoftware/openvr/blob/master/headers/openvr.h#L2101

chipperjones10atl commented 7 years ago

Hi~ I have some questions about interleaved reprojection: "we give the app two full frames to render, but lose running start, so prediction times there jump out to 33ms, and then get presented twice (the second time using a pose predicted out 22ms)"

  1. Why isn't "running start" taken into account here?
  2. Does it mean the reprojected frame (22ms) comes after the frame the app submits (33ms)?
aleiby commented 7 years ago

It's a little bit more complicated than I described. On AMD hardware we actually do retain something similar to running start when (non-async) interleaved reprojection is enabled. On Nvidia hardware we should have an API to gain that back as well, but I'm not sure when I will get a chance to implement that.

When interleaved reprojection is active, the poses we provide to the application are predicted out 33ms (sampled immediately after the vsync that starts that frame). We then take those images (left and right) and present them twice. Both times we present these images use reprojection with new poses predicted the normal 25ms out at the beginning of running start (3ms before vsync).

echuber2 commented 7 years ago

What I found is that, roughly, supplying 14ms as the argument to GetDeviceToAbsoluteTrackingPose produces prediction that "looks" correct. If I add 11ms to this I arrive at the 25ms that would seem to be expected based on the above discussion. I'm not sure if async was enabled or not. You mentioned that the compositor will reproject with roughly 16ms if async is enabled. I am wondering whether, by punching 14ms into GDTATP, a hidden 11ms in the pipeline effectively made the prediction amount 25ms, or whether we are directly specifying the actual, absolute prediction amount, which would suggest we were guiding the value used by async for the update stage. Thank you for the continued clarifications. @aleiby

echuber2 commented 7 years ago

It looks like WaitGetPoses is called at "-10.5" ms roughly on the Frame Timing meter, with "New Poses Ready" around -3 or -4. Total GPU finishes up at +2 and Application Interval around +4.5 (not sure what this is).

When Async is enabled, there's the "Compositor Update" around +7.5. I'm not sure if that is associated with the reprojection you mentioned "closer to 16ms". Is that 16ms of prediction passed to GDTATP, called at this +7.5ms mark, or roughly 16ms (18ms) into processing the frame? When async is on, what controls how much prediction is applied for the late update? I just tried toggling async on and off now, and with our manual override of GDTATP, having async on didn't seem to defeat the customized prediction amount. But then if async is somehow incorporating the last value passed to GDTATP when it does the update, I'm curious about how.

cjwidd commented 7 years ago

@aleiby thank you so much for your responses, but the issue echuber2 has raised above is also something I am struggling to nail down. Could you perhaps elaborate once more?

aleiby commented 7 years ago

There's some documentation on SteamVR Frame Timing here: https://developer.valvesoftware.com/wiki/SteamVR/Frame_Timing

WaitGetPoses is when the application calls the function (as opposed to when it returns). If the application is cpu bound, then sometimes you can see this eating into the running start portion of the frame (which is bad).

NewPosesReady is effectively the beginning of running start. This should be pretty steadily at around -3ms (the relative timings are based on the vsync that starts the frame) in non-async mode. In async, it gets pushed back to -4ms when async-mode is active, but can get temporarily pushed back even further if there is hitching on the gpu.

TotalGPU is the time it takes on the gpu to render the frame. This is the sum of both the application and compositor render work. If there are bubbles (idle gpu time) in the application work, those are included in the timing. We rely on this value to drive adaptive quality in The Lab (http://www.gdcvault.com/play/1023522/Advanced-VR-Rendering).

ApplicationInterval is the time between calls to WaitGetPoses. If this drifts above the frame interval (11.111ms) then it will start eating into your running start time, and eventually cause you to drop a frame. This generally indicates that the application is gpu bound.

CompositorUpdateEnd is when the compositor finishes submitting its gpu work for the frame. In async-mode it waits until running start (~11.1ms minus 4ms = ~7.1ms), samples poses, signals the app, and submits its work which generally gets us to around 7.5ms into the frame. Most of that time is blocking on D3D's Flush/Present.
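
If it helps to log these programmatically, the same numbers shown in the Frame Timing window are exposed via IVRCompositor::GetFrameTiming (a sketch; the field names come from the Compositor_FrameTiming struct in openvr.h, so double check them against your SDK version):

// Sketch: read the most recent frame's timing stats from the compositor.
vr::Compositor_FrameTiming timing = {};
timing.m_nSize = sizeof( vr::Compositor_FrameTiming ); // must be set before calling
if ( vr::VRCompositor()->GetFrameTiming( &timing, 0 ) ) // 0 = most recent frame
{
    float flWaitGetPosesCalled = timing.m_flWaitGetPosesCalledMs;  // relative to the frame's vsync ("WaitGetPoses")
    float flNewPosesReady      = timing.m_flNewPosesReadyMs;       // relative to the frame's vsync ("New Poses Ready")
    float flTotalGpu           = timing.m_flTotalRenderGpuMs;      // app + compositor gpu time ("Total GPU")
    float flAppInterval        = timing.m_flClientFrameIntervalMs; // time between WaitGetPoses calls ("Application Interval")
    uint32_t nDroppedFrames    = timing.m_nNumDroppedFrames;
}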

If you want to get a better idea of when poses are sampled, you can use gpuview: https://developer.valvesoftware.com/wiki/SteamVR/Installing_GPUView

In async-mode, starting recording will trigger the hitching mentioned above, so you want to sample for at least two seconds to let things settle back down, and then ignore those first couple seconds when looking at the data. The SteamVR events are all under the GUID that starts with 8C8 (usually filters to the top). https://imgur.com/a/IaawK

echuber2 commented 7 years ago

Thanks a lot! I'll need to try GPUView and see what I can discern from it. I have a few lingering questions. For reference, here's a shot of our Frame Timing window when only Interleaved reprojection is enabled. http://imgur.com/a/ovUNr

-What does it mean for application interval to be so low, at 4.5ms? If WGP is being called so often (222 Hz) then I guess a lot of unnecessary frames are being computed.

-When I call GDTATP manually after WGP returns, does that call happen around the 0ms mark on the timing chart, or closer to the -11 mark? How do I know when WGP returns?

-If an update reprojection happens in async mode, what prediction interval is used for the update? Is this applied retroactively to the original poses in such a way that it would undo our manual call to GDTATP? (Apparently not, by my eye. Then I guess if it updates the view, it updates what we've already adjusted, using some hardcoded amount we can't alter.)

-Between comments by Aaron Leiby, Joe Ludwig, and Alex Vlachos, it seems that about 25ms absolute prediction amount could be expected under normal circumstances in interleaved mode. As far as I know, if I manually make a call to GDTATP after WGP returns, and specify an offset of 14ms (as determined by the GDTATP formula and seemingly correct by observation), this will essentially produce the 25ms prediction amount, so there is always a "hidden" 11ms offset to the specified prediction under full 90FPS operation in our software. Am I probably correct about this assumption?

Sorry for all the continued misunderstandings on my part...

echuber2 commented 7 years ago

Another thing I forgot to ask about is the recurring references to D3D, supposing we're using OpenGL. I guess the fact of the matter is that the compositor itself uses D3D, regardless.

aaronleiby commented 7 years ago

Yes, the compositor uses DX11 on Windows. Anywhere I say D3D, it can just be read as "submit to the graphics driver".

echuber2 commented 7 years ago

I see, thank you. Sorry for the double post -- Do you have any comment on the items I mentioned in the preceding comment?

aleiby commented 7 years ago

Ah, I didn't get an email notification for the previous post so I had missed it.

What does it mean for application interval to be so low, at 4.5ms?

Application Interval measures the time between application calls to WaitGetPoses. However, WaitGetPoses itself will block until the next running start. A low value just indicates you have an efficient renderer. The ideal situation is where an application has a separate thread for feeding the gpu and can queue up all its work in a deferred render context or something similar, such that when WaitGetPoses returns, it can push those poses into a constant buffer, immediately submit all its render work for the frame, call Submit with the left and right render targets, then block on WaitGetPoses again until the next frame (while the main thread continues running game logic and queuing up render work for the next frame for the render thread).
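
A bare-bones sketch of that render-thread loop (texture setup, the game-thread handoff, and error handling omitted; bRunning, RenderBothEyes and the eye texture ids are app-side placeholders, and OpenGL textures are assumed):

// Sketch of the render-thread loop described above.
vr::TrackedDevicePose_t renderPoses[ vr::k_unMaxTrackedDeviceCount ];

while ( bRunning )
{
    // Blocks until running start (~3ms before vsync), then returns fresh poses.
    vr::VRCompositor()->WaitGetPoses( renderPoses, vr::k_unMaxTrackedDeviceCount, nullptr, 0 );

    // Push the hmd pose into constant buffers and submit the queued render work.
    RenderBothEyes( renderPoses[ vr::k_unTrackedDeviceIndex_Hmd ] );

    vr::Texture_t leftEye  = { (void*)(uintptr_t)leftEyeTexId,  vr::TextureType_OpenGL, vr::ColorSpace_Gamma };
    vr::Texture_t rightEye = { (void*)(uintptr_t)rightEyeTexId, vr::TextureType_OpenGL, vr::ColorSpace_Gamma };
    vr::VRCompositor()->Submit( vr::Eye_Left,  &leftEye );
    vr::VRCompositor()->Submit( vr::Eye_Right, &rightEye );
}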

How do I know when WGP returns?

On the Frame Timing graph, it's the New Poses Ready line in black.

If an update reprojection happens in async mode, what prediction interval is used for the update?

I'm not sure exactly what you're asking here, but in async mode, running start is pushed back to 4ms before vsync (most of the time, but sometimes longer), so the poses the application gets to render the scene use about 26ms prediction. However, we also always apply reprojection with poses predicted about 15ms out when displaying them in async-mode. If we think that the application isn't going to make framerate, then we start predicting out an extra frame (~37ms), but then apply correction using the ~15ms predicted pose before each vsync that it will be scanned out for.

Is this applied retroactively to the original poses in such a way that it would undo our manual call to GDTATP?

Reprojection corrections are applied based on the poses returned by WaitGetPoses. We assume you render the frames passed to Submit using the poses returned by the previous WaitGetPoses. Rendering using other poses will result in incorrect behavior. Calling GetDeviceToAbsoluteTrackingPose does not have any effect on this -- except potentially giving you different poses that you might decide you want to render with (which would likely result in incorrect output). We don't have any interface to allow the application to specify which poses were used to render the scene textures passed to Submit, and instead always assume the values returned by WaitGetPoses were used.

it seems that about 25ms absolute prediction amount could be expected under normal circumstances in interleaved mode

I would clarify this as non-async mode as opposed to interleaved mode. The two modes are orthogonal. Async means the compositor can interrupt application rendering if it goes long to (re)present the last application frame (rather than dropping a frame entirely). Interleaved is a predictive system which proactively (or reactively if it fails to predict) drops the application to half-framerate until it gets its rendering back under budget (e.g. 90fps).

if I manually make a call to GDTATP [...] there is always a "hidden" 11ms offset to the specified prediction

This is incorrect. If WaitGetPoses returns properly at 3ms before vsync (assuming non-async mode and interleaved reprojection is not currently active), then calling GetDeviceToAbsoluteTrackingPose with 25ms should get you very close to the same values returned by WaitGetPoses. The compositor uses GetDeviceToAbsoluteTrackingPose internally to generate the poses returned by WaitGetPoses. The values you see in the gpuview events (e.g. the 25.2167 shown here https://imgur.com/a/IaawK) are the fPredictedSecondsToPhotonsFromNow that the compositor is passing into GetDeviceToAbsoluteTrackingPose.
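
A quick way to sanity-check this in your own app (a sketch; the 0.025f is the nominal non-async figure from this thread, and could instead be computed with the formula discussed earlier):

// Sketch: compare the hmd pose returned by WaitGetPoses against a manual
// GetDeviceToAbsoluteTrackingPose call using ~25ms of prediction.
vr::TrackedDevicePose_t wgpPoses[ vr::k_unMaxTrackedDeviceCount ];
vr::VRCompositor()->WaitGetPoses( wgpPoses, vr::k_unMaxTrackedDeviceCount, nullptr, 0 );

vr::TrackedDevicePose_t manualPoses[ vr::k_unMaxTrackedDeviceCount ];
vr::VRSystem()->GetDeviceToAbsoluteTrackingPose(
    vr::VRCompositor()->GetTrackingSpace(), 0.025f, manualPoses, vr::k_unMaxTrackedDeviceCount );

// Called right after WaitGetPoses returns (non-async, interleaved reprojection
// inactive), the two hmd matrices should be very nearly identical.
const vr::HmdMatrix34_t &a = wgpPoses[ vr::k_unTrackedDeviceIndex_Hmd ].mDeviceToAbsoluteTracking;
const vr::HmdMatrix34_t &b = manualPoses[ vr::k_unTrackedDeviceIndex_Hmd ].mDeviceToAbsoluteTracking;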

echuber2 commented 7 years ago

Thanks, I think I'm getting to the bottom of it now:

If WaitGetPoses returns properly at 3ms before vsync (assuming non-async mode and interleaved reprojection is not currently active), then calling GetDeviceToAbsoluteTrackingPose with 25ms should get you very close to the same values returned by WaitGetPoses.

~So then it would seem our software is calling GDTATP soon enough that it is catching the running start, requiring us to add 11ms to the prediction amount to replicate the proper value. It seems I could automate the +11 adjustment in our calculation by checking the frame index before calling WGP and again after WGP returns, just before calling GDTATP manually; however I'm not sure if I can reliably get the immediate frame index this way if I use IVRCompositor::GetFrameTiming. From what you've said, GetFrameTimeRemaining should be a superior data source, but I found a few open issues on here relating to its return value. I am pondering whether it would make sense to use another timer in the same thread to determine how much time has passed between WGP returning and the hacky GDTATP call being made.~

I'm pretty sure about that part now...

However, we also always apply reprojection with poses predicted about 15ms out when displaying them in async-mode.

So it's true that the async mode reprojection will use a hardcoded (or inaccessible) value of ~15ms, and this will be applied to the original WGP poses, not the hacked ones we get with an extra GDTATP call? But then I'd expect to see async mode undoing our alterations to the prediction entirely, yet from what I can tell it doesn't make a difference; we can still mess with the prediction amount in async mode. Is it the case that when 90fps is solidly being achieved, even in async mode no extra reprojection happens? (Clearly, for the sake of our experiment async should be turned off regardless. My concern is that I'm not actually sure if it was enabled or not when some of our data was collected.) I'd like to believe that at 90fps the different reprojection options make essentially no difference, as this is how it appears to be in the HMD, but clearly the frame timing window shows a discrepancy with async enabled.

Thank you so much for continuing to provide support on this issue!

echuber2 commented 7 years ago

I edited the above comment. @aleiby do you have any information about the last part? That is, you seem to be saying that when async is enabled, some automatic adjustments to the prediction will always be made late-stage. However, it doesn't look that way when I try it. (Also, I'm hoping it's not the case.) From what I can tell, we can still make manual adjustments when async is enabled, suggesting that either A or B:

aleiby commented 7 years ago

In all cases we are predicting to a fixed point in the future - specifically the (mid)point where the displays light up. The only variable is when you are asking for that particular point. If the tracking system was perfect (i.e. zero error) and 100% prescient, then it would always return the exact same pose. The difference in poses returned will therefore be a function of how much time passes between when you ask for the pose and when the displays light up (the amount of prediction used), multiplied by the error in the tracking system. We only use velocity and angularVelocity for predicting poses because higher order data (e.g. acceleration) tends to be too noisy. For a headset at rest, there should be effectively zero error. For a headset moving linearly, there should be near zero error. The only time you should start seeing discrepancies is when the headset is changing direction or speeding up / slowing down between the time the pose is asked for and the screen actually lights up. For the extents of human head motion, this is where we've determined around 20ms is the upper limit before you start seeing too much delta.
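
For intuition, a toy version of that kind of constant-velocity extrapolation (not SteamVR's actual predictor; Vec3, Quat and PredictPose are illustrative types and names, and the angular velocity is assumed to be expressed in the tracking frame):

#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };

// Toy predictor: extrapolate a pose dt seconds forward using only velocity and
// angular velocity (higher order terms are too noisy to be useful).
void PredictPose( Vec3 &pos, Quat &rot, const Vec3 &vel, const Vec3 &angVel, float dt )
{
    // Position: simple linear extrapolation.
    pos.x += vel.x * dt;
    pos.y += vel.y * dt;
    pos.z += vel.z * dt;

    // Orientation: rotate by (angular speed * dt) about the angular velocity axis.
    float speed = std::sqrt( angVel.x*angVel.x + angVel.y*angVel.y + angVel.z*angVel.z );
    if ( speed > 1e-6f )
    {
        float halfAngle = 0.5f * speed * dt;
        float s = std::sin( halfAngle ) / speed;
        Quat d = { std::cos( halfAngle ), angVel.x * s, angVel.y * s, angVel.z * s };
        Quat r = rot; // rot = d * rot
        rot.w = d.w*r.w - d.x*r.x - d.y*r.y - d.z*r.z;
        rot.x = d.w*r.x + d.x*r.w + d.y*r.z - d.z*r.y;
        rot.y = d.w*r.y - d.x*r.z + d.y*r.w + d.z*r.x;
        rot.z = d.w*r.z + d.x*r.y - d.y*r.x + d.z*r.w;
    }
}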

aleiby commented 7 years ago

Also, it's very hard to see poses being off by just a couple milliseconds - especially if the entire world is updating using the same poses. One trick we use for ensuring the application is using the proper poses is to bring up the chaperone grid. With this as a fixed reference, it's then easier to see any wiggle between the movement of the scene and the grid. Another trick is to hold a controller in physical contact with the headset and then spin around. The controller (or hand model) should stay locked in place in your view.

Usually, it's only possible to notice a full frame off - i.e. when you wind up predicting to the wrong vsync entirely.

echuber2 commented 7 years ago

Thanks for the update. Well, what I'm asking is more specifically about this edge case: when async reprojection is enabled and we are doing manual GDTATP calls after WGP and before rendering, what exactly is async doing in that case. You mentioned that it's using the delta between the original WGP poses and the async poses to calculate a texture reprojection, so I guess this adjustment could be applied on top of our manipulated and submitted poses. Then if the adjustment is subtle, we may not be able to tell the difference from when async is on or off, even when we're messing with the prediction amount.

Let me clarify about our project -- we have a good understanding of the theory behind prediction, but we have been explicitly manipulating the prediction amount with our own calls to GDTATP in order to test what you just described: how readily users can detect the change in prediction. That's why we've been doing this, although as you mentioned above it will result in incorrect behavior. (I wish that async had done a no-op when framerate is at 90FPS, so that our incorrect behavior would be consistently incorrect. As it is, I fear we may need to redo the trials while ensuring that async is turned off.)

aleiby commented 7 years ago

what I'm asking is more specifically about this edge case: when async reprojection is enabled and we are doing manual GDTATP calls after WGP and before rendering, what exactly is async doing in that case. You mentioned that it's using the delta between the original WGP poses and the async poses to calculate a texture reprojection, so I guess this adjustment could be applied on top of our manipulated and submitted poses.

We don't have any interface for applications to inform the runtime which poses were actually used to render the scene, so it can only act on the assumption that the poses returned by WGP were used. Any deviation from that will show up in the final presented images.

TheWhiteAmbit commented 7 years ago

It would be great if one could tell the API which prediction was used on submit of frames, as in the LibOVR API for Oculus. This has caused me days of headaches as well.

aleiby commented 7 years ago

The following changes will be in the next SDK update.

/** Allows specifying pose used to render provided scene texture (if different from value returned by WaitGetPoses). */
struct VRTextureWithPose_t : public Texture_t
{
       HmdMatrix34_t mDeviceToAbsoluteTracking; // Actual pose used to render scene textures.
};

       // Set to indicate that pTexture is a pointer to a VRTextureWithPose_t.
       Submit_TextureWithPose = 0x08,

This is currently only supported by the Beta branch of SteamVR, but will fail gracefully in the default branch in the meantime.
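
Usage would presumably look something like this once the flag is supported (a sketch; the texture handle and renderHmdPose are placeholders for whatever your renderer actually used):

// Sketch: submit a frame and tell the compositor which pose it was rendered with.
vr::VRTextureWithPose_t tex = {};
tex.handle = (void*)(uintptr_t)leftEyeTexId;   // placeholder texture handle
tex.eType = vr::TextureType_OpenGL;            // or TextureType_DirectX, etc.
tex.eColorSpace = vr::ColorSpace_Gamma;
tex.mDeviceToAbsoluteTracking = renderHmdPose; // the pose actually used to render

vr::VRCompositor()->Submit( vr::Eye_Left, &tex, nullptr, vr::Submit_TextureWithPose );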

echuber2 commented 7 years ago

@aleiby I take it this would allow custom prediction algorithms to work alongside either or both of the reprojection modes? I'd like to thank you again for all your advice and support on this issue.

Balderick commented 7 years ago

echuber2 I think you just hit the nail smack on the head for bad positioning, bad rotation, juddering, stuttering and black rendered images when using non-SteamVR-tracked devices in SteamVR!

There is nothing in the OpenVR API to handle differing tracking predictions, but SteamVR allows other algorithms to be loaded. If the other tracking algorithms agree with SteamVR's coords and structs, then things just work. The SteamVR driver for the Hydra is one example of an OpenVR driver for a non-SteamVR-tracked device working as expected in SteamVR for exactly this reason, i.e. the coords and structures it uses to create a pose agree with what SteamVR-tracked devices use.

I think OpenVR drivers trying to provide support for an array of display, tracker and other VR components through one OpenVR driver simply do not work well, because SteamVR needs things like lens and display info specified in HMD OpenVR drivers for things like distortion and prediction to work as expected.

TheWhiteAmbit commented 7 years ago

https://github.com/ValveSoftware/openvr/issues/72

TheWhiteAmbit commented 7 years ago

Getting sick of mentioning this. I tweaked my way around about 98% of it, but in the end this is just a lack of information provided to/from the OpenVR API. With the Oculus API you simply submit the pose the frame was rendered with; timewarp calculates the difference, problem solved! Great to hear there is a chance for a cure soon, @aleiby

aaronleiby commented 7 years ago

@TheWhiteAmbit, yeah this has been in the SDK for a little while now: https://github.com/ValveSoftware/openvr/blob/master/headers/openvr.h#L396

TheWhiteAmbit commented 7 years ago

Thank you @aaronleiby, I will implement it now that it is in the official API :) I didn't expect it to be in 1.0.10 already, and there has been no newer release since. I hope it is working already and will not "fail gracefully" as mentioned by @aleiby (comparing the names, I guess this is the same you, a[aron]leiby).

TheWhiteAmbit commented 7 years ago

Wow :) Having all the data already on hand from our rendering kernel, which preserves analogous structures from LibOVR, it took me just 15 minutes to implement it - including downloading and updating to the latest OpenVR SDK. And it works like a charm! We can stop external rendering of new frames while keeping our kernel running, so I could easily test it and had a timewarped frame floating at the last transmitted position. Thank you @aaronleiby