NASA-AMMOS / 3DTilesRendererJS

Renderer for 3D Tiles in Javascript using three.js
https://nasa-ammos.github.io/3DTilesRendererJS/example/bundle/mars.html
Apache License 2.0

Immersive VR tile loading (WebXR) #213

Closed phoenixbf closed 2 years ago

phoenixbf commented 3 years ago

3D Tiles progressive loading in the VR example does not seem to trigger during exploration in WebXR-enabled browsers (e.g. the built-in Oculus Quest browser or mobile Chrome), while it works fine in the Firefox Reality (Oculus Quest) browser.

gkjohnson commented 3 years ago

Can you provide more information on the issue? If you're able to provide a fix I'd be happy to take a PR. I can only test with Firefox Reality at the moment and don't have the bandwidth to debug the example more deeply. Thanks for the report!

phoenixbf commented 2 years ago

The issue should be replicable with the example: https://nasa-ammos.github.io/3DTilesRendererJS/example/bundle/vr.html

Basically once the VR session starts, tiles do not switch (at all) according to camera location (tested on Oculus Browser, WebXR on android Chrome mobile) - while Firefox Reality seems to correctly manage the virtual camera, thus loading children as we move around. I also report that it's working fine using WebXR API emulator (Chrome, windows 10) - quite strange.

It is also unclear to me in the example why all tileset cameras are cleared and reset on each render (line 240): shouldn't it be sufficient to set/replace it once, when the VR presentation starts?

So far I tried different ways to (re)set virtual camera in XR sessions, but no luck. I'm testing several options, once I find a solution I'll post here

gkjohnson commented 2 years ago

Basically once the VR session starts, tiles do not switch (at all) according to camera location (tested on Oculus Browser, WebXR on android Chrome mobile) - while Firefox Reality seems to correctly manage the virtual camera, thus loading children as we move around.

You can log the transform of the VR camera (used here) and if it doesn't update as you move your head then that seems like the core of the issue. It's possible this could be a three.js issue if not a browser issue, then. I had also tested with desktop Chrome and the Chrome emulator previously without issue, as well. Just not other mobile browsers. You may want to try with the latest three.js, as well.

It is also unclear to me in the example why all tileset cameras are cleared and reset on each render (line 240): shouldn't it be sufficient to set/replace it once, when the VR presentation starts?

It would be sufficient to do that -- removing all cameras and then resetting with the ones that are needed for that frame was the brute force approach I added in the moment. It looks like there's an undocumented sessionstart event on renderer.xr that could be listened for if that's an improvement you'd like to make.

There are actually some updates to the way the camera should be managed with WebXR in newer versions of three.js prompted by some of the issues I ran into here, as well, but they haven't been integrated into the example, yet. See https://github.com/mrdoob/three.js/pull/21886.

So far I tried different ways to (re)set virtual camera in XR sessions, but no luck. I'm testing several options, once I find a solution I'll post here

Thanks! I appreciate you looking into this.

phoenixbf commented 2 years ago

So far I found out the VR camera transform looks fine. It seems though that the tile download (the first request to a new tile) is not working/triggered correctly on chrome-based WebXR session. The issue also persists with the latest THREE.js version.

This can be replicated by exploring a bit a tileset before entering a VR session: tiles already requested are switching fine when approached, but no new tiles are being downloaded.

gkjohnson commented 2 years ago

This can be replicated by exploring a bit a tileset before entering a VR session: tiles already requested are switching fine when approached, but no new tiles are being downloaded.

Interesting so it sounds like tiles are updating, at least, but after they've already been downloaded? But the downloads don't trigger while VR is enabled?

chrome-based WebXR session.

I assume you mean a chrome based WebXR session on a mobile device, right? I wasn't seeing this issue when testing on Desktop Chrome.

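The things I'd want to verify are the following:

  • That the xr getCamera call is returning an array camera as expected.
  • That the left camera exists and that the viewport.z and .x values contain reasonable resolution values.
  • Verify that downloads are at least starting but not finishing

    • Fetches are triggered here so you could add a log to make sure we're actually getting to the point where we trigger a download in XR.
    • There are stats on the current tiles state here that can be used to check the active queued downloads and queued geometry to parse. If fetches are never finishing while in XR the "parse" will never tick up while XR is presenting.

I'm wondering if the devices you're using throttle downloads or something while in XR?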

phoenixbf commented 2 years ago

Interesting so it sounds like tiles are updating, at least, but after they've already been downloaded? But the downloads don't trigger while VR is enabled?

Correct.

I assume you mean a chrome based WebXR session on a mobile device, right? I wasn't seeing this issue when testing on Desktop Chrome.

I'm currently testing on mobile WebXR (Android Chrome) and official Oculus browser (Oculus Quest 2) - both scenarios present the same issue.

The things I'd want to verify are the following:

  • That the xr getCamera call is returning an array camera as expected.
  • That the left camera exists and that the viewport.z and .x values contain reasonable resolution values.
  • Verify that downloads are at least starting but not finishing

    • Fetches are triggered here so you could add a log to make sure we're actually getting to the point where we trigger a download in XR.
    • There are stats on the current tiles state here that can be used to check the active queued downloads and queued geometry to parse. If fetches are never finishing while in XR the "parse" will never tick up while XR is presenting.

I'm wondering if the devices you're using throttle downloads or something while in XR?
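For reference, the first two checks in that list could be scripted roughly like this. It's only a sketch: `checkXRCamera` is a made-up helper, and it assumes the object returned by the xr getCamera call follows the three.js array camera layout (a `cameras` list with the left eye first, each with a `viewport` whose `x` is the horizontal offset and `z` is the width).

```js
// Hypothetical diagnostic, not part of the library or the VR example.
// Takes the camera returned by the xr getCamera call and returns a list
// of problems found (an empty list means the checks passed).
function checkXRCamera( xrCamera ) {

  const problems = [];

  // An array camera is expected to carry a list of per-eye cameras.
  if ( ! xrCamera || ! Array.isArray( xrCamera.cameras ) ) {

    problems.push( 'not an array camera (no "cameras" list)' );
    return problems;

  }

  // Assumption: the left eye is the first entry.
  const left = xrCamera.cameras[ 0 ];
  if ( ! left || ! left.viewport ) {

    problems.push( 'left camera or its viewport is missing' );
    return problems;

  }

  // Assumption: viewport.x is the offset and viewport.z the width in pixels.
  const { x, z } = left.viewport;
  if ( ! Number.isFinite( x ) || ! ( z > 0 ) ) {

    problems.push( `suspicious left viewport: x=${ x }, z=${ z }` );

  }

  return problems;

}
```

Logging the result once per second or so while presenting should be enough to tell whether the camera side is healthy.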

Sounds great, thank you for directions. I'll log fetch routines during WebXR session + have a better look at download queue. The weird thing is that whatever is the routine failing on Chrome, works fine on Firefox Reality (Oculus Quest 2)

I'll test ASAP

phoenixbf commented 2 years ago

So far I can confirm that fetches are never completed. This can be easily replicated by logging queue stats: downloadQueue.items keeps growing and stacking items as we move around during an active WebXR session. XR Camera on the other hand seems fine (at least, already downloaded tiles switch correctly - also taking into account 6-DOF motions)

so it looks like the issue is specifically related to download/fetch routines on downloadQueue.add() - and maybe promises not being correctly resolved? Is there a different implementation between Chrome and Firefox Reality? I'll try to dig a bit more by tracking those calls

PS: btw, outstanding work!

gkjohnson commented 2 years ago

So far I can confirm that fetches are never completed. This can be easily replicated by logging queue stats: downloadQueue.items keeps growing and stacking items as we move around during an active WebXR session.

Just for some context the way that the priority queue works is that multiple tasks (planned tile downloads) can be added to the queue and only a small set of them are run at a time (6 by default). Once one of those jobs completes another one is picked up from the queue and kicked off. So it does sound like tasks are being successfully added to the queue to download.
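To make that concrete, here is a minimal sketch of a queue with a fixed number of concurrent jobs. It is not the library's PriorityQueue: `BoundedQueue` is a made-up name, there is no priority sorting, and jobs are started immediately rather than on a later frame. Only the `maxJobs`, `currJobs` and `items` field names are borrowed from this thread.

```js
// Hypothetical sketch, not the library's PriorityQueue implementation.
class BoundedQueue {

  constructor( maxJobs = 6 ) {

    this.maxJobs = maxJobs; // how many jobs may run at once
    this.currJobs = 0; // how many jobs are running right now
    this.items = []; // jobs waiting for a free slot

  }

  // Queue a job. `job` is a function that returns a promise.
  add( job ) {

    return new Promise( ( resolve, reject ) => {

      this.items.push( { job, resolve, reject } );
      this.tryRunJobs();

    } );

  }

  // Start waiting jobs until every slot is occupied.
  tryRunJobs() {

    while ( this.currJobs < this.maxJobs && this.items.length > 0 ) {

      const { job, resolve, reject } = this.items.shift();
      this.currJobs ++;

      // Wrapping in a promise also catches a job that throws synchronously.
      new Promise( res => res( job() ) )
        .then( resolve, reject )
        .finally( () => {

          // A slot freed up, so pick up the next waiting job.
          this.currJobs --;
          this.tryRunJobs();

        } );

    }

  }

}
```

With a queue like this, jobs that never settle keep all six slots busy and everything added afterwards just piles up in `items`.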

so it looks like the issue is specifically related to download/fetch routines on downloadQueue.add() - and maybe promises not being correctly resolved? Is there a different implementation between Chrome and Firefox Reality?

There is no WebXR-specific logic in the core library. All WebXR handling is relegated to the example/vr.js example implementation so there's nothing deliberately being done differently between the browsers. This is sounding more like a browser-issue as you dig deeper. I think the last thing I would try is logging when the fetch is kicked off and immediately logging the resolved value of the fetch here. Something like this:

console.log( 'starting to fetch ' + uri );
return fetch( uri, Object.assign( { signal }, this.fetchOptions ) )
  .then( res => {

    console.log( 'fetch for uri ' + uri + ' finished' );
    return res;

  } )
  .catch( err => {

    console.error( 'fetch for uri ' + uri + ' failed' );
    console.error( err );
    throw err;

  } );

Returning the result in then and rethrowing the error in catch will let the rest of the system continue to work as intended but make it clear whether the fetches are being triggered and definitely never finishing. If the fetches are never finishing then that should mean it's a browser issue and maybe a smaller repro can be produced without the 3d tiles renderer code by triggering several fetches while in XR. From there we can ping one of the Oculus browser devs to see what their take is.

It seems odd that fetches of all things would be having problems in XR but perhaps there's more nuance to the options that are being passed or something that are causing problems that will have to be dug in to.

PS: btw, outstanding work!

Heh, glad to hear the rest of it is looking good 😁 You've hit quite the perplexing problem, though

phoenixbf commented 2 years ago

There is no WebXR-specific logic in the core library. All WebXR handling is relegated to the example/vr.js example implementation so there's nothing deliberately being done differently between the browsers. This is sounding more like a browser-issue as you dig deeper.

Yes, I meant something different at browser level (in this case Firefox Reality vs Chrome)

I was able to narrow the problem down a bit more, to requestTileContents in TilesRendererBase.js (which is called fine for both standard and WebXR sessions)

Specifically, at line https://github.com/NASA-AMMOS/3DTilesRendererJS/blob/527bc68884306fa506c0d3bb2bc457bf16f2bfd4/src/base/TilesRendererBase.js#L506

the downloadTile callback is never fired in an active WebXR session, that would explain why items keep being added to DL-queue. I also had a look at PriorityQueue.add() (a few logs here and there) and the promise is correctly created, although the callback is never fired. I'm still trying to figure out why

Is there any routine I can call to clear/reset all the queues (pending items etc.)? I would try to clean things up when WebXR session is started, just to be sure it's not related to pending stuff.

Testing on usual dataset Dingo Gap

gkjohnson commented 2 years ago

Got it thanks -- I misunderstood.

The fact that the callbacks to trigger the fetch aren't firing is interesting. One thing to note, as I mentioned above, is that if 6 download jobs have already started then no more will begin until at least one of those 6 has completed. You can check how many jobs are currently running by checking the PriorityQueue.currJobs field to see if it's at capacity (the maxJobs count field).

If tiles are downloading before entering VR mode but not after I wonder if 6 jobs are already queued and once entering VR mode the fetches never complete? You can check to see if tilesRenderer.downloadQueue.currJobs is set to the same as maxJobs after entering VR to see if that's what's happening.
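A small helper along these lines could make that check easier to log. This is just a sketch: `describeQueue` is a made-up name, it only reads the `currJobs`, `maxJobs` and `items` fields mentioned in this thread, and it assumes `items` is an array. How to read the two flags is my interpretation rather than anything the library reports itself.

```js
// Hypothetical helper: summarize a queue's state for logging.
// Pass in e.g. tilesRenderer.downloadQueue.
function describeQueue( queue ) {

  const running = queue.currJobs;
  const waiting = queue.items.length;

  return {
    running,
    waiting,
    // At capacity: nothing new can start until a running job completes.
    saturated: running >= queue.maxJobs,
    // Jobs are waiting even though slots are free, which would point at
    // the step that starts jobs rather than at the fetches themselves.
    stalledScheduler: waiting > 0 && running < queue.maxJobs,
  };

}
```

Logging `describeQueue( tilesRenderer.downloadQueue )` before and after entering VR should show which of the two cases applies.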

Is there any routine I can call to clear/reset all the queues (pending items etc.)? I would try to clean things up when WebXR session is started, just to be sure it's not related to pending stuff.

You should be able to call tilesRenderer.dispose() to dispose of all loaded tiles but I don't think everything is set up to afford cleanly calling "update" again after that, though. One of these options might work, though:

lojjic commented 2 years ago

It's likely this is due to the PriorityQueue scheduler using requestAnimationFrame. A WebXR session has its own requestAnimationFrame method separate from that of the window, and it's common for implementations to pause the window's version while in an immersive session. Probably just changing that to a setTimeout would be sufficient, unless this really needs to be synced to display frames for some reason?

(Firefox Reality doesn't have a true WebXR implementation so it doesn't have the distinct rAFs, that would explain why it "works" there.)

Edit: switching to setTimeout would also have the effect of allowing new jobs to be started while the browser tab is backgrounded. I'm not sure if that's desired behavior or not, tbh.

gkjohnson commented 2 years ago

It's likely this is due to the PriorityQueue scheduler using requestAnimationFrame. A WebXR session has its own requestAnimationFrame method separate from that of the window, and it's common for implementations to pause the window's version while in an immersive session.

Oh wow that's an obscure bug. Thanks for shedding some light on it! Sounds like the Oculus browser must have made some changes to how rAF functions since this didn't seem to be an issue in desktop Chrome as far as I remember.

switching to setTimeout would also have the effect of allowing new jobs to be started while the browser tab is backgrounded. I'm not sure if that's desired behavior or not, tbh.

I think this is fine -- it might actually be preferred so data downloads when the browser is hidden.

The biggest reason I used rAF over setTimeout was because it's the only guaranteed way as far as I know to make sure a task runs on the next frame rather than the current one which could cause more work and therefore stalls to occur than normal. And when using a setTimeout it's not guaranteed that 16ms is the duration of a single frame or that a frame hasn't gone long and overrun that 16ms causing it to be run on the current frame again causing a longer stall (I'm not sure if there's explicit browser behavior in a case like this).

Any thoughts you guys have are appreciated and I'm happy to take a PR to address it if we can come up with a good solution.

phoenixbf commented 2 years ago

That's great guys this makes sense and it explains why Firefox Reality (on Quest 2) was working fine - also that's why it was working on WebXR emulator.

I tried both of these solutions in scheduleJobRun and they both work now. This is with the timeout (100 msec here) suggested by @lojjic:

        if ( ! this.scheduled ) {
            setTimeout(() => {
                this.tryRunJobs();
                this.scheduled = false;
            }, 100);

            this.scheduled = true;
        }

or a direct call to tryRunJobs() in scheduleJobRun (the this.scheduled flag is no longer needed here):

this.tryRunJobs();

I tried both solutions on the default Oculus browser (Quest 2) and also android WebXR on mobile Chrome: I confirm that both address the issue, with no perceivable difference in terms of tile scheduling jobs - although I plan a more detailed assessment this week.

gkjohnson commented 2 years ago

@phoenixbf

direct call to tryRunJobs() in scheduleJobRun (the this.scheduled flag is no longer needed here):

this.tryRunJobs();

The reason tryRunJobs is not immediately called is because it can result in more work than desired happening in a single frame. The "scheduled" flag ensures that even when multiple tiles renderers are sharing a single download / parse queue that we don't try to run jobs multiple times in a single frame. The delay ensures that we wait until all the tiles renderers that will be updated per frame have finished updating before trying to run jobs.

Similarly we want to make sure that we don't try to run new jobs immediately after we've finished work because that can lead to a frame delay, as well. Calling "fetch" seems to be an expensive operation so if you start calling fetch immediately after a download has finished (or starting a new parse immediately after a parse has finished) you'll be spending more time than is necessary on a single frame. Not to mention cases where parsing might be synchronous this would cause the queue to eat through all jobs in a single frame. I made these changes because I was seeing stalled frames.

I'm still feeling like requestAnimationFrame is the best callback for this kind of issue so it's a shame that it's not called during XR sessions. Unless there are other ideas why don't we change it to a 10ms timeout so it's resilient to XR sessions but will still likely run on the next frame. If this winds up being too slow or problematic maybe this is the kind of thing that's worth making configurable.
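A minimal sketch of the "scheduled" flag idea with a swappable delay might look like the following. The class name and constructor arguments are made up for illustration and this is not the library's code. The 10ms timeout is just the default discussed above.

```js
// Hypothetical sketch of coalescing many schedule requests into one run.
class CoalescingScheduler {

  // runJobs: the work to perform once per scheduled run.
  // schedule: how to defer a function, 10ms timeout by default.
  constructor( runJobs, schedule = func => setTimeout( func, 10 ) ) {

    this.runJobs = runJobs;
    this.schedule = schedule;
    this.scheduled = false;

  }

  scheduleJobRun() {

    // Ignore repeat requests until the pending run has happened.
    if ( this.scheduled ) return;
    this.scheduled = true;

    this.schedule( () => {

      // Clear the flag first so work done in runJobs can request another run.
      this.scheduled = false;
      this.runJobs();

    } );

  }

}
```

However many tiles renderers call `scheduleJobRun()` during a frame, only one deferred run is queued. Note that if the deferral function never fires (as with the window's rAF in an immersive session), the flag stays set and nothing runs again.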

Thoughts?

phoenixbf commented 2 years ago

Yes, this makes sense @gkjohnson We could stick with first option (setTimeout) with 10ms.

Another option maybe would be having a xrSession variable set whenever an XR session is started: if valid the scheduleJobRun routine could switch to xrSession.requestAnimationFrame(...) otherwise the default requestAnimationFrame. A few examples also here: https://developer.mozilla.org/en-US/docs/Web/API/XRSession/requestAnimationFrame

Or more in general, a way to provide customized scheduleJobRun()
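As a rough sketch of the xrSession idea, something like the following could pick the right rAF at scheduling time. `createXRAwareScheduler` and `getSession` are made-up names. The only API it relies on is `XRSession.requestAnimationFrame( callback )` from the MDN page linked above.

```js
// Hypothetical helper: returns a scheduling function that uses the XR
// session's requestAnimationFrame while a session is active.
// getSession: function returning the active XRSession, or null if none.
// fallback: how to defer when no session is active (window rAF by default).
function createXRAwareScheduler( getSession, fallback = func => requestAnimationFrame( func ) ) {

  return func => {

    const session = getSession();
    if ( session ) {

      // The session passes ( time, frame ) to the callback; both are dropped here.
      session.requestAnimationFrame( () => func() );

    } else {

      fallback( () => func() );

    }

  };

}
```

With three.js the session could presumably be looked up with `() => renderer.xr.getSession()`. One caveat I haven't verified: a callback still pending on the session when it ends may never fire, so a timeout fallback might still be needed.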

lojjic commented 2 years ago

I'm still feeling like requestAnimationFrame is the best callback for this kind of issue

You're probably right about that. 🤔

Three.js does have WebGLRenderer.setAnimationLoop(callback) for this purpose; since it has knowledge internally of the XR session switch it's able to hop between the rAFs and ensure the frame callback is consistently invoked.

Of course PriorityQueue itself can't assume it's in a Three.js environment, but maybe there could be a way to optionally hook into that if available?
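One way such an optional hook could look, sketched without any three.js dependency: the queue hands its callbacks to a small collector, and the application flushes the collector from whatever frame loop it already has (for example the function passed to `WebGLRenderer.setAnimationLoop`). `createFrameScheduler` is a made-up name, and how the queue would accept the `schedule` function is left open here.

```js
// Hypothetical helper: collects deferred callbacks and runs them when
// flush() is called from the application's own frame loop.
function createFrameScheduler() {

  let pending = [];

  return {

    // Defer func until the next flush.
    schedule( func ) {

      pending.push( func );

    },

    // Call once per frame, e.g. at the start of the setAnimationLoop callback.
    flush() {

      // Swap the list first so callbacks scheduled during a flush
      // wait for the next frame instead of running in this one.
      const toRun = pending;
      pending = [];
      toRun.forEach( func => func() );

    },

  };

}
```

Since the loop passed to `setAnimationLoop` keeps running across the XR session switch, the deferred work would keep running too, and PriorityQueue itself would not need to know about three.js.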

gkjohnson commented 2 years ago

Yes I think a configurable callback would be best. Here are two things that come to mind -- let me know what you guys think:

};

tiles.downloadQueue.schedulingCallback = schedulingCallback;
tiles.parseQueue.schedulingCallback = schedulingCallback;


- Another default alternative could be one that schedules both a rAF _and_ a setTimeout so even if a user doesn't override the callback appropriately for XR it can still function (though maybe not optimally). It's a bit more complicated, though, and I'm not sure if I love it. Perhaps we wait to see if the above solution isn't good enough for some reason:

```js
function defaultSchedulingCallback( func ) {

  // Whichever of the rAF or the timeout fires first cancels the other and runs func once.
  const cb = () => {

    if ( this._rafHandle ) cancelAnimationFrame( this._rafHandle );
    if ( this._toHandle ) clearTimeout( this._toHandle );

    func();

  };

  this._rafHandle = requestAnimationFrame( cb );
  this._toHandle = setTimeout( cb );

}
```

Happy to take a PR for the first point above if you guys are in agreement.

phoenixbf commented 2 years ago

Thanks @gkjohnson, I made a PR with a custom schedulingCallback. It defaults to requestAnimationFrame() as you suggested. With this addition, the best solution so far for my app (I have a custom setup that doesn't allow me to use rAF in an active XR session) to handle both standard and WebXR sessions is simply:

const tsSchedCB = func => {
        setTimeout( func, 50);
};

ts.downloadQueue.schedulingCallback = tsSchedCB;
ts.parseQueue.schedulingCallback = tsSchedCB;

Tested on mobile WebXR (android Chrome), official Oculus browser (Quest v1 and v2) and Firefox Reality.

If this works for you guys, we could also update the VR example in the future

ps: we may also consider a default schedulingCallback in PriorityQueue that already addresses WebXR - although if we intend to use rAFs path, at the moment it would require additional methods IMHO (e.g. a setXRSession or setRenderer) to handle tiles downloading/parsing during active WebXR sessions

gkjohnson commented 2 years ago

ps: we may also consider a default schedulingCallback in PriorityQueue that already addresses WebXR - although if we intend to use rAFs path, at the moment it would require additional methods IMHO (e.g. a setXRSession or setRenderer) to handle tiles downloading/parsing during active WebXR sessions

If one can be suggested I'm open to it but I'm not sure I like setting an XRSession or renderer, though. It's possible for there to be multiple renderers on a page and technically any of those could start an XR session. It's definitely a corner case but I think setting the scheduling callback explicitly is more clear and flexible. If there's another general solution out there that would be suitable that doesn't rely on rAF and isn't prone to the issues listed above that would be great.

gkjohnson commented 2 years ago

Hey @phoenixbf, @lojjic sorry for the delay I've just published v0.3.4 which includes the "schedulingCallback" function for PriorityQueue from #216:

https://github.com/NASA-AMMOS/3DTilesRendererJS/releases/tag/v0.3.4

Thanks again for your guys' help in figuring out the problem and getting it addressed!