diarmidmackenzie / moonrider

🌕🏄🏿 Surf the musical road among the stars. Side project built by two people in a few months to demonstrate WebXR.
https://moonrider.xyz
MIT License

Moonrider performance investigation #3

Open diarmidmackenzie opened 1 year ago

diarmidmackenzie commented 1 year ago

I've been looking at some generic Three.js / A-Frame performance improvements related to updateMatrixWorld().

See: https://github.com/mrdoob/three.js/pull/25142 for background.

@dmarcos suggested I look at the impact these changes would have on Moonrider, as the most popular WebXR experience. I've raised this issue to document that investigation.

Initial Results

I did some quick profiling of Moonrider on Windows/Chrome (an easier environment than a VR headset), and found it was spending more than 70% of CPU time inside updateMatrixWorld(), suggesting it would be a great candidate for optimization.

image

A quick console check revealed 2902 objects in the scene graph:

count = 0; document.querySelector('a-scene').object3D.traverse((o) => count++ ); count
2902
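
For repeated checks, the one-liner above can be wrapped in a small helper. This is only a sketch: `countObjects` is a name made up here, and it walks the `children` arrays itself rather than calling `traverse()`, so it works on anything shaped like a THREE.Object3D. The optional predicate lets you break the total down, for example by each object's own `visible` flag.

```javascript
// Count objects in a scene graph, optionally only those matching a predicate.
// Hypothetical helper: relies only on each node having a `children` array.
function countObjects(root, predicate = () => true) {
  let count = 0;
  const stack = [root];
  while (stack.length > 0) {
    const obj = stack.pop();
    if (predicate(obj)) count++;
    for (const child of obj.children || []) stack.push(child);
  }
  return count;
}

// In the browser console, on a page that contains an <a-scene>:
if (typeof document !== 'undefined') {
  const sceneEl = document.querySelector('a-scene');
  if (sceneEl) {
    const root = sceneEl.object3D;
    console.log('total:', countObjects(root));
    // Counts objects whose own `visible` flag is false; children of a hidden
    // parent that are themselves flagged visible are not included.
    console.log('own visible=false:', countObjects(root, (o) => !o.visible));
  }
}
```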

Moonrider runs against A-Frame 1.1.0, so I back-ported the changes in PR 25142 to Three.js r123.1 and used this to build a modified version of A-Frame 1.1.0.

Three.js: https://github.com/diarmidmackenzie/three.js/tree/super-r123-1-performance
A-Frame: https://github.com/diarmidmackenzie/aframe/tree/1.1.0-performance

I then modified Moonrider to run using this version of Three.js.

It seems to be running fine with this updated version. However, the performance gains are not as remarkable as hoped.

The reduction from 70% to 55% was less than I have seen with other apps. I'll explain why that is...

Why only small gains?

Of the 2902 objects in the scene graph, a console query shows that we are skipping matrix calculations for just 446 of them (about 15%).

objs = []; document.querySelector('a-scene').object3D.traverse((o) => {if (o.privateMatrixData !== o.matrixData) { objs.push(o) }}); objs.length
446

There are a few reasons for this:

However, there does seem to be substantial scope to optimize performance further...

How to optimize further

As things stand, we have a lot of objects in the scene graph with distinct matrix data. Since we can't ignore non-identity matrices without compromising the experience, to optimize further, we'd need to significantly reduce the number of objects in the scene graph.

It looks as though most of the objects in the scene graph are inactive pool objects. My understanding is these pools exist to minimize performance issues when spawning new objects - instead of creating new objects from scratch, they get switched in from a pool.

These pools of objects are the reason why there are 2902 objects in the scene graph, even when there isn't that much happening on screen. They are invisible, so they don't have any impact on rendering performance, but they are still part of the scene graph, which means each of them has to be processed every frame in updateMatrixWorld().

It could be tempting to suggest that updateMatrixWorld() should skip over non-visible objects, but there are good reasons to ensure that non-visible objects are correctly positioned: they can be used as colliders, as raycast targets, and so on.

However, these objects don't need to have their matrices updated every frame while they are resting in pools. The simplest way to avoid these updates would be to remove the objects from the scene graph when they are sitting unused in a pool.

I believe this would be a simple change that would substantially improve the performance of Moonrider. The performance overhead of adding an object3D back into the scene graph when activating it from a pool should be minimal, just a couple of matrix multiplications.
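
As a rough illustration of the idea (not Moonrider's or A-Frame's actual pool code), a pool can keep idle objects detached and only attach them on use. `DetachingPool`, `createObject` and the method names are hypothetical; the sketch assumes only that the parent offers Object3D-style `add()` and `remove()`.

```javascript
// Minimal sketch of a pool that keeps idle objects out of the scene graph.
// Hypothetical class: `parent` only needs Object3D-style add()/remove().
class DetachingPool {
  constructor(parent, createObject, size) {
    this.parent = parent;
    this.idle = [];
    // Objects are created up front but never attached while idle.
    for (let i = 0; i < size; i++) this.idle.push(createObject());
  }

  // Take an object from the pool and attach it, so that it is visited by
  // updateMatrixWorld() only while it is actually in use.
  acquire() {
    const obj = this.idle.pop();
    if (!obj) return null; // pool exhausted; the caller decides what to do
    this.parent.add(obj);
    return obj;
  }

  // Detach the object again and return it to the idle list.
  release(obj) {
    this.parent.remove(obj);
    this.idle.push(obj);
  }
}
```

With three.js, `parent` would be something like the scene's object3D and `createObject` would build the mesh for one pooled item; the cost of spawning then becomes one `add()` call rather than constructing a new object.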

However I have held back from making any changes to moonrider code itself, due to some GitHub issues.

Issues with moonrider repo

I'm hitting an error when cloning the moonrider repo:

Downloading assets/img/envmap.psd (604 KB)
Error downloading object: assets/img/envmap.psd (0837264): Smudge error: Error downloading assets/img/envmap.psd (0837264a7ea743565218b35ee8cd4f5b2244810963a2de0a3c2d75a84dc35553): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

This error means I've not been able to set up any version control locally for the moonrider code. I've tried making a couple of changes along the lines above, but working without version control in an unfamiliar codebase is not an ideal situation...

I can come back and look at application-level optimizations (where I think there's lots of potential), when I have a working version control setup...

For the same reason, I haven't published a live version that uses the latest A-Frame yet.

Next steps & resources

To use the version of A-Frame with performance improvements, update the A-Frame version loaded in index.html to this:

<script src="https://cdn.jsdelivr.net/gh/diarmidmackenzie/aframe@1.1.0-performance/dist/aframe-master.min.js"></script>

Next steps:

diarmidmackenzie commented 1 year ago

On further review, it turned out the best place to make the further changes was in the A-Frame pool component, so no application changes needed (apart from pointing to the newer A-Frame).

This fix (much simpler than all the other stuff I have been working on!)

We've now got total CPU for updateMatrixWorld() down to < 30%, less than half of what it was when we started.

image

FPS on my Windows PC is definitely up from 55, and is now very close to 60. I believe it's capped at 60 on PC, and it still misses frames sometimes, so it's not perfectly smooth yet, but the overall CPU load must be substantially reduced.

diarmidmackenzie commented 1 year ago

Managed to do some testing on Quest 2, using the OVR Metrics Tool described here: https://developer.oculus.com/documentation/unity/ts-ovrmetricstool/#collect-performance-data-with-ovr-metrics-tool

I played Ed Sheeran, Shape of You, hitting more-or-less all the blocks (I believe block fragments are more intensive to render than blocks).

Here's an FPS chart from the original (https://moonrider.xyz):

image

In this version, the frame rate was roughly flat at 44-45 FPS.

And here is one from a fixed version (https://diarmidmackenzie.github.io/moonrider):

image

In this version, the frame rate did sometimes dip as low as 45 FPS, but was much more variable, and sometimes a lot higher.

So on average, the frame rate was considerably better, but still highly variable.

I also tried to record data to file, which would allow for a more detailed analysis, but haven't yet figured out how to access the data (supposedly it's written to CSV files in /OVRMonitorMetricsService/CapturedMetrics/, but I can't see any such files on my headset).

diarmidmackenzie commented 1 year ago

I didn't notice any functional deficiencies in the updated version - seemed to be running absolutely fine.

diarmidmackenzie commented 1 year ago

CPU usage data...

old.. image

new... image

So we're seeing the expected gains in updateMatrixWorld().

Big difference between running on VR headset & on my PC is that on PC the punchable blocks don't get rendered, so I wasn't seeing the cost of that in my earlier analysis. Looks like the cost of rendering those blocks is now the dominant factor in overall performance terms.

I suspect savings can be made there, but probably needs to be done at the application level, e.g. using instancing for the blocks?

devmaxxing commented 1 year ago

Came across discussion about this in the WebXR discord. Not sure if it's applicable here, but I thought I'd mention that one thing I did for my rhythm game (https://github.com/CadenzaVR/cadenza) is to use InstancedMesh for each unique note type. So I just have one mesh/object for each note type instead of one for each note.

diarmidmackenzie commented 1 year ago

Thanks, yes, using InstancedMesh should give some decent performance gains.

Using aframe-instanced-mesh should allow this to be done without too much reworking of the existing code.
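
For reference, the underlying three.js approach looks roughly like the sketch below. It is not taken from Moonrider or from aframe-instanced-mesh: `createInstancedBlocks` and its parameters are names made up here, `THREE` is passed in so the sketch makes no assumption about how three.js is loaded, and the geometry passed in is assumed to be a BufferGeometry.

```javascript
// Build one InstancedMesh for many blocks that share geometry and material,
// so the scene graph holds a single object instead of one per block.
// Hypothetical helper; `positions` is an array of {x, y, z} objects.
function createInstancedBlocks(THREE, geometry, material, positions) {
  const mesh = new THREE.InstancedMesh(geometry, material, positions.length);
  const helper = new THREE.Object3D(); // scratch object used to compose matrices

  positions.forEach((p, i) => {
    helper.position.set(p.x, p.y, p.z);
    helper.updateMatrix();
    mesh.setMatrixAt(i, helper.matrix); // copies the matrix into instance i
  });

  // Tell the renderer that the per-instance matrices have changed.
  mesh.instanceMatrix.needsUpdate = true;
  return mesh;
}
```

In an A-Frame app the three.js namespace is available as `AFRAME.THREE`, so a call might look like `createInstancedBlocks(AFRAME.THREE, geometry, material, positions)`, with the result added to an entity's object3D.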