KhronosGroup / glTF

glTF – Runtime 3D Asset Delivery

Gaussian Splatting in glTF #2454

Open weegeekps opened 2 weeks ago

weegeekps commented 2 weeks ago

Co-author: @keyboardspecialist

Gaussian Splatting has been a hot topic within the 3D graphics industry for some time now, and many organizations have been exploring it. Here at Cesium, we've been playing with adding support for Gaussian Splatting to glTF and we wanted to share this with the community. We know others in the industry are also exploring this and our goal here is one of collaboration, not competition. As mentioned in the disclaimer below, at the start of the summary of our approach, we are not beholden to our current approach and are looking to work with the industry to find the right solution for the ecosystem, whatever that may be.

Gaussian Splatting in the Market

Gaussian Splatting is a natural fit for photogrammetry. It provides a relatively fast workflow from raw images to a 3D representation, which benefits everyone from a hobbyist with a drone or a phone to enterprise organizations. Services like Niantic’s Scaniverse and Polycam allow any user with a phone to scan objects anywhere in the world, geolocate them, and have a digital twin stored on a server that can be shared with other users. Where meshes were once generated, splatting can be used instead. Splatting has gained strong interest from the AEC sector for its higher fidelity and ability to maintain finer details on structures such as radio towers or power substations. Organizations have also embraced Gaussian Splatting for digital heritage preservation, which has educational implications as well, bringing real-world locations to life in new ways. In addition, splatting continues to grow in the AR/VR space.

Despite their integration with enterprise-level software, organizations using Gaussian Splatting are still largely dealing with PLY files. PlayCanvas is one of the exceptions, with its gsplat format and compressed PLY formats in its SuperSplat editor.

Reference Links

https://digitalheritagelab.com/index.php/2024/03/10/gaussian-splatting/
https://scaniverse.com/
https://aecmag.com/visualisation/v-ray-7-to-get-support-for-gaussian-splats/
https://radiancefields.com/gaussian-splatting-brings-art-exhibitions-online-with-yulei
https://radiancefields.com/storysplat-bringing-3dgs-into-educational-experiences
https://playcanvas.com/supersplat/editor

Why use glTF?

glTF is an open standard with wide interoperability. It’s an efficient format and provides a base structure from which we can build a Gaussian Splatting standard for today and for the future. PLY, the current ubiquitous format for splatting, is over 30 years old and was designed as a simple, general format. This is great for ease of implementation, such as in a research paper, but for actual market usage and interoperability it falls short. It’s a very loose and unstandardized format that does not provide a solid path forward for how splats should be represented and transmitted. Utilizing glTF also means that we can easily begin integrating splats with other assets for transmission and rendering. Its openness as a standard allows us to engage with the community to gather feedback on which direction we should take.

A current problem with splats is how to render them efficiently, whether the obstacle is the lack of LoD or the amount of data on screen to be rendered. Because 3D Tiles is built on top of glTF, it is the perfect opportunity to solve part of this problem. Full LoD remains an issue to solve, but tiling the data allows for efficient transmission and rendering.

Cesium's current approach

Disclaimer

We are not beholden to our current approach, but are including it here in part to help facilitate discussion. Ultimately, we want to do the right thing for the community, and are open to any and all approaches at this time. There may be other approaches, such as mirroring the .SPZ or .gsplat formats within glTF, that may be better. We’d like to invite authors of other approaches to also include their approaches in this issue.

Summary

We currently have a draft glTF extension to support Gaussian Splatting. Our approach is straightforward and emphasizes using the facilities provided by glTF as much as possible. We extend a mesh point primitive with new attributes with the requisite data to render splats. Gaussian Splats are defined by position, rotation, scale, opacity and spherical harmonics. Position directly maps to the glTF POSITION attribute. We map the zeroth-order spherical harmonic for diffuse color and opacity to COLOR_0. Two new additional attributes are added: _ROTATION and _SCALE for their respective splat attributes. Our approach is mostly a restructuring of source PLY data into glTF which gives flexibility to how it’s processed and rendered at runtime.
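To make the restructuring concrete, here is a minimal sketch of converting one splat from PLY properties into the four attributes. Everything about the PLY side is an assumption not stated above: the property names (`f_dc_*`, `opacity`, `rot_*`, `scale_*`), the logit-encoded opacity, the log-encoded scale and the (w, x, y, z) quaternion order follow the commonly used reference 3DGS export. Whether the draft extension expects linear scale and (x, y, z, w) order in `_ROTATION` is also an assumption made for illustration.

```python
import math

# Zeroth-order real spherical harmonic basis constant, 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def to_unorm8(value: float) -> int:
    """Clamp to [0, 1] and quantize to a normalized unsigned byte."""
    clamped = min(1.0, max(0.0, value))
    return int(255.0 * clamped + 0.5)


def ply_splat_to_attributes(splat: dict) -> dict:
    """Convert one splat (assumed PLY property names) to glTF attribute values."""
    # Diffuse color from the zeroth-order (DC) harmonic; opacity is assumed
    # to be stored as a logit, so a sigmoid recovers the alpha value.
    rgb = [0.5 + SH_C0 * splat[f"f_dc_{i}"] for i in range(3)]
    alpha = sigmoid(splat["opacity"])
    color_0 = [to_unorm8(c) for c in rgb + [alpha]]

    # Assumed PLY order is (w, x, y, z); reorder to (x, y, z, w) and normalize.
    w, x, y, z = (splat[f"rot_{i}"] for i in range(4))
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    rotation = [x / norm, y / norm, z / norm, w / norm]

    # Scale is assumed to be stored as a logarithm.
    scale = [math.exp(splat[f"scale_{i}"]) for i in range(3)]

    return {
        "POSITION": [splat["x"], splat["y"], splat["z"]],
        "COLOR_0": color_0,
        "_ROTATION": rotation,
        "_SCALE": scale,
    }
```

The `COLOR_0` output is four normalized unsigned bytes, which lines up with an accessor of `componentType` 5121 with `"normalized": true`.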

This is not a required extension which means we can seamlessly fall back to point cloud rendering if no support is found. Another benefit is that we support meshopt compression out of the box like any other mesh in glTF. Our current limitation is that we only support the zeroth-order harmonic for diffuse color. The higher order specular harmonics are size prohibitive and we have questions on what the best long-term strategy would be for storing and processing them.

https://github.com/CesiumGS/glTF/tree/gaussian-splatting-ext/extensions/2.0/Khronos/KHR_gaussian_splatting

Basic Example

The extension is added directly to the primitive itself. If the data is quantized, then depending on how it was quantized, you may define a quantizedPositionScale value.

{
   "accessors": [
       {
           "type": "VEC3",
           "componentType": 5126
       },
       {
           "type": "VEC4",
           "componentType": 5121,
           "normalized": true
       },
       {
           "type": "VEC4",
           "componentType": 5126
       },
       {
           "type": "VEC3",
           "componentType": 5126
       }
   ],
   "meshes": [
       {
           "primitives": [
               {
                   "mode": 0,
                   "attributes": {
                       "POSITION": 0,
                       "COLOR_0": 1,
                       "_ROTATION": 2,
                       "_SCALE": 3
                   },
                   "extensions": {
                       "KHR_gaussian_splatting": {
                           "quantizedPositionScale": 1.0
                       }
                   }
               }
           ]
       }
   ]
}

We are working on hosting a public example using CesiumJS Sandcastle that we plan to share later this week.

Implementation

Loading

Loading glTF files containing Gaussian Splats is the same as loading any other glTF file. Support for the KHR_gaussian_splatting extension may need to be added first. Given that it is a point primitive, if no support is found it will fall back to rendering as any other point cloud. As part of this, meshopt decoding should happen automatically. If the data is quantized, it will need to be further processed at runtime.
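The fallback behavior can be sketched as a small classification pass over the parsed glTF JSON. This is only an illustration: `classify_primitives` and the `supports_splats` flag are hypothetical names, not part of any loader's API, and the sketch does not cover meshopt decoding or dequantization.

```python
EXTENSION_NAME = "KHR_gaussian_splatting"
MODE_POINTS = 0     # glTF primitive mode for POINTS
MODE_TRIANGLES = 4  # glTF default when "mode" is omitted


def classify_primitives(gltf: dict, supports_splats: bool) -> list:
    """Return (mesh_index, primitive_index, render_path) for each primitive."""
    result = []
    for mesh_index, mesh in enumerate(gltf.get("meshes", [])):
        for prim_index, primitive in enumerate(mesh.get("primitives", [])):
            extension = primitive.get("extensions", {}).get(EXTENSION_NAME)
            mode = primitive.get("mode", MODE_TRIANGLES)
            if extension is not None and supports_splats:
                path = "splats"
            elif mode == MODE_POINTS:
                # No splat support (or no extension): plain point cloud.
                path = "points"
            else:
                path = "mesh"
            result.append((mesh_index, prim_index, path))
    return result
```

A renderer without splat support ends up on the `"points"` path for the basic example above, since its primitive uses `"mode": 0`.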

Sorting

Because splats are not uniform, and complex patterns and shapes are built through the layering of many Gaussians, they must be sorted by distance from the current camera position. In the suboptimal case, this means resubmitting vertex data to the GPU on every sort. In the worst case, this would be every frame. However, if the scene does not change, no sorting needs to occur.

Further optimization can be had by generating textures from the splat data and submitting those to the GPU once. Sorting then just becomes an update to the indexes into those textures. Part of this optimization can be precomputing the 3D covariance from the scale and rotation.

Radix sorts generally provide good performance whether GPU accelerated or not.
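As a rough CPU-side illustration, the sketch below computes a depth per splat and radix-sorts an index list rather than the splat data itself, matching the idea of only updating indexes. The back-to-front order, the 16-bit key quantization and the function names are assumptions made for this example, not details of the CesiumJS implementation.

```python
def view_depths(positions, camera_position, camera_forward):
    """Depth of each splat center along the camera's forward axis."""
    cx, cy, cz = camera_position
    fx, fy, fz = camera_forward
    return [(x - cx) * fx + (y - cy) * fy + (z - cz) * fz
            for (x, y, z) in positions]


def radix_sort_indices(depths, key_bits=16, digit_bits=8):
    """Return splat indices ordered back to front using an LSD radix sort."""
    if not depths:
        return []
    lo, hi = min(depths), max(depths)
    span = (hi - lo) or 1.0
    max_key = (1 << key_bits) - 1
    # Quantize depths to integer keys, inverted so the farthest splat
    # gets the smallest key and therefore sorts first.
    keys = [max_key - int((d - lo) / span * max_key) for d in depths]

    order = list(range(len(depths)))
    mask = (1 << digit_bits) - 1
    for shift in range(0, key_bits, digit_bits):
        # Counting pass for the current digit.
        counts = [0] * (mask + 1)
        for i in order:
            counts[(keys[i] >> shift) & mask] += 1
        # Exclusive prefix sum turns counts into start offsets.
        total = 0
        for digit in range(mask + 1):
            counts[digit], total = total, total + counts[digit]
        # Stable scatter into the output order.
        out = [0] * len(order)
        for i in order:
            digit = (keys[i] >> shift) & mask
            out[counts[digit]] = i
            counts[digit] += 1
        order = out
    return order
```

Only the returned index list changes between sorts, so the per-splat data can stay resident on the GPU while the indices are re-uploaded.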

Rasterization

Our current implementation is in CesiumJS with WebGL, so the process below reflects that. We don’t currently have access to WebGPU compute shaders, which would offer more opportunities to optimize parts of the rendering process. One consequence of these limitations is that we have to render splats as quads. Finding an approach that does not need any extra vertices generated could be beneficial to performance. Native implementations generally use CUDA to accelerate both sorting and rasterizing through GPU radix sorts, tiling for higher parallelism, etc.

Vertex Shader

Most of the work is done here:

  1. If it hasn’t been precomputed, you must compute the 3D covariance.
  2. The 3D covariance is then projected into 2D space to get the 2D covariance.
  3. The 2D covariance is decomposed into eigenvectors.
  4. Using those, we calculate the final vertex and clip space position.
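The steps above can be sketched on the CPU as follows. This is a hedged illustration, not the CesiumJS shader: it assumes the usual formulation Sigma = R S S^T R^T for the 3D covariance, a pinhole projection with +z pointing forward in view space, and a quad extent of three standard deviations. All function names are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def quat_to_matrix(x, y, z, w):
    """Rotation matrix for a unit quaternion given as (x, y, z, w)."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_3d(scale, rotation):
    """Step 1: Sigma = R S S^T R^T."""
    s = [[scale[0], 0.0, 0.0], [0.0, scale[1], 0.0], [0.0, 0.0, scale[2]]]
    m = matmul(quat_to_matrix(*rotation), s)
    return matmul(m, transpose(m))


def covariance_2d(cov3d, view_rotation, view_pos, focal_x, focal_y):
    """Step 2: Sigma' = J W Sigma W^T J^T, where J is the Jacobian of the
    perspective projection evaluated at the splat center in view space."""
    x, y, z = view_pos
    j = [[focal_x / z, 0.0, -focal_x * x / (z * z)],
         [0.0, focal_y / z, -focal_y * y / (z * z)]]
    t = matmul(j, view_rotation)
    return matmul(matmul(t, cov3d), transpose(t))


def ellipse_axes(cov2d):
    """Step 3: eigenvalues and unit eigenvectors of the symmetric 2x2 matrix."""
    a, b, c = cov2d[0][0], cov2d[0][1], cov2d[1][1]
    mid = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    lambda1, lambda2 = mid + radius, max(mid - radius, 0.0)
    if abs(b) > 1e-12:
        vx, vy = b, lambda1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    length = math.hypot(vx, vy)
    major = (vx / length, vy / length)
    minor = (-major[1], major[0])
    return (lambda1, major), (lambda2, minor)


def quad_corner_offset(corner, axes, extent=3.0):
    """Step 4: pixel-space offset of a quad corner, corner in {-1, 1}^2."""
    (lambda1, major), (lambda2, minor) = axes
    r1, r2 = extent * math.sqrt(lambda1), extent * math.sqrt(lambda2)
    return (corner[0] * r1 * major[0] + corner[1] * r2 * minor[0],
            corner[0] * r1 * major[1] + corner[1] * r2 * minor[1])
```

In a shader, the offset from step 4 would be converted from pixels to clip space and added to the projected splat center. Step 1 depends only on scale and rotation, which is why it can be precomputed.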

Fragment Shader

The fragment shader is very straightforward. We use the squared distance from the center of the splat to calculate an exponential decay combined with the opacity to generate the final splat raster. The final output is the diffuse RGB color premultiplied with the calculated decay alpha value.
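A minimal sketch of that computation, written as a plain function rather than shader code. The `-0.5` falloff factor, the discard `cutoff` and the convention that `offset` is measured in standard deviations along the splat's axes are assumptions for illustration. The description above only states that an exponential decay of the squared distance is combined with the opacity.

```python
import math


def splat_fragment(offset, rgb, opacity, cutoff=9.0):
    """Return premultiplied RGBA for one fragment, or None if discarded.

    offset is the fragment position relative to the splat center,
    in units of standard deviations along the ellipse axes (assumed).
    """
    dist_sq = offset[0] * offset[0] + offset[1] * offset[1]
    if dist_sq > cutoff:
        return None  # outside the quad's useful footprint
    # Exponential decay of the squared distance, combined with opacity.
    alpha = opacity * math.exp(-0.5 * dist_sq)
    # Premultiply the diffuse color by the computed alpha.
    return (rgb[0] * alpha, rgb[1] * alpha, rgb[2] * alpha, alpha)
```

Because the output is premultiplied, blending would typically use a source factor of one and a destination factor of one minus source alpha, which is also why the splats need to be sorted first.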

Spherical Harmonics and Specular

We consciously decided to not support higher order spherical harmonics dealing with specular highlights for a few reasons:

  1. Size - They constitute the majority of the splat size. Excluding the zeroth-order diffuse term, they take 45 32-bit floats, or 180 bytes per splat.
    • We don’t want to impose lossy compression on our users by default.
  2. Question of benefit versus complexity.
    • Specular does add dynamism to the scene, but is the cost of storage and computation worth it in all cases?
  3. Leave room for a future method of storing or calculating them.
  4. Computation complexity increases quadratically with each degree. Calculating this per splat per frame is prohibitively expensive on platforms without highly parallelizable compute. Is first degree specular enough for most cases?
    • Degree 0 - 3 muls
    • Degree 1 - 18 ops: 9 muls and 9 adds
    • Degree 2 - 40 ops: 25 muls and 15 adds
    • Degree 3 - 80 ops: 50 muls and 30 adds
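The size figure quoted above can be checked with a short calculation. It relies on the standard fact that spherical harmonics up to degree d have (d + 1)^2 coefficients per color channel. The function names are only for illustration.

```python
def sh_coefficient_count(max_degree: int, channels: int = 3) -> int:
    """Total spherical harmonic coefficients up to max_degree, all channels."""
    return channels * (max_degree + 1) ** 2


def higher_order_cost(max_degree: int = 3, bytes_per_float: int = 4):
    """Floats and bytes per splat for every term above the zeroth order."""
    floats = sh_coefficient_count(max_degree) - sh_coefficient_count(0)
    return floats, floats * bytes_per_float


# Degree 3: 3 * 16 = 48 coefficients, minus 3 for the diffuse term.
floats, size_bytes = higher_order_cost()
print(floats, size_bytes)  # 45 180
```

The same function shows how quickly storage grows: stopping at degree 1 needs 9 extra floats (36 bytes) per splat, and degree 2 needs 24 (96 bytes).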

We believe it makes more sense to propose this as one or more separate extensions which can deal with this complexity on their own. These extensions should answer questions such as:

3a1b2c3 commented 2 weeks ago

https://github.com/adobe/USD-Fileformat-plugins fyi

SharkWipf commented 1 week ago

One concern raised in the MrNerf Discord server is that standardizing on a format this early in the game might either stifle innovation or result in an explosion of competing standards, especially in a field as actively researched as this.
New methods that may need to store additional information would be forced to forgo early standardizations like this.
On top of that, most whitepaper implementations likely simply won't care to support anything beyond what they need.

While the creation of standards is inevitable, whether they will be able to be successful at this point remains to be seen.

After some further discussion, we instead worked out a super early(!) draft for a high-level container format.
To be clear: This would have no direct implications on your work as it operates at a different level entirely, but it felt worth mentioning it here since it's closely related and adds a different perspective, and now is technically ongoing work in parallel.
An initial early version of this draft is posted here, for reference: https://gist.github.com/SharkWipf/a02a2616424d0a2ab69af2d3ad8c1829

keyboardspecialist commented 1 week ago

Concerns on standardizing too early are certainly valid. I want to emphasize that our goal is to start discussions and not to just push for ratification of some standard as quickly as possible. We don't expect glTF to be the only format within the splatting ecosystem. Rather this is an attempt to get them into glTF for the "last mile" where glTF thrives. While we’d love to see white papers use glTF for splats, it isn't really the goal here. Bleeding edge research is the perfect place for highly flexible and bespoke formats. That said, I don't think it's reasonable to forgo any discussions in standardization over what might or might not appear in the future. When we look at the market today it is largely (all?) using the original reference implementation. So while flexibility is important and something we hope to incorporate here, we also want to be pragmatic as to what is essential to support. A solution to reduce fragmentation for the early “last mile” production use cases of Gaussian Splats is our ultimate goal.

There is ongoing discussion in the Metaverse Standards Forum, and we’d like to extend the invitation to you and others to contribute. Niantic will be presenting their work on the SPZ format in the near future.

A high level container format is interesting. After reading the spec overview, I see both of these coexisting rather than competing. Seems like it would be perfectly reasonable that it could contain a glb.

Finally, we're interested in joining the discussion on the MrNerf discord server. How can we join the server?

SharkWipf commented 1 week ago

Pretty much agreed on all points. To name one specific example though, of something that I believe isn't currently accounted for in any of the proposed formats, ScaffoldGS was, last I checked, SOTA or close to SOTA, in quality. But in order to accurately represent them in viewers, I believe (I may be wrong here though) you need to store both the MLPs and the anchors. I don't think any of the proposed formats account for this kind of scenario.

> A high level container format is interesting. After reading the spec overview, I see both of these coexisting rather than competing. Seems like it would be perfectly reasonable that it could contain a glb.

That was the intention, yeah. To be able to handle the different formats that are inevitably going to be popping up in a somewhat organized manner.

> Finally, we're interested in joining the discussion on the MrNerf discord server. How can we join the server?

The invite link, as taken from MrNerf's Twitter profile: https://discord.com/invite/NqwTqVYVmj

simonbethke commented 1 week ago

I really like this topic, and I will be very happy to try my best at submitting PRs for loading an early version of GS in glTF to SuperSplat, where I am a little active. Regarding the standardization, I still think that Gaussian Splats are not yet the final, ideal way of shaping the data. I cannot stop emphasizing that I have the impression that Spherical Harmonics do a bad job while eating up all the memory :)

If I had either the time to learn how or the knowledge to just do it, I would try experimenting with splats that have a viewing direction and disappear off-axis.