MeetKai / superconductor

a work-in-progress 3D renderer built on top of wgpu

Textures discussion #11

Open expenses opened 2 years ago

expenses commented 2 years ago

Background

Textures, in the GPU sense, are essentially linear arrays of (generally but not always) pixels in GPU-accessible memory. They come in various sizes and dimensionalities (1D, 2D or 3D), and can be LDR (Low Dynamic Range, with values between 0 and 1) or HDR (High Dynamic Range, with values in a greater range, sometimes negative!).

One important element is that textures generally contain mipmaps. These are sets of smaller versions of the image where each mip is half the size of the mip before.
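As a sketch of what that halving means in practice, the mip chain for a texture can be computed like this (plain Rust, no dependencies):

```rust
/// Number of mip levels for a `width` x `height` texture, including
/// the full-size level 0 (e.g. 1024x1024 -> 11 levels: 1024 ... 1).
fn mip_level_count(width: u32, height: u32) -> u32 {
    32 - width.max(height).leading_zeros()
}

/// Dimensions of a given mip level; each axis halves per level,
/// clamped to a minimum of 1.
fn mip_dimensions(width: u32, height: u32, level: u32) -> (u32, u32) {
    ((width >> level).max(1), (height >> level).max(1))
}

fn main() {
    assert_eq!(mip_level_count(1024, 1024), 11);
    assert_eq!(mip_dimensions(1024, 1024, 1), (512, 512));
    // Non-square: the smaller axis bottoms out at 1.
    assert_eq!(mip_dimensions(1024, 256, 9), (2, 1));
    println!("ok");
}
```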

image _https://commons.wikimedia.org/wiki/File:MipMap_Example_STS101.jpg_

The reason for mipmaps is that if you always read from textures at the highest resolution, you're going to get aliasing effects when the texture is far away and you start skipping over pixels. If you instead read from the mip level that matches the size at which the texture appears on screen, you can avoid this:

image _https://commons.wikimedia.org/wiki/File:Mipmap_Aliasing_Comparison.png_
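Roughly, "the appropriate mip level" is the log2 of how many texels one screen pixel covers. This is only the 1D intuition (real hardware derives it from per-pixel UV derivatives), but as a sketch:

```rust
/// Rough sketch of mip selection: log2 of the texel-to-pixel ratio,
/// clamped to the available levels. Real GPUs compute this from
/// screen-space UV derivatives; this is just the 1D idea.
fn mip_level_for_footprint(texels_per_pixel: f32, max_level: f32) -> f32 {
    texels_per_pixel.log2().clamp(0.0, max_level)
}

fn main() {
    // Texture shown at native size: use mip 0.
    assert_eq!(mip_level_for_footprint(1.0, 10.0), 0.0);
    // Texture shown at quarter size (4 texels per pixel): use mip 2.
    assert_eq!(mip_level_for_footprint(4.0, 10.0), 2.0);
    println!("ok");
}
```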

Regular PNG and JPEG images

When loading PNG and JPEG images, we need to decompress them to get out the raw bytes of pixels to upload to the GPU. Because PNG and JPEG images have no support for mipmaps, we need to generate these on the GPU after uploading. This takes a small amount of time.

Additionally, if an image we attempt to load exceeds the maximum allowed texture size of the device (e.g. 2048x2048) then we're in trouble as loading it (even to just generate mipmaps) would crash the device. We could either (s l o w l y) downscale the image on the CPU to the max allowed size or just not load the image at all. Both of these options are bad.

The KTX2 format

The KTX2 file format is a container format for images intended to be loaded as textures onto a GPU. It is not, itself, an image codec in the same way that PNG or JPEG are.

KTX2s have some really nice properties: they store full mipmap chains, their header indexes exactly where each mip level lives in the file, and their payloads can be supercompressed (e.g. with ZSTD).

An additional consequence of KTX2s containing mipmaps is that if the largest mip level exceeds the maximum allowed texture size of the device, we can just use a smaller level further down the mip chain as the base texture. This ensures that pretty much any valid KTX2 should be loadable.
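That fallback is cheap to compute: walk down the chain until the level fits the device limit. A sketch, where `max_dim` would come from the device's limits (in wgpu, `Limits::max_texture_dimension_2d`):

```rust
/// First mip level whose dimensions fit within the device's maximum
/// texture size; that level becomes the base texture and the earlier
/// (larger) levels are simply skipped.
fn base_mip_level(mut width: u32, mut height: u32, max_dim: u32) -> u32 {
    let mut level = 0;
    while width.max(height) > max_dim {
        width = (width / 2).max(1);
        height = (height / 2).max(1);
        level += 1;
    }
    level
}

fn main() {
    // An 8192x8192 KTX2 on a device limited to 2048: start at mip 2.
    assert_eq!(base_mip_level(8192, 8192, 2048), 2);
    // Already fits: use the full-size level 0.
    assert_eq!(base_mip_level(1024, 1024, 2048), 0);
    println!("ok");
}
```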

Progressive loading

Because the header of the KTX2 file specifies exactly where each mip level is in the file, we can use HTTP range requests to fetch each mip level individually, starting with the smallest first. This means that textures can be viewed even when the whole thing is not fully loaded.
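Turning a level-index entry into a range request is a one-liner. The struct below is illustrative (its fields mirror the byte offset/length pairs the KTX2 level index records; the real parsing would come from a KTX2 reader):

```rust
/// Byte span of one mip level, as recorded in the KTX2 level index.
/// (Field names here are illustrative, mirroring the KTX2 spec.)
struct LevelIndex {
    byte_offset: u64,
    byte_length: u64,
}

/// HTTP `Range` header value that fetches exactly one mip level.
/// Range requests use inclusive start and end offsets.
fn range_header(level: &LevelIndex) -> String {
    format!(
        "bytes={}-{}",
        level.byte_offset,
        level.byte_offset + level.byte_length - 1
    )
}

fn main() {
    // Smallest mip first: e.g. a 16-byte 1x1 level at offset 200.
    let smallest = LevelIndex { byte_offset: 200, byte_length: 16 };
    assert_eq!(range_header(&smallest), "bytes=200-215");
    println!("ok");
}
```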

While you could create KTX2 files from the raw decompressed bytes of PNG/JPEG images, it's generally a bad idea because the file sizes will be very large. With 1 byte per channel (red/green/blue/alpha), you're looking at 4 bytes per pixel, and 1024 × 1024 × 4 bytes = 4 MB for a 1024×1024 image.
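Spelling that arithmetic out (note the full mip chain only adds about a third on top of the base level):

```rust
/// Uncompressed RGBA8 size: 4 bytes per pixel.
fn rgba8_size(width: u64, height: u64) -> u64 {
    width * height * 4
}

/// Size of the whole RGBA8 mip chain, down to 1x1.
fn rgba8_size_with_mips(mut width: u64, mut height: u64) -> u64 {
    let mut total = 0;
    loop {
        total += rgba8_size(width, height);
        if width == 1 && height == 1 {
            break;
        }
        width = (width / 2).max(1);
        height = (height / 2).max(1);
    }
    total
}

fn main() {
    // 1024^2 at 4 bytes per pixel: 4 MiB for the base level alone...
    assert_eq!(rgba8_size(1024, 1024), 4 * 1024 * 1024);
    // ...and ~5.3 MiB with mipmaps (the chain adds roughly 1/3).
    assert_eq!(rgba8_size_with_mips(1024, 1024), 5_592_404);
    println!("ok");
}
```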

Therefore people generally use them with:

Block-compressed texture formats

GPUs have their own type of compressed texture format, but these work quite differently from the general-purpose compression that JPEG/PNG etc. use. The basic idea is that the textures are divided up into 4x4 blocks which are then compressed and decompressed individually.

I'm just going to quote https://themaister.net/blog/2020/08/12/compressed-gpu-texture-formats-a-review-and-compute-shader-decoders-part-1/ (all 3 parts are worth reading!) here:

Very similarly, texture compression achieves its compression through interpolation between two color values. Somehow, the formats all let us specify two endpoints which are constant over the 4×4 block and interpolation weights, which vary per pixel. Most of the innovation in the formats all comes down to how complicated and esoteric we can make the process of generating the endpoints and weights.

The main idea behind these block-compressed formats is that, as they're 4x smaller than the equivalent uncompressed bytes, they take up much less memory bandwidth when rendering, which increases performance.
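To see where the 4x figure comes from, take BC7 (which UASTC transcodes to) as an example: every 4x4 block is stored in 16 bytes, i.e. 1 byte per pixel versus 4 for RGBA8:

```rust
/// Compressed size for a 16-bytes-per-block format with 4x4 blocks
/// (e.g. BC7). Dimensions round up to whole blocks.
fn bc7_size(width: u64, height: u64) -> u64 {
    let blocks_x = (width + 3) / 4;
    let blocks_y = (height + 3) / 4;
    blocks_x * blocks_y * 16
}

fn main() {
    // 1024^2: 1 MiB compressed vs 4 MiB as RGBA8 -- the 4x saving.
    assert_eq!(bc7_size(1024, 1024), 1024 * 1024);
    // Non-multiple-of-4 sizes still pay for whole blocks.
    assert_eq!(bc7_size(5, 5), 4 * 16);
    println!("ok");
}
```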

Because of patents (now expired?) and other stuff, the compressed texture ecosystem is split in this wonky way. There are essentially 2 families of compressed textures worth caring about: the BC (Block Compression) formats on desktop, and ASTC (plus the older ETC formats) on mobile.

To re-post the platform support table from https://github.com/KhronosGroup/3D-Formats-Guidelines/blob/main/KTXDeveloperGuideWebGL.md#platform-support-table:

platform support

Because the ecosystem is split this way, we get to use:

The Basis Universal GPU Texture Codec

https://github.com/BinomialLLC/basis_universal

So Basis Universal is another compressed texture codec, but it's not actually one that any GPU natively supports. Instead it's more of an interchange format that can be quickly transcoded to either ASTC or BC7. There are actually 2 formats: ETC1S and UASTC. As far as I'm aware, ETC1S is lower quality and older so we'll ignore that. The basisu binary takes a JPG/PNG image, generates mipmaps for it, compresses it and writes it to a KTX2 file.

After both block compression and KTX2 supercompression have been applied, the resulting KTX2 files are at most 3x bigger than the source JPG/PNG images, while taking up much less memory and bandwidth on the GPU and being able to be progressively loaded.

How we use textures in superconductor

Cubemaps

Cubemaps for Image-Based Lighting (IBL) require HDR colour data (you could try to use LDR data with them but I doubt it'd turn out well ^_^). Currently we use BC6H textures for these, in a KTX2 container that also specifies the spherical harmonics in the key-value section.

As BC6H is not a format supported on mobile, I use a shader on mobile to decompress from BC6H to another format (currently Rg11b10Float): https://github.com/MeetKai/superconductor/tree/main/granite-shaders.

This is a bad solution. Decompressing like this introduces some loss in quality (as Rg11b10Float supports a reduced float range), takes some time, and Rg11b10Float is still 4x bigger than BC6H.

Ideally I'd have 2 separate cubemap files, one that's BC6H for desktop and one that's ASTC for mobile. Unfortunately the only ASTC encoder that supports HDR data is astcenc. I need to update the Rust bindings to that so that I can add it to my cubemap compression tool.

UASTC

For transcoding UASTC files in a native binary, the https://github.com/aclysma/basis-universal-rs bindings work great. Wasm is a bit more painful though. Basis Universal is a C++ library that can't be bound to with wasm-bindgen, so the best (but still bad) solution is to bind against the pre-made Emscripten wasm binary. I've got some bindings for this here: https://github.com/expenses/basis-universal-rs-wasm and a PR open here: https://github.com/aclysma/basis-universal-rs/pull/11.

ZSTD supercompression

The zstd crate luckily works in wasm just fine.

How we can load textures faster

It's possible that UASTC transcoding is not fast enough on mobile for our needs. In that case, I think we want to have a separate set of texture files for mobile that are purely ASTC and load them without transcoding. Additionally, if ZSTD supercompression becomes a bottleneck (it shouldn't be) then we can turn that off for those textures as well.

At present, JPEGs/PNGs don't load especially fast in the browser. This is because we're decompressing them in wasm instead of using the browser's built-in ability to do this.

expenses commented 2 years ago

toktx from https://github.com/KhronosGroup/KTX-Software has the ability to compress JPEG/PNG images to ASTC KTX2s. I didn't realise before how many different block sizes for ASTC there are:

image _screenshot: toktx's supported ASTC block sizes_

This means we can be extremely flexible and get extremely high levels of compression for images that need it.
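That flexibility falls out of how ASTC works: every block is stored in exactly 128 bits regardless of its footprint, so larger blocks spread those bits over more pixels:

```rust
/// ASTC always stores a block in 128 bits; the block footprint
/// (4x4 up to 12x12) decides how many pixels share those bits.
fn astc_bits_per_pixel(block_w: u32, block_h: u32) -> f32 {
    128.0 / (block_w * block_h) as f32
}

fn main() {
    assert_eq!(astc_bits_per_pixel(4, 4), 8.0); // highest quality
    assert_eq!(astc_bits_per_pixel(8, 8), 2.0);
    // 12x12 gets under 1 bit per pixel.
    assert!(astc_bits_per_pixel(12, 12) < 1.0);
    println!("ok");
}
```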