GPU Metadata Property Table Packing for 3D Tiles Next

ptrgags commented 3 years ago

One of the upcoming parts of our 3D Tiles Next effort is to pack metadata (specifically, feature tables) for use on the GPU. This will be necessary for both custom shaders (see https://github.com/CesiumGS/cesium/issues/9518) and GPU feature styling.

Packing Overview

The goal for this subsystem is to take the metadata from the CPU, pack it into GPU memory (textures, attributes and uniforms), and then unpack it in the shader.

Only properties used in the shader code will be uploaded to the GPU. @lilleyse’s model-loading branch will have a way to determine this.

Once uploaded and no longer needed on the CPU, try to free the CPU resources. We should include some options for controlling this.

We also want to make any texture management general-purpose, as the refactored Model.js will use other types of textures (feature textures, feature ID textures).

Datatype Compatibility

Not every data type is GPU-compatible. For example, STRING and variable length ARRAY are not easily representable on the GPU. Also, 64-bit types are not directly representable, but a fallback would be to convert them to 32-bit types.

Furthermore, WebGL 1 only supports 8-bit integer or 32-bit float (with OES_texture_float) textures. For larger integer types, multiple image channels or multiple pixels will have to be used.

Supported Types

BOOLEAN (as UINT8 0 or 1), UINT8, INT8 as single-channel textures (LUMINANCE, ALPHA)
FLOAT32 (when OES_texture_float is available)
ARRAY of INT8 or UINT8 with 1-4 components (LUMINANCE, LUMINANCE_ALPHA, RGB or RGBA depending on size)

Supported with Fallbacks

UINT16, INT16 as 2-channel textures (LUMINANCE_ALPHA)
UINT32, INT32 as 4-channel textures (RGBA)
UINT64, INT64 converted (lossy) to FLOAT32 and stored as a float texture. This is trading precision for better runtime performance. If this happens, a warning will be printed to the console like PointCloud.js does.
FLOAT64 converted (lossy) to FLOAT32 and stored as a float texture
FLOAT32 as a 4-channel texture (RGBA) when OES_texture_float is not available (see https://github.com/CesiumGS/cesium/blob/master/Source/Scene/createElevationBandMaterial.js#L483-L501)
ARRAY of INT8

Not supported

STRING
Variable-size ARRAY
Fixed-size ARRAY with length greater than 4

Other Notes:

Should we allow fixed-size arrays with 16 or 32-bit components? E.g. could we have a 16-bit vec3? Or would this add too much complexity?
For vertex attributes, this list will look different. There are no “channels” per-se, but attributes support vector and matrix types.

Encoding Considerations

There are some special cases where values need additional encoding:

Normalized properties are stored as integers but unpacked on the GPU as floats
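
As a rough sketch of what "stored as integers but unpacked as floats" could look like, here is the arithmetic in JavaScript. The function names are hypothetical, and the formulas assume the common normalized-integer convention (unsigned values map to [0, 1], signed values map to [-1, 1] with the most negative value clamped). The shader would do the same arithmetic in GLSL.

```javascript
// Hypothetical helpers for illustration; not part of CesiumJS.
// Assumed convention:
//   unsigned: value / (2^bits - 1)                 -> [0, 1]
//   signed:   max(value / (2^(bits - 1) - 1), -1)  -> [-1, 1]
function normalizeUnsigned(value, bits) {
  return value / (Math.pow(2, bits) - 1);
}

function normalizeSigned(value, bits) {
  // The most negative integer would land slightly below -1, so clamp it.
  return Math.max(value / (Math.pow(2, bits - 1) - 1), -1.0);
}
```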

Choosing a GPU layout

The main unknown right now is how to choose an optimal GPU layout. The calling code will provide a list of properties and information about what GPU resources are available. The layout algorithm needs to take this information and determine what textures/vertex attributes/uniforms to use to store the metadata.

One possibility is to divide the properties into three categories:

Tileset/group/tile metadata is coarse, so these will be stored in uniforms
Feature metadata that is defined at every vertex (i.e. constant: 0, divisor: 1) is a good candidate for storing in attributes
Any other feature metadata will be stored in textures
Properties with a defaultValue can be inlined into the shader code to avoid using GPU resources.

However, determining the exact layout is more involved. Here are some complicating factors:

There are only a limited number of textures (e.g. 16-32 on our laptops)
Similarly, vertex attributes are limited (e.g. 16 on my laptop)
Textures are ideally 1D, but to fit more properties, more rows can be added.
Alternatively, multiple properties can be stored as different channels of the same texture. However, this is less flexible than using multiple rows.
There may be more feature IDs than the maximum texture resolution. Thus, values need to be wrapped onto multiple rows of a texture.
A property may be encoded in multiple channels of a texture to fit larger data types as explained in the sections above.
There may not be enough free textures/attributes so we might want to switch between the two to fit all the metadata in

Inputs:

Property information:
- Note: calling code will only include properties that are used in the shader
- How many feature IDs total (featureTable.count)
- Property data types
- Granularity of metadata (tileset/group/tile/feature/vertex)
Hardware limitations:
- Number of textures/attributes/uniforms available
- How many textures are needed for feature textures/feature ID textures/etc.
- Texture size limits
- What texture data types are supported (floating point textures? Luminance-alpha? etc.)

Output:

A “layout”: for each property, a description of how it should be stored.
For textures:
- Texture index
- Texture dimensions
- How many rows per property (as textures have limited width)
- Texture data type (integer, floating point, etc)
- Texture format (RGBA8, luminance alpha, etc.)
- Which component(s) are used
- Encoding information (float32 packed as RGB8, quantized, etc)
For attributes:
- Attribute name
- Data type
- (if numeric values are packed as vectors) which component
For uniforms
- Uniform name
- Data type
- (if packed as vectors) which component

This layout can be used by the caller to set the Property struct in the shader, as well as determine where/how to upload data to the GPU.
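
To make the output description more concrete, here is one possible shape for such a layout, written as a plain JavaScript value. Every field name and value below is hypothetical and chosen only to mirror the bullet points above; the real structure is still to be designed.

```javascript
// Hypothetical layout for three properties; all names are illustrative only.
const exampleLayout = [
  {
    property: "height",
    storage: "texture",
    textureIndex: 0,
    textureDimensions: { width: 1024, height: 4 },
    rowsPerProperty: 2, // textures have limited width
    textureDataType: "UNSIGNED_BYTE",
    textureFormat: "RGBA",
    components: "rg", // which channel(s) hold this property
    encoding: "UINT16 split across 2 channels",
  },
  {
    property: "temperature",
    storage: "attribute",
    attributeName: "a_metadata_0",
    dataType: "vec2",
    component: "x", // numeric values packed as a vector
  },
  {
    property: "tileYear",
    storage: "uniform",
    uniformName: "u_metadata_0",
    dataType: "float",
    component: null, // not packed in a vector
  },
];

// A caller could look up how a property is stored like this:
function findLayoutEntry(layout, propertyName) {
  return layout.find((entry) => entry.property === propertyName);
}
```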

Stretch Goal: Filtering

One detail that would be nice to have is a way for the user to filter properties. This has a number of benefits:

Filtering out unused properties means less data to upload to the GPU
This can also be used for reducing network requests (see https://github.com/CesiumGS/cesium/issues/9553 )
Could allow the user to hint if they want to use a property on the GPU or on the CPU or both. This would be useful because then resources can be
Filters could be passed into the tileset and used everywhere

Potential downsides:

Need to be careful not to create a large API surface area
In some cases, filters are only partially effective. E.g. if you already downloaded a large binary buffer, to release one bufferView, you need to move all the other bufferViews

To Do:

[ ] Design and implement an algorithm for choosing a GPU memory layout given a list of properties, hardware limits, and current texture/attribute/uniform usage.
[ ] Account for encoding rules: fallbacks for hardware that doesn’t support larger int/float types, normalized properties, etc. This might be a pre-processing step before running the layout algorithm.
[ ] Implement CPU-side functions for packing the metadata into textures/attributes/uniforms according to the layout.
[ ] Implement GPU-side czm_ builtin functions (or snippets appended to a shader) for unpacking metadata in GLSL
[ ] See what other details are needed to integrate this into the Model.js refactor
[ ] Stretch Goal: Filtering of properties to give the user more control and to avoid using unneeded resources (See https://github.com/CesiumGS/cesium/issues/9553)

ptrgags commented 3 years ago

Yesterday I discussed some details about textures with @lilleyse. Here are some notes from that:

We want to make a simplifying assumption that we will only pack metadata values in ways that take a single texture read. This reduces complexity considerably
Since CesiumJS uses WebGL 1, signed integers must be stored as UINT8 and explicitly unpacked to a signed value on the GPU
In general, the GPU should use the closest type available in the shader, regardless of how the data is stored. For example, a BOOLEAN will be stored in the texture as a UINT8, but interpreted as a bool in the shader, not a uint.
When storing multi-byte values in a texture (e.g. storing a UINT32 as 4 channels of a texture), we're picking the convention of storing the bytes in little-endian order (i.e. low byte is red, high byte is alpha)
To maximize how many properties we can fit in a texture, we'll allow packing multiple properties in different channels of the same texture (e.g. store 2 different UINT16 properties in the RG and BA channels of a texel). However, there will be a flag to control this, as in some cases this is undesirable (e.g. interpolating values)
When there are a lot of features (more than the maximum texture resolution), features will have to wrap onto multiple rows. To avoid wasting memory, this can easily be rebalanced:

rowsPerFeature = ceil(featureCount / maximumTextureWidth)
actualTextureWidth = ceil(featureCount / rowsPerFeature)

Properties can come from any number of feature tables.
My initial thought is to use the same number of rows per property to make the texture access a simple formula. However, this may be tricky with feature tables of different sizes. Also @lilleyse suggested that we could store everything in one big row that wraps. The access pattern may be a little more involved though. I'll think about both approaches and try to determine which one is best for our purposes.
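
A minimal sketch of two of the conventions in these notes: little-endian byte order for multi-byte values, and recovering a signed value from an unsigned byte. The function names are hypothetical. The unpacking side is written in JavaScript here, but it deliberately avoids bit operations because GLSL ES 1.00 has no integer bitwise operators, so the shader would need similar arithmetic.

```javascript
// Hypothetical helpers for illustration; not part of CesiumJS.

// Pack a UINT32 into four bytes, low byte first (red = low, alpha = high).
function packUint32AsRGBA(value) {
  return [
    value & 0xff,
    (value >>> 8) & 0xff,
    (value >>> 16) & 0xff,
    (value >>> 24) & 0xff,
  ];
}

// Inverse of the above, using multiplication instead of bit shifts.
function unpackUint32FromRGBA(rgba) {
  return rgba[0] + rgba[1] * 256 + rgba[2] * 65536 + rgba[3] * 16777216;
}

// An INT8 stored as a UINT8 byte: recover the signed value.
function unpackInt8FromUint8(byte) {
  return byte < 128 ? byte : byte - 256;
}
```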

ptrgags commented 3 years ago

Proposed Metadata Packing Algorithm

At a high-level, the algorithm will have the following phases:

Partition properties into the different types of WebGL concepts (textures/attributes/uniforms/constants)
Determine if data types are representable on the GPU. If not, throw an error
Determine the data type that will be used on the GPU and a list of steps needed to convert/pack the data for the GPU
Group properties into textures/attributes by size. This can be packed tightly (storing multiple properties in a single texel/vec4 for better memory usage) or packed loosely (separate texels/attributes for easier interpolation). A flag should control this.
Compute the exact layout including any byte/texel offsets as needed.
"vacuum pack" the textures/attributes, e.g. choose smaller texture dimensions to avoid wasting memory, don't use unused texture channels, etc.

Partitioning Properties

For this first iteration, let's keep this simple using the rules I mentioned in the description. To recap:

defaultValue properties are constants. They will be inlined in the shader code.
Tileset/group/tile properties are constant over every vertex, but vary from content to content, so store them in uniforms
For per-vertex properties (constant: 0, divisor 1), use attributes
Any other per-feature properties will use textures

In theory, we might want to fall back between textures <-> attributes, but I'll hold off on this for this first iteration.
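
The four rules above could be sketched roughly as follows. The property descriptor shape (`defaultValue`, `granularity`) is an assumption made for this example; the granularity values follow the tileset/group/tile/feature/vertex list from the issue description.

```javascript
// Hypothetical property descriptor: { name, granularity, defaultValue }
// granularity is assumed to be one of:
//   "tileset" | "group" | "tile" | "feature" | "vertex"
function partitionProperty(property) {
  // Properties with a defaultValue are inlined into the shader as constants.
  if (property.defaultValue !== undefined) {
    return "constant";
  }
  // Coarse metadata is constant over every vertex, so it goes in uniforms.
  const coarse = ["tileset", "group", "tile"];
  if (coarse.includes(property.granularity)) {
    return "uniform";
  }
  // Per-vertex metadata goes in attributes.
  if (property.granularity === "vertex") {
    return "attribute";
  }
  // Any other per-feature metadata goes in textures.
  return "texture";
}
```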

Type Representability

This step is simple: it rejects the following types as "not representable". Any other issues, like lack of floating point texture support, will be caught in the next step.

STRING/ARRAY[STRING] - not allowed because it is variable length at each vertex
ARRAY[*, 5+] (any array of 5 or more components) - textures and attributes are limited to 4 channels/components, so we are not supporting any more than this.
variable-length ARRAYs - not allowed because it is variable length at each vertex
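
A sketch of this check, assuming a property descriptor with `type`, `componentType` and `componentCount` fields, where a missing `componentCount` means a variable-length array. That descriptor shape is an assumption for this example.

```javascript
// Hypothetical descriptor: { type, componentType, componentCount }
// componentCount === undefined is assumed to mean "variable-length".
function isRepresentableOnGpu(property) {
  if (property.type === "STRING") {
    return false;
  }
  if (property.type === "ARRAY") {
    if (property.componentType === "STRING") {
      return false; // ARRAY[STRING]
    }
    if (property.componentCount === undefined) {
      return false; // variable-length ARRAY
    }
    if (property.componentCount > 4) {
      return false; // more than 4 channels/components
    }
  }
  return true;
}
```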

Computing Packed Types

This is the most involved phase of the algorithm. Essentially we want to go from a list of property types to a list of (packedType, channelCount: int, packingSteps: PackingFunction[]). This process varies depending on the destination (constant/uniform/attribute/texture), as WebGL has different rules for what types are allowed.

Packing functions are any steps needed to prepare the values for packing. They will be applied in order when packing, and the inverse will be performed in the shader to unpack the values.

Some packed types require lossy conversions. We might want to log a warning or throw an error when this happens.

Several types have similar packing rules, so here are some rules for converting these into a smaller set of types. These operations are added as packing rules. The following tables summarize these rules.

Notes:

To avoid recursive evaluation, conversions should be applied in the order listed.
In the tables, x is the number of bits (8, 16, 32, or 64) and N is the number of components (1-4).
ARRAY types will ultimately be implemented as a scalar/vector type for constants/uniforms/attributes, or as RGBA channels of a texture as necessary. However, for describing the type conversions, it's easiest to describe if everything is an array.

Constant/Uniform Type Conversions:

Type	Converted Type	Packing Function	Lossy
`AnyScalarType`	`ARRAY[AnyScalarType, 1]`	`promoteScalarToArray`	No
`ARRAY[INT(8/16), N]`	`ARRAY[INT32, N]`	`promoteToInt`	No
`ARRAY[INT64, N]`	`ARRAY[FLOAT32, N]`	`convertInt64ToF32`	Yes
`ARRAY[UINT(8/16), N]`	`ARRAY[UINT32, N]`	`promoteToUint`	No
`ARRAY[UINT64, N]`	`ARRAY[FLOAT32, N]`	`convertU64ToF32`	Yes
`ARRAY[FLOAT64, N]`	`ARRAY[FLOAT32, N]`	`convertF64ToF32`	Yes

At the end, only these families of types will remain: ARRAY[INT32, N], ARRAY[UINT32, N], ARRAY[FLOAT32, N], ARRAY[BOOLEAN, N]. They are translated to GLSL as follows:

Type	GPU Types
`ARRAY[INT32, N]`	`int/ivec2/ivec3/ivec4`
`ARRAY[UINT32, N]`	`uint/uvec2/uvec3/uvec4`
`ARRAY[FLOAT32, N]`	`float/vec2/vec3/vec4`
`ARRAY[BOOLEAN, N]`	`bool/bvec2/bvec3/bvec4`
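
The two tables above can be read as a small rule list applied in order. Here is a sketch under that reading. The packing function names are taken from the table, while the data structures and the `convertForUniform` signature are assumptions for this example. The GLSL mapping mirrors the table as written, including `uint`, which a later comment in this thread notes is unavailable in WebGL 1.

```javascript
// Rules from the constant/uniform table, applied in the order listed.
const UNIFORM_RULES = [
  { from: ["INT8", "INT16"], to: "INT32", step: "promoteToInt", lossy: false },
  { from: ["INT64"], to: "FLOAT32", step: "convertInt64ToF32", lossy: true },
  { from: ["UINT8", "UINT16"], to: "UINT32", step: "promoteToUint", lossy: false },
  { from: ["UINT64"], to: "FLOAT32", step: "convertU64ToF32", lossy: true },
  { from: ["FLOAT64"], to: "FLOAT32", step: "convertF64ToF32", lossy: true },
];

// GLSL type per remaining family, indexed by component count - 1.
const GLSL_TYPES = {
  INT32: ["int", "ivec2", "ivec3", "ivec4"],
  UINT32: ["uint", "uvec2", "uvec3", "uvec4"],
  FLOAT32: ["float", "vec2", "vec3", "vec4"],
  BOOLEAN: ["bool", "bvec2", "bvec3", "bvec4"],
};

// Hypothetical signature: a scalar is passed with isScalar = true.
function convertForUniform(componentType, componentCount, isScalar) {
  const steps = [];
  let lossy = false;
  if (isScalar) {
    steps.push("promoteScalarToArray");
    componentCount = 1;
  }
  let current = componentType;
  for (const rule of UNIFORM_RULES) {
    if (rule.from.includes(current)) {
      current = rule.to;
      steps.push(rule.step);
      lossy = lossy || rule.lossy;
    }
  }
  const glslTypes = GLSL_TYPES[current];
  if (glslTypes === undefined) {
    throw new Error("No GLSL type for " + current);
  }
  return { glslType: glslTypes[componentCount - 1], steps: steps, lossy: lossy };
}
```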

Attribute Type Conversions:

Type	Converted Type	Packing Function	Lossy
`AnyScalarType`	`ARRAY[AnyScalarType, 1]`	`promoteScalarToArray`	No
`ARRAY[(U)INT(8/16), N]`	`ARRAY[FLOAT32, N]`	`convert(U)IntToF32`	No
`ARRAY[(U)INT(32/64), N]`	`ARRAY[FLOAT32, N]`	`convert(U)IntToF32Lossy`	Yes
`ARRAY[BOOLEAN, N]`	`ARRAY[FLOAT32, N]`	`reinterpretBooleanAsF32`	No
`ARRAY[FLOAT64, N]`	`ARRAY[FLOAT32, N]`	`convertF64ToF32`	Yes

At the end, only the ARRAY[FLOAT32, N] family of types will remain. They are translated as float/vec2/vec3/vec4 in GLSL

Texture Type Conversions

Type	Converted Type	Packing Function	Lossy
`AnyScalarType`	`ARRAY[AnyScalarType, 1]`	`promoteScalarToArray`	No
`ARRAY[(U)INT64, N]`	`ARRAY[FLOAT32, N]`	`convert(U)Int64ToF32`	Yes
`ARRAY[INTx, N]`	`ARRAY[UINTx, N]`	`reinterpretSignedAsUnsigned`	No
`ARRAY[BOOLEAN, N]`	`ARRAY[UINT8, N]`	`reinterpretBooleanAsU8`	No
`ARRAY[FLOAT64, N]`	`ARRAY[FLOAT32, N]`	`convertF64ToF32`	Yes

At the end, only these families of types will remain: ARRAY[UINT(8/16/32), N], ARRAY[FLOAT32, N]. The packed type is a little more involved, as it depends on whether float textures are supported via the OES_texture_float extension. Some types have a fallback when this is not available, others will throw errors. In some cases, more packing functions are needed.

Type	With `OES_texture_float`	`OES_texture_float` unavailable	Packing Functions
`ARRAY[FLOAT32, 1]`	FLOAT32 texture, 1 channel	UINT8 texture, 4 channels	`packFloatAsRGBA` (without float textures)
`ARRAY[FLOAT32, N]`	FLOAT32 texture, N channels	Unsupported	None
`ARRAY[UINT8, N]`	UINT8 texture, N channels	UINT8 texture, N channels	None
`ARRAY[UINT16, 1]`	FLOAT32 texture, 1 channel	UINT8 texture, 2 channels	`packUint16AsFloat32` or `packUint16As2Channels`
`ARRAY[UINT16, 2]`	FLOAT32 texture, 2 channels	UINT8 texture, 4 channels	`packUint16AsFloat32` or `packUint16As2Channels`
`ARRAY[UINT16, N]`	FLOAT32 texture, N channels	Unsupported	`packUint16AsFloat32`
`ARRAY[UINT32, 1]`	FLOAT32 texture, 1 channel (lossy)	UINT8 texture, 4 channels (not lossy)	`packUint32AsFloat32` or `packUint32AsRGBA`
`ARRAY[UINT32, N]`	FLOAT32 texture, N channels (lossy)	Unsupported	`packUint32AsFloat32`
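
One way to read the table above is as a byte-count rule: without float textures, a value fits only if its total size is at most 4 bytes, and each byte takes one UINT8 channel. The sketch below encodes the table under that reading. The function name and return shape are hypothetical.

```javascript
// Bytes per component for the types that remain after conversion.
const BYTES_PER_COMPONENT = { UINT8: 1, UINT16: 2, UINT32: 4, FLOAT32: 4 };

// Hypothetical helper: choose the texture type and channel count for
// ARRAY[componentType, count], depending on OES_texture_float support.
function choosePackedTexture(componentType, count, hasFloatTextures) {
  const bytes = BYTES_PER_COMPONENT[componentType];
  if (bytes === undefined) {
    throw new Error("Unexpected type after conversion: " + componentType);
  }
  // UINT8 arrays are stored directly either way.
  if (componentType === "UINT8") {
    return { textureType: "UINT8", channels: count, lossy: false };
  }
  // With float textures, everything else uses one float per component.
  // UINT32 does not fit exactly in a FLOAT32, so that case is lossy.
  if (hasFloatTextures) {
    return {
      textureType: "FLOAT32",
      channels: count,
      lossy: componentType === "UINT32",
    };
  }
  // Without float textures, split the bytes across UINT8 channels.
  const channels = bytes * count;
  if (channels > 4) {
    throw new Error(
      "ARRAY[" + componentType + ", " + count + "] needs float textures"
    );
  }
  return { textureType: "UINT8", channels: channels, lossy: false };
}
```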

Grouping Properties by Size

Note: in what follows, when I say "group properties" I am not referring to group metadata from 3DTILES_metadata, but grouping properties together by size for space efficiency.

The next step is to group properties together into a single texel/vector to conserve space.

Note: this step is optional, it should be controlled by a boolean flag. It's nice for memory efficiency, but will not be useful when interpolation is needed.

There are only 5 partitions of 4:

4
3 + 1
2 + 2
2 + 1 + 1
1 + 1 + 1 + 1

We can use this fact to pair up components to pack memory more densely:

(Textures only) - partition properties into properties packed as FLOAT32 textures and properties packed as UINT8 textures. The following steps will apply to each type of texture separately
Bin the properties by their number of components needed (1, 2, 3, or 4)
Add the list of 4-component properties to the output list
For each 3-component property, pair it with one of the 1-component properties (if available). Either way, add it to the output
For each 2-component property, pair it with either another 2-component property, or up to 2 1-component properties (where possible). Either way, add it to the output
Group the remaining 1-component properties in groups of 4 (as closely as possible) and add to the output.

For example, if I had (property, channels) = (A, 1), (B, 2), (C, 2), (D, 4), (E, 3), (F, 3), (G, 1), (H, 3), the algorithm would work like this:

After binning:
1: A, G
2: B, C
3: E, F, H
4: D

After handling 4-components:
output = [D]

After handling 3-components
output = [D, [E, A], [F, G], H]

(note that there's nothing to pair with H so the texel will have an unused 4th component)

After handling 2-components
output = [D, [E, A], [F, G], H, [B, C]]

After handling 1-components
output = [D, [E, A], [F, G], H, [B, C]] (no changes needed)
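
Here is a sketch of the grouping steps in JavaScript, run on the worked example above. The function name and the `{ name, channels }` input shape are assumptions. For uniformity, every group is returned as an array, so the single-property groups appear as `["D"]` and `["H"]` rather than bare names. Per the texture-only first step, this would be called once per texture type.

```javascript
// Hypothetical helper. Each property is { name, channels } with channels in 1..4.
function groupPropertiesBySize(properties) {
  // Bin the properties by the number of components they need.
  const bins = { 1: [], 2: [], 3: [], 4: [] };
  for (const property of properties) {
    bins[property.channels].push(property.name);
  }

  // 4-component properties fill a texel/vec4 on their own.
  const output = bins[4].map((name) => [name]);

  // Pair each 3-component property with a 1-component property if available.
  for (const name of bins[3]) {
    const group = [name];
    if (bins[1].length > 0) {
      group.push(bins[1].shift());
    }
    output.push(group);
  }

  // Pair each 2-component property with another 2-component property,
  // or with up to two 1-component properties.
  while (bins[2].length > 0) {
    const group = [bins[2].shift()];
    if (bins[2].length > 0) {
      group.push(bins[2].shift());
    } else {
      group.push(...bins[1].splice(0, 2));
    }
    output.push(group);
  }

  // Group the remaining 1-component properties in groups of up to 4.
  while (bins[1].length > 0) {
    output.push(bins[1].splice(0, 4));
  }
  return output;
}

const exampleGroups = groupPropertiesBySize([
  { name: "A", channels: 1 },
  { name: "B", channels: 2 },
  { name: "C", channels: 2 },
  { name: "D", channels: 4 },
  { name: "E", channels: 3 },
  { name: "F", channels: 3 },
  { name: "G", channels: 1 },
  { name: "H", channels: 3 },
]);
// exampleGroups: [["D"], ["E", "A"], ["F", "G"], ["H"], ["B", "C"]]
```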

Compute Layouts

For uniforms, each group of properties becomes a single uniform.

For attributes, each group of properties becomes a single attribute.

For textures, it's a little more involved. Each group of properties becomes a single texel, but there are a couple different ways these texels can be arranged:

(my original idea) Each group of properties gets a number of rows of the texture propertyHeight = ceil(featureCount / textureWidth), then texels are accessed by

row = propertyIndex * propertyHeight + floor(featureId / textureWidth)
column = featureId % textureWidth

where propertyIndex would be computed for each property

(@lilleyse's suggestion) treat the properties as one big 1D array and wrap by the texture size:

index = propertyOffset + featureId
row = floor(index / textureWidth)
column = index % textureWidth

Where propertyOffset is computed for each property.

I think Option 2 is nicer for its simplicity and better memory efficiency for multiple feature tables.
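
For comparison, both addressing schemes can be sketched as small JavaScript functions. The function names are hypothetical, and the shader would do the equivalent arithmetic in GLSL.

```javascript
// Option 1: each property (group) gets its own block of rows.
function texelCoordinatesPerProperty(propertyIndex, featureId, featureCount, textureWidth) {
  const propertyHeight = Math.ceil(featureCount / textureWidth);
  return {
    row: propertyIndex * propertyHeight + Math.floor(featureId / textureWidth),
    column: featureId % textureWidth,
  };
}

// Option 2: treat all properties as one big 1D array that wraps by texture width.
function texelCoordinatesWrapped(propertyOffset, featureId, textureWidth) {
  const index = propertyOffset + featureId;
  return {
    row: Math.floor(index / textureWidth),
    column: index % textureWidth,
  };
}
```

With 25 features per property and a texture width of 10, Option 1 reserves 3 rows per property (5 texels unused in each block), while Option 2 starts the second property at offset 25 with no gap.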

NOTE: In the above, assume textures are the maximum size and have 4 channels. Shrinking this layout to fit the content tightly is handled in the next step, at the end of the algorithm.

"Vacuum Packing"

To finish the layout, we want to avoid wasting memory, so reduce the dimensions to fit the data as tightly as possible. This involves:

(Textures only), balance the texture dimensions to minimize unused texels. If there are N texels in use, this can be done with the formula:

rows = ceil(N / maximumTextureWidth)
columns = ceil(N / rows)

For example, say maximumTextureWidth = 10 and N = 11, we have:

Original texture use: 10 x 2, 9 texels wasted:
1111111111
1000000000

rows = ceil(11/10) = 2
columns = ceil(11 / 2) = 6

Result: 6x2, only 1 texel wasted:
111111
111110

(Textures) Not sure if this step is needed depending on the texture layout used, but ensure the height of the texture is exactly enough to fit all the used texels.
Crop the number of channels if not all 4 are needed.
- Texture example: Suppose there is a texture with a single property that requires 2 components (such as an ARRAY[UINT8, 2]), use a LUMINANCE_ALPHA texture rather than an RGBA texture
- Attribute/uniform example: Suppose we only need 2 components (e.g. 2 UINT8 properties packed tightly into a single attribute), use a vec2, not a vec4
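
The texture balancing formula above is short enough to sketch directly. The function name and the extra `wasted` field are additions for this example.

```javascript
// Hypothetical helper: shrink the texture so that as few texels as possible
// go unused, given the number of texels in use and the maximum width.
function balanceTextureDimensions(texelCount, maximumTextureWidth) {
  if (texelCount === 0) {
    return { rows: 0, columns: 0, wasted: 0 };
  }
  const rows = Math.ceil(texelCount / maximumTextureWidth);
  const columns = Math.ceil(texelCount / rows);
  return { rows: rows, columns: columns, wasted: rows * columns - texelCount };
}

// The example from the text: maximumTextureWidth = 10, N = 11.
const balanced = balanceTextureDimensions(11, 10);
// balanced: { rows: 2, columns: 6, wasted: 1 }
```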

ptrgags commented 3 years ago

Oh one clarification: when it comes to grouping properties by size, this needs to be done per-type. So for example, when it comes to textures, the FLOAT32 properties are grouped together, while the UINT32 ones are handled separately.

sanjeetsuhag commented 3 years ago

Learned this while reviewing https://github.com/CesiumGS/cesium/pull/9595 - WebGL 1 does not have uint because it uses GLSL ES 1.00. WebGL 2 supports uint because it uses GLSL ES 3.00.

lilleyse commented 1 year ago

Requested in https://github.com/CesiumGS/cesium/issues/11450.

chen21439 commented 1 year ago

Requested in #11450.

Thank you for your reply. I am now trying to shake every building differently. I thought metadata seemed like a good way to distinguish them. Can you give me some advice on how I can distinguish each building?

https://sandcastle.cesium.com/?src=Custom%20Shaders%203D%20Tiles.html&label=3D%20Tiles%20Next

In this example, a batch of buildings shares the same featureId.
