How to minimize heap/stack memory usage of multiple contexts on mobile platform?

ARM-software / astc-encoder

The Arm ASTC Encoder, a compressor for the Adaptive Scalable Texture Compression data format.

https://developer.arm.com/graphics

Apache License 2.0

How to minimize heap/stack memory usage of multiple contexts on mobile platform? #461

Closed NoSW closed 5 months ago

NoSW commented 7 months ago

I'm using astc-encoder to compress volumetric lightmap data on a mobile platform. I have selected two block sizes, 5x5 and 5x5x5, based on certain criteria.

Decompression requires two different contexts to exist at the same time, or four if HDR and LDR are handled separately. This results in 30 MB to 40 MB of allocated memory, which is hard to accept on mobile devices, especially on iOS.

I have noticed the build option ASTCENC_BLOCK_MAX_TEXELS, but the limit of 5x5x5=125 is still too large.

Q1: Is there any possibility to merge contexts with different block sizes and HDR/LDR settings into a big one?

Q2: Starting from #246, are there any opportunities to further reduce memory overhead?

(I have noticed WEIGHTS_MAX_DECIMATION_MODES, WEIGHTS_MAX_BLOCK_MODES, and BLOCK_MAX_WEIGHTS are still constant in v4.7.0.)

Q3: Can ASTCENC_BLOCK_MAX_TEXELS be changed to (c++) template?

solidpixel commented 7 months ago

Why do you need to decompress? Surely the point is to select formats the GPU can access natively in hardware.

> Q1: Is there any possibility to merge contexts with different block sizes and HDR/LDR settings into a big one?

HDR is a superset of LDR, so you can use an HDR context to decompress LDR images already.

Merging block sizes won't help - you just make the context linearly bigger.
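As a rough illustration of the answer above, here is a minimal sketch of how an application might plan its decoder contexts: one per distinct block size, using the HDR profile whenever any texture of that size is HDR. The `TextureFormat` and `ContextPlan` types and the `plan_contexts` function are hypothetical names for this example and are not part of the astcenc API.

```cpp
#include <map>
#include <tuple>
#include <vector>

// Hypothetical description of one texture set the application must decode.
struct TextureFormat {
    unsigned block_x, block_y, block_z;  // block_z == 1 for 2D blocks
    bool hdr;
};

// Hypothetical description of one decoder context the application will create.
struct ContextPlan {
    unsigned block_x, block_y, block_z;
    bool hdr;  // true: create with an HDR profile, which can also decode LDR data
};

// One context per distinct block size. Block sizes are never merged, because a
// combined context would simply hold both sets of tables.
std::vector<ContextPlan> plan_contexts(const std::vector<TextureFormat>& formats) {
    std::map<std::tuple<unsigned, unsigned, unsigned>, bool> by_block;
    for (const TextureFormat& f : formats) {
        // operator[] value-initializes the flag to false on first use.
        bool& needs_hdr = by_block[std::make_tuple(f.block_x, f.block_y, f.block_z)];
        needs_hdr = needs_hdr || f.hdr;
    }

    std::vector<ContextPlan> plans;
    for (const auto& entry : by_block) {
        plans.push_back({std::get<0>(entry.first), std::get<1>(entry.first),
                         std::get<2>(entry.first), entry.second});
    }
    return plans;
}
```

For the case in this thread (5x5 and 5x5x5, each in LDR and HDR), this plan yields two contexts rather than four.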

> Q2: Starting from https://github.com/ARM-software/astc-encoder/issues/246, are there any opportunities to further reduce memory overhead?

Probably. PRs welcome ...

> Q3: Can ASTCENC_BLOCK_MAX_TEXELS be changed to (c++) template?

No - it gets used by the preprocessor.
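To illustrate the general point, here is a small sketch of why a value consumed by `#if` cannot become a template parameter: the preprocessor chooses a branch before any template is instantiated, so it needs a macro. `DEMO_BLOCK_MAX_TEXELS`, `kMaskWords`, and `DemoBlockInfo` are hypothetical stand-ins and do not reproduce how astcenc actually uses its macro.

```cpp
#include <cstddef>

// Hypothetical stand-in for a build-time option passed as -DDEMO_BLOCK_MAX_TEXELS=<n>.
#ifndef DEMO_BLOCK_MAX_TEXELS
#define DEMO_BLOCK_MAX_TEXELS 125  // 5x5x5
#endif

// The preprocessor selects one branch here, long before templates are instantiated.
// A template parameter named in an #if would be treated as an unknown identifier (0).
#if DEMO_BLOCK_MAX_TEXELS > 64
// More than 64 texels: one 64-bit mask word is not enough.
constexpr std::size_t kMaskWords = (DEMO_BLOCK_MAX_TEXELS + 63) / 64;
#else
constexpr std::size_t kMaskWords = 1;
#endif

// Array sizes in the structure are fixed by the macro for the whole build.
struct DemoBlockInfo {
    float texel_weight[DEMO_BLOCK_MAX_TEXELS];
    unsigned long long texel_mask[kMaskWords];
};
```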

NoSW commented 7 months ago

> Why do you need to decompress?

Not all GPUs implementing ASTC support the HDR profile.

Do you mean I should choose a format supported by the GPU rather than a specific ASTC block size?

The data is a Texture3DArray whose elements are each 5x5x5, and there is no spatial continuity between elements. Formats with a fixed 4x4 block size, such as BCn/ETC2, are therefore a poor fit, so I have chosen ASTC 5x5 and 5x5x5 with a CPU decompressor.

> HDR is a superset of LDR, so you can use an HDR context to decompress LDR images already.

👍

> No - it gets used by the preprocessor.

Templates can be helpful in creating a 5x5 context with a smaller memory footprint when the library is built with ASTCENC_BLOCK_MAX_TEXELS set for 5x5x5 (125 texels). However, that would not fit the API style of this library :(

solidpixel commented 7 months ago

> Templates can be helpful in creating a 5x5 context with a smaller memory footprint

Using templated structures would change the size of the structs used in the context, so you'd need to build N templated versions of the codec, one per block size. Code size would grow with each of the N instantiations.
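A minimal sketch of that trade-off, using hypothetical types (`DemoBlockScratch`, `demo_scratch_bytes`, and `demo_sum_weights` are not astcenc names): each template argument produces a distinct struct layout and a distinct copy of every function that touches it.

```cpp
#include <cstddef>

// Hypothetical per-block working data sized by a template parameter.
template <unsigned MaxTexels>
struct DemoBlockScratch {
    float weights[MaxTexels];
    unsigned char partition_of_texel[MaxTexels];
};

// Each MaxTexels value yields a separately compiled copy of this function.
template <unsigned MaxTexels>
float demo_sum_weights(const DemoBlockScratch<MaxTexels>& scratch, unsigned texel_count) {
    float sum = 0.0f;
    for (unsigned i = 0; i < texel_count && i < MaxTexels; ++i) {
        sum += scratch.weights[i];
    }
    return sum;
}

// Reports the storage cost of one instantiation.
template <unsigned MaxTexels>
std::size_t demo_scratch_bytes() {
    return sizeof(DemoBlockScratch<MaxTexels>);
}

// The 5x5 variant is smaller, but it is a different type from the 5x5x5 variant,
// so code written against one cannot be shared with the other.
static_assert(sizeof(DemoBlockScratch<25>) < sizeof(DemoBlockScratch<125>),
              "smaller block limit gives a smaller struct");
```

Supporting both 5x5 and 5x5x5 this way would mean instantiating the codec paths for `<25>` and `<125>`, which saves data memory at the cost of duplicated code.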

The bulk of the memory comes from the decimation tables, and then the partition tables. If you know which decimation modes and partitionings your textures actually use, the fastest way to reduce memory footprint is to filter out the creation of the entries you don't need at context creation time.
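The filtering idea can be sketched roughly as follows, assuming the application knows in advance which modes its textures use. Everything here is hypothetical (`DemoDecimationEntry`, `build_tables`, `table_bytes`, and the mode counts); the real table construction inside astcenc is structured differently, and this only shows the shape of skipping unused entries at creation time.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical table entry; `weights` stands in for the per-mode payload.
struct DemoDecimationEntry {
    unsigned mode_id;
    std::vector<float> weights;
};

// Build only the entries whose mode id appears in `used_modes`, instead of
// allocating a payload for every possible mode.
std::vector<DemoDecimationEntry> build_tables(unsigned mode_count,
                                              std::size_t payload_floats,
                                              const std::set<unsigned>& used_modes) {
    std::vector<DemoDecimationEntry> tables;
    for (unsigned mode = 0; mode < mode_count; ++mode) {
        if (used_modes.count(mode) == 0) {
            continue;  // never referenced by the application's textures
        }
        tables.push_back({mode, std::vector<float>(payload_floats, 0.0f)});
    }
    return tables;
}

// Total payload bytes held by the built tables.
std::size_t table_bytes(const std::vector<DemoDecimationEntry>& tables) {
    std::size_t total = 0;
    for (const DemoDecimationEntry& entry : tables) {
        total += entry.weights.size() * sizeof(float);
    }
    return total;
}
```

A decoder built this way must still handle a block that references a skipped mode, for example by reporting an error, so the list of used modes has to come from scanning the actual compressed data.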

solidpixel commented 5 months ago

Old question, closing.