BinomialLLC / basis_universal

Basis Universal GPU Texture Codec
Apache License 2.0

Precompute g_bc7_mode_N tables for solid color blocks #384

Open zeux opened 3 weeks ago

zeux commented 3 weeks ago

uastc_init takes 140ms in Chrome, which delays time-to-first-paint with textures by 140ms per worker. This PR precomputes the g_bc7_mode_N tables instead of building them at init time. The tables are just 2KB and 1KB; additionally, errors for mode 5 are all 0, and errors for mode 6 are all 0 except for two values (0 and 255), so we can avoid storing the errors and compute them from color values. This cuts both tables in half (to 1KB and 0.5KB respectively).
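To illustrate the "compute the error from the color value" idea, here is a rough sketch rather than the code from this PR. It assumes the mode 6 table is indexed by color value and by a p-bit shared by both 7-bit endpoints, that the solid block uses the selector with interpolation weight 21/64, and that the stored error is an absolute difference. The names `endpoint_pair`, `mode6_interp`, `mode6_error`, `mode6_best` and `mode6_closed_form_matches` are made up for the example.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical compact table entry: endpoints only, no stored error (2 bytes).
struct endpoint_pair {
    uint8_t lo;
    uint8_t hi;
};

// Assumption: both 7-bit endpoints share one p-bit, and the block uses the
// selector whose BC7 interpolation weight is 21/64.
int mode6_interp(int lo7, int hi7, int pbit) {
    const int l = (lo7 << 1) | pbit;
    const int h = (hi7 << 1) | pbit;
    return (l * (64 - 21) + h * 21 + 32) >> 6;
}

// Error recomputed from the color value instead of being read from the table.
// Under the assumptions above, only c == 0 with p-bit 1 and c == 255 with
// p-bit 0 cannot be reproduced exactly.
int mode6_error(int c, int pbit) {
    return ((c == 0 && pbit == 1) || (c == 255 && pbit == 0)) ? 1 : 0;
}

// Exhaustive search for the endpoint pair with the smallest absolute error;
// this is the kind of work a precomputed table avoids at startup.
endpoint_pair mode6_best(int c, int pbit, int* best_err) {
    endpoint_pair best = {0, 0};
    *best_err = 256;
    for (int lo = 0; lo < 128; ++lo) {
        for (int hi = 0; hi < 128; ++hi) {
            const int err = std::abs(mode6_interp(lo, hi, pbit) - c);
            if (err < *best_err) {
                *best_err = err;
                best.lo = static_cast<uint8_t>(lo);
                best.hi = static_cast<uint8_t>(hi);
            }
        }
    }
    return best;
}

// Returns true if the closed-form error agrees with the brute-force error
// for every color value and both p-bit settings.
bool mode6_closed_form_matches() {
    for (int c = 0; c < 256; ++c) {
        for (int pbit = 0; pbit < 2; ++pbit) {
            int err = 0;
            mode6_best(c, pbit, &err);
            if (err != mode6_error(c, pbit)) return false;
        }
    }
    return true;
}
```

With a 2-byte entry, 256 x 2 entries take 1KB instead of 2KB, which is where the halving comes from under these assumptions.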

This reduces uastc_init cost to 30ms. The .wasm file goes from 525 KB to 526 KB with this change (846 extra bytes of overhead).

Not sure if this is similar to any existing pattern; feel free to close this if there's a better way to do this.

zeux commented 3 weeks ago

Actually, also, BC7 mode 5 can perfectly encode any solid color block, right? So you shouldn't even need mode 6 selection in the first place. This would simplify the code further. I'm going to add a separate commit that does this.
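A quick way to sanity-check the mode 5 claim is a brute-force search over one channel. This is only a sketch under stated assumptions: 7-bit color endpoints expanded to 8 bits by bit replication, and a single selector with interpolation weight 21/64 for every pixel. The function names are made up for the example and are not from the codebase.

```cpp
#include <cstdint>

// Expand a 7-bit endpoint to 8 bits by replicating the top bit.
int expand7(int v) {
    return (v << 1) | (v >> 6);
}

// Interpolated 8-bit value for one channel, assuming the selector with
// BC7 interpolation weight 21/64.
int mode5_interp(int lo7, int hi7) {
    return (expand7(lo7) * (64 - 21) + expand7(hi7) * 21 + 32) >> 6;
}

// Searches all 7-bit endpoint pairs for one that reproduces c exactly.
// Returns true and writes the pair on success.
bool mode5_find_exact(int c, uint8_t* lo_out, uint8_t* hi_out) {
    for (int lo = 0; lo < 128; ++lo) {
        for (int hi = 0; hi < 128; ++hi) {
            if (mode5_interp(lo, hi) == c) {
                *lo_out = static_cast<uint8_t>(lo);
                *hi_out = static_cast<uint8_t>(hi);
                return true;
            }
        }
    }
    return false;
}

// Counts the 8-bit values that have no exact endpoint pair.
int mode5_count_inexact() {
    int count = 0;
    for (int c = 0; c < 256; ++c) {
        uint8_t lo = 0, hi = 0;
        if (!mode5_find_exact(c, &lo, &hi)) ++count;
    }
    return count;
}
```

Since mode 5 alpha endpoints are a full 8 bits, alpha is exact as well, so a zero count here means any solid RGBA color has an exact mode 5 encoding under these assumptions.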

zeux commented 3 weeks ago

With removal of mode 6 table, the .wasm binary is 525 KB (-516 bytes from master).

lexaknyazev commented 3 weeks ago

FYI, #232.

zeux commented 3 weeks ago

Yeah, that's also a good solution, but we'd still need to either remove the mode 6 table (it's slow to compute and not useful; iirc it was ~35ms for mode 5 and ~70ms for mode 6) or precompute it like this PR does, minus the second commit.

lexaknyazev commented 3 weeks ago

Mode 6 path was added for very specific non-web use cases. It could be wrapped with macros when compiling with emscripten to not remove that functionality altogether.

The expressions in #232 give exact Mode 5 endpoints for any solid color so neither static tables nor init routines are needed.

zeux commented 3 weeks ago

I agree you shouldn't need tables for mode 5 in the first place. As for mode 6, I've read the linked issue and I'm not sure I understand the value: why would LZ be able to identify shared patterns between a solid color mode 6 block and some other mode 6 block? But if there's data suggesting this is necessary for some other use case, the mode 6 table could be kept precomputed, as the first commit in this PR does, possibly behind a define so that it is optional/opt-in.
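For the opt-in define, something along these lines might work. This is a sketch only: `EXAMPLE_BC7_MODE6_SOLID_TABLE`, `solid_mode` and `pick_solid_color_mode` are hypothetical names, not existing macros or functions in the codebase. `__EMSCRIPTEN__` is the macro Emscripten predefines, so the mode 6 path defaults to off for web builds and on elsewhere, and either default can be overridden on the compiler command line.

```cpp
// Hypothetical opt-in switch; the macro name is invented for this sketch.
// Defaults to disabled under Emscripten and enabled for native builds.
#ifndef EXAMPLE_BC7_MODE6_SOLID_TABLE
  #ifdef __EMSCRIPTEN__
    #define EXAMPLE_BC7_MODE6_SOLID_TABLE 0
  #else
    #define EXAMPLE_BC7_MODE6_SOLID_TABLE 1
  #endif
#endif

enum class solid_mode { mode5, mode6 };

// Picks the BC7 mode for a solid color block. The mode 6 branch (and any
// table it would need) is compiled in only when the switch is enabled.
inline solid_mode pick_solid_color_mode(bool prefer_mode6) {
#if EXAMPLE_BC7_MODE6_SOLID_TABLE
    if (prefer_mode6) return solid_mode::mode6;
#else
    (void)prefer_mode6;  // mode 6 path compiled out
#endif
    return solid_mode::mode5;
}
```

Building with `-DEXAMPLE_BC7_MODE6_SOLID_TABLE=1` would re-enable the mode 6 path under Emscripten in this sketch.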