dneto0 closed this issue 6 years ago
Good stuff! In my mind, there are several goals and several approaches here:
Given a SPIR-V program:
I'm playing around with various approaches for the last item -- the "varint" encoding you mentioned; I'm also going to try delta-encoding the IDs (in almost all the programs I have here, the majority of IDs are 1-2 away from the IDs used in the previous instruction).
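A minimal sketch of that combination (assuming unsigned LEB128-style varints and zigzag-mapped deltas; this is an illustration, not SMOL-V's actual format):

```python
def encode_varint(value, out):
    """Append an unsigned LEB128-style varint: 7 bits per byte, high bit means 'more'."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return

def encode_ids(ids):
    """Delta-encode a stream of SPIR-V IDs, then varint-pack the deltas.
    Deltas are zigzag-mapped so small negative deltas also stay small."""
    out = bytearray()
    prev = 0
    for i in ids:
        delta = i - prev
        zigzag = (delta << 1) if delta >= 0 else ((-delta << 1) - 1)
        encode_varint(zigzag, out)
        prev = i
    return bytes(out)
```

On an ID stream like `[100, 101, 99, 102]`, the deltas after the first are tiny, so each subsequent ID costs one byte instead of a full 32-bit word.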
I have also seen some fairly unexpected stats from the shaders I have, e.g. OpVectorShuffle takes up a lot of total space -- it's a very verbose encoding (very often 9 words), in most cases just representing a swizzle or a scalar splat. I have to dig in more to find whether there are similar patterns that could be either encoded more compactly, or encoded in a way that's more compressible.
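For illustration, a hypothetical compact encoding for the single-source swizzle case: each component index of a 4-component vector fits in 2 bits, so up to four indices fit in one byte instead of four literal words:

```python
def pack_swizzle(components):
    """If an OpVectorShuffle is really a single-source swizzle of a
    4-component vector (every index < 4), pack the up-to-4 indices
    into one byte, 2 bits each. Returns None when the shuffle
    doesn't fit this fast path."""
    if not 1 <= len(components) <= 4 or any(c > 3 for c in components):
        return None
    packed = 0
    for i, c in enumerate(components):
        packed |= c << (2 * i)
    return packed

def unpack_swizzle(packed, count):
    """Inverse of pack_swizzle: recover `count` component indices."""
    return [(packed >> (2 * i)) & 3 for i in range(count)]
```

A `.wzyx` swizzle (indices 3, 2, 1, 0) then costs one byte rather than four 32-bit component literals.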
With shader variants, you'd probably find that the same basic blocks are found over and over, with maybe just shifted IDs. It could be interesting to have an archive format where SPIR-V files would do something like OpLabelLink $hash $id-shift, and basic blocks could be inlined at runtime before passing to the driver.
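The archive idea could start from something like the canonical block hash below (a simplified model with instructions as (opcode, operand-IDs) tuples, not real SPIR-V parsing; OpLabelLink itself is hypothetical):

```python
import hashlib

def canonical_hash(block):
    """Hash a basic block's instructions with IDs renumbered in order of
    first use, so two blocks that differ only by a uniform ID shift (or
    any consistent renaming) hash identically. `block` is a list of
    (opcode, operand_ids) tuples -- a stand-in for real instructions."""
    local = {}
    canon = []
    for opcode, operands in block:
        ids = tuple(local.setdefault(op, len(local)) for op in operands)
        canon.append((opcode, ids))
    return hashlib.sha256(repr(canon).encode()).hexdigest()
```

An archiver could store each unique hash once and replace repeated blocks with a reference plus the ID shift.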
Experimenting with varint encoding + some delta encoding. Results look quite promising. Messy code on github (https://github.com/aras-p/smol-v -- Win/Mac builds)
But, testing on 113 shaders I have right now (caveat emptor: all produced by HLSL -> d3dcompiler -> DX11 bytecode -> HLSLcc -> GLSL -> glslang, so they might have some patterns that aren't "common" elsewhere):
Compression (original size 1314.8KB; "re" = remapper, "sm" = my test):

Transform only:
  Remap      1314.1KB  99.9%  (glslang spirv-remap)
  SMOL-V      448.3KB  34.1%  ("my stupid code")

Compressed with LZ4 HC compressor at default settings:
  LZ4HC       329.9KB  25.1%
  re+LZ4HC    241.8KB  18.4%
  sm+LZ4HC    128.0KB   9.7%

Compressed with Zstd compressor at default settings:
  Zstd        279.3KB  21.2%
  re+Zstd     188.7KB  14.4%
  sm+Zstd     117.4KB   8.9%

Compressed with Zstd compressor at almost max setting (20):
  Zstd20      187.0KB  14.2%
  re+Zstd20   129.0KB   9.8%
  sm+Zstd20    92.0KB   7.0%
There are a lot more instructions I could be encoding (so far I've just looked at the ones taking up the most space), and perhaps other tricks could be done. The shaders I have do have debug names on them; I am not stripping those out.
(edit: updated with August 28 results)
@aras-p Very promising results! Thanks for sharing. Agreed: Transforms should be designed to make the result more compressible by standard (universal) compressors.
I also agree that we have to be mindful of using a reasonable tuning set of shaders. Here's a good project for someone: make a public repository of example shaders and define meaningful tuning sets over them.
Other encoding ideas:
%a = ...
...
...
%b = ...
%sum = OpIAdd %int %a %b
into
%a = ...
...
%b = ...
%sum = OpImplicitIAdd %int %a ; will automatically use %b as the other operand
(This reminds me of writing assembly for the 6502.) The idea here is that instead of encoding the operand explicitly (even with delta coding), it only bloats the instruction opcode space slightly.
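A sketch of the implicit-operand scheme: when an operand is the most recently defined result ID, switch to an "implicit" opcode variant and drop that operand entirely (opcode names and values here are invented for illustration):

```python
# Hypothetical opcodes for the explicit and implicit forms of an integer add.
OP_IADD, OP_IADD_IMPLICIT = 0x80, 0x81

def encode_iadd(result_id, a, b, last_result_id):
    """Return the word list for an integer add, using the implicit form
    when operand b is the immediately preceding result ID."""
    if b == last_result_id:
        # The decoder reconstructs %b as the last result it saw.
        return [OP_IADD_IMPLICIT, result_id, a]
    return [OP_IADD, result_id, a, b]
```

Since consecutive instructions very often consume the result just produced, the implicit form fires frequently and saves one word (or one varint) each time.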
Good stuff, thanks.
I also want to put in a plug for another dimension: generating less SPIR-V to begin with.
There are two big ones SPIR-V is designed for: specialization constants, and sharing common code across shaders:
For the first, if multiple GLSL shaders are being generated with different values of constants (e.g., fixed sets of elements to process, or a bool turning features on/off), it is possible to instead make one GLSL shader with specialization constants and wait until app run time to provide the actual constant values needed:
shader A:
const vec4 elements[4];
const bool feature = false;
... for (i = 0; i < 4; ++i) ...elements[i]...
... if (feature) ...
shader B:
const vec4 elements[8];
const bool feature = true;
... for (i = 0; i < 8; ++i) ...elements[i]...
... if (feature) ...
Single shader with specialization:
layout(constant_id=1) const int numElements = 4; // can be changed to 8 at run time
const vec4 elements[numElements];
layout(constant_id=2) const bool feature = false; // can be changed to true at run time
... for (i = 0; i < numElements; ++i) ...elements[i]...
... if (feature) ...
This turns multiple shaders into a single shader, long before compression even comes into play.
For the second point, @dneto0 already touched on it with:
Link shaders together into a single SPIR-V module, to share common declarations (like types), and share helper function bodies.
It's possible that enough ID remapping and cross-file compression would recognize the commonality (it would be good to find out how much of that is happening), but if not, two other approaches would help:
As a place to look for inspiration, the WebAssembly group may have some relevant ideas. They've heavily iterated on efficient instruction encodings that are easy to parse and compress well. They ended up with LEB128 varint encoding for most things, as well as some ways of reducing long instruction encodings (like the 9-word swizzle mentioned above). Some docs here, but if anyone's interested they could ping the group and chat - I'm sure they'd be willing to share what they learned along the way :)
From @aras-p:
...several approaches here: Given a SPIR-V program:
- Make it more compressible, while still keeping it a valid SPIR-V that does the same thing. This is what spirv-remap does.
This is an important design constraint to be aware of. The question is whether anything not "off the shelf" is needed on the target (end user) system. Applies to both decompression and denormalization.
Also key is whether multiple files are seen just at compression time, or earlier at normalization/remapping time.
The fuller taxonomy is more like:
All combinations make sense. The remapper was indeed intentionally targeting the combination of
These are all constraints, and certainly lifting any of them would enable a tool to perform better.
So, I'm curious to what extent gains were seen by lifting the constraints and to what extent by finding more ways of doing better normalization.
I wrote up what I did so far here: http://aras-p.info/blog/2016/09/01/SPIR-V-Compression/
And indeed, the combination I chose is somewhat different from the remapper. I did this:
Now, my "normalization/denormalization" step also makes it smaller, so you could view it as a sort of compression too. But it's not dictionary/entropy compression, so you can still compress the result afterwards with regular off-the-shelf compressors.
Nice write-up, thanks.
@atgoo is contributing a codec to SPIRV-Tools. It's work-in-progress (in source/comp and tools/comp) and I've seen internal reports of quite good results.
@dneto, @atgoo: My impression is that the work for compression is basically complete right now. Should we close this?
I would wait for feedback before doing any non-bugfix work, but yes, it could be called complete.
Okay, I'll close this then.
We've heard reports that SPIR-V modules are larger than those compiled to other representations.
First, SPIR-V binary encoding is extremely regular and is designed to be very simple to handle. It has lots of redundancy. For example, the SPIRV-Tools binary parser is simple and nearly stateless.
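That regularity is easy to see in a toy parser sketch: after the five-word module header, every instruction's first word packs its word count (high 16 bits) and opcode (low 16 bits), so walking the stream needs almost no state:

```python
def parse_instructions(words):
    """Walk a SPIR-V module's 32-bit word stream and yield
    (opcode, operand words) pairs for each instruction."""
    assert words[0] == 0x07230203, "SPIR-V magic number"
    i = 5  # skip the 5-word header: magic, version, generator, bound, schema
    out = []
    while i < len(words):
        first = words[i]
        word_count = first >> 16     # instruction length, including this word
        opcode = first & 0xFFFF
        out.append((opcode, words[i + 1 : i + word_count]))
        i += word_count
    return out
```

For example, a module whose only instruction is OpCapability Shader (opcode 17, capability operand 1) parses straight off the word stream.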
Second, glslang generates binaries with OpName for many objects (see https://github.com/KhronosGroup/glslang/issues/316). Also, it doesn't attempt to use group decorations.
To make smaller binaries, we need to make tools smarter: emit less redundant info in the first place, build tools to eliminate redundancy (while still producing valid SPIR-V binaries), and provide semantically lossless compression and decompression.
This issue is a brain dump of a few ideas along these lines. (Keep in mind that the SPIRV-Tools must remain unencumbered, including a possible relicensing under the Apache 2 license.)
Random ideas include those that leave the result as valid SPIR-V binary:
Generic compression ideas:
Low level encoding ideas (stateless):
Stateful encoding:
Anyway, this is just a start of what we could do.