This is a rewrite of LZSA1JMP.ASM to use a 256-element jumptable, which
allows the code to handle all of the hot paths (common cases) without
any branching. This not only reduces branches (which are very costly on
x86) to a bare minimum, but also grants us foreknowledge in a decode
path of what steps can be skipped.
The new code is 12.7% faster than the old code, and assembles to less
than 3K of object code and data.
This is a rewrite of LZSA1JMP.ASM to use a 256-element jumptable, which allows the code to handle all of the hot paths (common cases) without any branching. This not only reduces branches (which are very costly on x86) to a bare minimum, but also grants us foreknowledge in a decode path of what steps can be skipped.
The new code is 12.7% faster than the old code, and assembles to less than 3K of object code and data.