Optimize NdParseOperand

bitdefender / bddisasm

bddisasm is a fast, lightweight, x86/x64 instruction decoder. The project also features a fast, basic, x86/x64 instruction emulator, designed specifically to detect shellcode-like behavior.

Apache License 2.0

888 stars 115 forks source link

Optimize NdParseOperand #102

Open turol opened 2 months ago

turol commented 2 months ago

NdParseOperand is taking up a lot of cycles. This seems to be because of the unpredictable switches. This is an attempt to make it faster by handling "regular" cases with arrays. It makes the branches more predictable but only has a slight effect on performance.

    N           Min           Max        Median           Avg        Stddev
x  30      16655663      18165255      18069974      18030108     265051.76
+  30      17821354      18267074      18140863      18136979     93104.058
Difference at 95.0% confidence
    106871 +/- 102683
    0.592735% +/- 0.569509%
    (Student's t, pooled s = 198646)
Instructions/second, higher is better

It might be possible to speed it up more by re-numbering ND_OPERAND_SIZE_SPEC and ND_OPERAND_TYPE_SPEC so the regular cases are either first or last.

turol commented 2 months ago

Looks like MSVC doesn't like the empty initializers. I don't have MSVC to test with, what kind of syntax does it want?

ianichitei commented 2 months ago

Looks like MSVC doesn't like the empty initializers. I don't have MSVC to test with, what kind of syntax does it want?

It should work with { 0 } instead of { }.

That table looks like it might be auto-generated by isagenerator, will have to see what @vlutas thinks about this.

vlutas commented 2 months ago

There are several problems with this change:

it makes the code very hard to follow; operands processing should be as homogenous as possible
each time a new operand size or type is added, we'd have to make sure to update the array(s) as well
the internal operand type & size enums are exactly that - internal - and they can change any time; changing them would require the arrays to also be modified (although this could be mitigated via designated initializers)
it doesn't remove the switch statements - it only introduces new arrays, as a potential for cache misses

This change essentially just rewrites existing code in a different way, with no clear benefit. Like I mentioned in previous PRs, significant changes will only be accepted if they bring a significant improvement.

turol commented 2 months ago

It doesn't remove the switches but it does make them somewhat more predictable. This should help the branch predictor.

The code is pretty well optimized already so to me 1-2% speedups are significant.

vlutas commented 2 months ago

Unfortunately, the improvement is nowhere nearly enough to justify such a significant change.