bitdefender / bddisasm

bddisasm is a fast, lightweight, x86/x64 instruction decoder. The project also features a fast, basic, x86/x64 instruction emulator, designed specifically to detect shellcode-like behavior.
Apache License 2.0
874 stars 112 forks source link

bddisasm leaks mnemonic string table #97

Closed ScimitarEnjoyer closed 2 weeks ago

ScimitarEnjoyer commented 1 month ago

The mnemonic string table gMnemonics is present even if BDDISASM_NO_FORMAT is defined.

Please consider adding a define that removes the mnemonics table entirely and sets the Instrux::Mnemonic field to ND_NULL (or an empty string). Having a numeric representation of the mnemonic (like ZydisMnemonic) alongside (or instead of) the text representation would be ideal.

static NDSTATUS
NdCopyInstructionInfo(
    INSTRUX *Instrux,
    ND_IDBE *Idbe
    )
{
    Instrux->Mnemonic = gMnemonics[Idbe->Mnemonic];
    ...
ianichitei commented 1 month ago

Hi!

There is already a unique ID for each instruction: INSTRUX::Instruction. Note that these values may change between bddisasm versions.

I assume you'd like to compile out the strings in order to save memory? Setting Instrux->Mnemonic to NULL in that case isn't ideal IMO, as currently it is guaranteed to never be NULL, and breaking that guarantee seems problematic. Maybe setting it to a placeholder string (even "") would be better.

ScimitarEnjoyer commented 1 month ago

Hello :)

Yeah, I wanted to remove the strings from the binary and I think this would be nice to have in the official distribution.

Regarding the mnemonics, my suggestion was having the index into the table be defined as an enumeration (and saved in the INSTRUX), so users can still work with the mnemonic if needed without needing to have the entire string table in the binary.

vlutas commented 3 weeks ago

We could add a define to exclude the mnemonics table, but this would also exclude the Mnemonic field (we can't keep it, as it may be used, which would lead to nasty bugs). Excluding the Mnemonic field in this case should be fine, as it would be a new feature (behind a new define), and whomever would use this new feature, would have to take into consideration that no textual mnemonic representation is available. Note that in practice, the textual mnemonic should not be used to inspect the instruction (it's merely an informative field). In order to determine what instruction you're dealing with, the Instruction field should always be used. It's values are from an enum that is auto-generated, and contains all the instruction mnemonics (but this also means, as ianichitei pointed out, that the values may change from one build to another). Perhaps we could create an additional enum with static values, which won't change across builds, but this is an endeavor for another time.

ScimitarEnjoyer commented 3 weeks ago

Removing the Mnemonic field from the struct seems like the ideal solution.

I'm using the Instruction field, but I wasn't sure if there's a case where someone might need the mnemonic itself (besides for formatting the textual representation of the instruction).

akisari commented 2 weeks ago

The BDDISASM_NO_MNEMONIC define was added in commit 4026e76 to no longer include information about mnemonics.