Vector35 / binaryninja-api

Public API, examples, documentation and issues for Binary Ninja
https://binary.ninja/
MIT License

Mixing backtick escapes and macro token pasting in C source type definition causes a freeze #5696

Open Frizi opened 1 week ago

Frizi commented 1 week ago

Version and Platform (required):

Bug Description: When using the "Create Types from C Source" function, defining an escaped, "template-like" type name through macro token pasting puts Binary Ninja into a "not responding" state, and the process has to be killed.

Steps To Reproduce: Please provide all steps required to reproduce the behavior:

  1. Use the "Create Types from C Source" function
  2. Provide the following type definition:
#define K_TY int32_t
#define V_TY int16_t

#define ENTRY_TY(K,V) `map_entry<`##K##`,`##V##`>`

struct ENTRY_TY(K_TY,V_TY)
{
    K_TY key;
    V_TY value;
};
  3. Click on "Create"
  4. Binary Ninja freezes

Expected Behavior: A type named map_entry<int32_t,int16_t> should be created.

Binary: Reproducible on any binary, including fresh "New Binary Data" context.

CouleeApps commented 1 week ago

This is a very clever idea that I had not considered. Unfortunately it doesn't work as-is, due to a couple of factors:

  1. Obviously the hang, which is due to how our clang patch lexes the backticks. That was not too hard to fix locally, but I then got stuck on the following points.
  2. Concatenating backtick identifiers with ## doesn't actually strip the backticks. After fixing the hang, I got a type named `map_entry<\`int32_t\`,\`int16_t\`>`
  3. Backtick identifiers don't have concatenation semantics like C string literals, since they are not strings. So even without a macro, `foo``bar` does not become foobar, because backticks don't behave like quotes
  4. As a result, trying to concatenate anything other than identifiers leads to preprocessing errors, since you cannot concatenate keywords with identifiers like that. Swapping out the int32_t for int yielded error: pasting formed '`map_entry<`int', an invalid preprocessing token
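For comparison, here is a minimal sketch of how ## behaves in a stock C preprocessor, without any backtick extension. It assumes an ordinary C compiler rather than Binary Ninja's patched clang, and the helper names (PASTE, STR, paste_identifiers, paste_keyword, check_pasting) are made up for illustration. The point is that a paste only succeeds when the result is a single valid token, which is why angle brackets and commas cannot be glued into a name this way:

```c
#include <assert.h>
#include <string.h>

/* Two-level helpers so arguments are macro-expanded before use. */
#define PASTE_(a, b) a##b
#define PASTE(a, b) PASTE_(a, b)
#define STR_(x) #x
#define STR(x) STR_(x)

/* identifier ## identifier forms one identifier token. */
const char *paste_identifiers(void) { return STR(PASTE(foo, bar)); }

/* identifier ## keyword also forms a plain identifier in standard C. */
const char *paste_keyword(void) { return STR(PASTE(map_entry_, int)); }

/*
 * By contrast, PASTE(<, int) would try to form "<int", which is not a
 * single token. Compilers such as clang reject that as an invalid
 * preprocessing token, so it is left out of this compilable sketch.
 */

/* Checks the two pasted names by stringifying them. */
void check_pasting(void) {
    assert(strcmp(paste_identifiers(), "foobar") == 0);
    assert(strcmp(paste_keyword(), "map_entry_int") == 0);
}
```

Calling check_pasting() from a main function is enough to exercise both cases. The stringification is only there to make the pasted token visible at run time.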

So overall this would have been a neat trick to get pseudo template-style declarations with the type system, but trying to use the backticks made it too fancy for our basic clang patches.

However! If you don't strictly need the angle brackets/commas, you can just use regular C preprocessor syntax and get the behavior you seem to be after in the first place:

#define K_TY int32_t
#define V_TY int16_t

#define ENTRY_TY_(K,V) map_entry_ ##K## _ ##V
// You need the indirection here so that the K_TY and V_TY macros are expanded before pasting
#define ENTRY_TY(K,V) ENTRY_TY_(K,V)

struct ENTRY_TY(K_TY,V_TY)
{
  K_TY key;
  V_TY value;
};

...which parses to...

struct map_entry_int32_t_int16_t
{
    int32_t key;
    int16_t value;
};

...and that seems like roughly what you were going for, just slightly less fancy.
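To see why the extra macro level matters, the following sketch stringifies the name produced with and without the indirection. It assumes a standard C preprocessor; STR, direct_name, indirect_name and check_indirection are hypothetical helpers added for illustration and are not part of the workaround itself. Operands of ## are not macro-expanded, so calling ENTRY_TY_ directly pastes the literal names K_TY and V_TY:

```c
#include <assert.h>
#include <string.h>

#define K_TY int32_t
#define V_TY int16_t

/* Pastes its arguments exactly as written: ## operands are not expanded. */
#define ENTRY_TY_(K, V) map_entry_##K##_##V
/* Extra level: K and V are expanded here, then handed to ENTRY_TY_. */
#define ENTRY_TY(K, V) ENTRY_TY_(K, V)

/* Stringify after expansion, only to make the resulting name visible. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Without indirection the macro names themselves get pasted. */
const char *direct_name(void) { return STR(ENTRY_TY_(K_TY, V_TY)); }

/* With indirection the expanded type names get pasted. */
const char *indirect_name(void) { return STR(ENTRY_TY(K_TY, V_TY)); }

/* Compares both spellings against the names they should produce. */
void check_indirection(void) {
    assert(strcmp(direct_name(), "map_entry_K_TY_V_TY") == 0);
    assert(strcmp(indirect_name(), "map_entry_int32_t_int16_t") == 0);
}
```

The second result matches the struct name shown above. The first is what you would get if ENTRY_TY pasted its arguments directly.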

So with that said, I'll get the hang fix pushed soon but probably can't get the identifier concatenation working as you would expect, at least not quickly.

Frizi commented 1 week ago

I am aware of the less-fancy workaround, but I already have a lot of "fancy" types defined and wanted to continue in the same fashion. So far I'm simply concatenating the type names by hand, but it would be really neat to automate that easily. It seems like Python scripting will be the way forward for now.

> The backtick identifiers don't actually have concatenation semantics similar to C strings, since they are not strings.

Token pasting behavior seems to be quite distinct from automatic string literal concatenation (at least on the surface; I'm not familiar with the implementation details). Could it maybe be extended to handle backticked tokens in a special way?

Of course the goal is to simply create "forbidden" tokens by concatenation. Any syntax that would allow that would be perfectly adequate. I've also tried something along the lines of "template literals", hoping to find some not yet documented behaviors:

#define ENTRY_TY(K,V) `map_entry<${K},${V}>`

Maybe that approach would somehow be easier to implement?