NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Swift projects expand 4-byte values to 8 bytes when building structs #6784

Open nmggithub opened 1 month ago

nmggithub commented 1 month ago

Describe the bug

This is probably just some flag or feature I am unaware of, but I am reversing a binary and trying to add a struct and I experience this behavior. This does not happen in most other binaries I've worked with.

To Reproduce

Steps to reproduce the behavior:

  1. Open the Structure Editor and create a new struct
  2. Add an item of the type uint32_t
  3. Observe the popover stating that the size is 4 bytes
  4. Press enter
  5. Observe it being added with a size of 8 bytes.

Expected behavior

The item is added with a size of 4 bytes.

Screenshots

Screenshot 2024-08-03 at 18 17 03

Attachments N/A

Environment (please complete the following information):

Additional context

I've tried playing around with alignment, but it doesn't seem to do anything. The struct in the screenshot has the alignment set to 4.

dev747368 commented 1 month ago

What size does uint32_t take when you use it directly in the program (no struct)? What arch / language is your binary? If you find the uint32_t type in the data type manager tree, where did it come from (is it linked to a data type archive, and what path does the data type live in) and what underlying type is it pointed at?

nmggithub commented 1 month ago

> What size does uint32_t take when you use it directly in the program (no struct)? What arch / language is your binary? If you find the uint32_t type in the data type manager tree, where did it come from (is it linked to a data type archive, and what path does the data type live in) and what underlying type is it pointed at?

For uint32_t, there appear to be three data type archives that contain it:

  1. The binary's own data type archive (or at least one named after the program name)
  2. generic_clib_64
  3. mac_osx

I think I generated that third one through a header file I got from GitHub. Note that, again, this isn't happening with every binary. Just this one specific macOS binary I'm looking at.

When using the data type in the program without a struct, those three locations show in the dropdown in the Data Type Chooser Dialog. The latter two say they are 4 bytes long, but the one in the project's archive says it is 8 bytes long. If I select any of them and then mouse over the result, it says the value is 8 bytes long.

EDIT: It appears that mac_osx is actually built in. To note, I did, several versions ago, use the "Parse C Source" method to parse some additional macOS types. I was following the instructions in the README where I got them on GitHub. I'll try to find where that was.

EDIT 2: Ok it was this, I believe: https://github.com/PoomSmart/IDAObjcTypes. I am not sure if this affects anything. Again, it's only happening with this one binary.

dev747368 commented 1 month ago

Right. Well, the base data type that this typedef is pointing to, in conjunction with this binary's arch, is probably the cause.

Some of Ghidra's built-in data types are specific to the arch/compiler that was assigned to the binary during import. If the uint32_t typedef is pointing to one of these (e.g. the base built-in data type called int) instead of a statically sized base type (e.g. dword), and then you transport that data type from the original context to another binary, and your new binary's arch/compiler spec defines int as 8 bytes instead of 4, you can run into this situation.
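To make the mechanism above concrete, here is a minimal sketch in Python. It is a toy model, not Ghidra's API: `CompilerSpec`, `FIXED_SIZES`, and `resolve_size` are hypothetical names, and signedness is ignored. It only illustrates why a typedef that bottoms out in a compiler-specific `int` changes size between compiler specs, while one that bottoms out in a statically sized type does not.

```python
from dataclasses import dataclass


# Hypothetical model: a compiler spec is reduced to the size it gives `int`.
@dataclass(frozen=True)
class CompilerSpec:
    name: str
    integer_size: int


# Statically sized base types keep their size under every compiler spec.
FIXED_SIZES = {"dword": 4, "qword": 8}


def resolve_size(type_name, typedefs, cspec):
    """Follow typedef links to a base type, then size it for this cspec."""
    seen = set()
    while type_name in typedefs:
        if type_name in seen:
            raise ValueError(f"typedef cycle at {type_name}")
        seen.add(type_name)
        type_name = typedefs[type_name]
    if type_name == "int":
        # Compiler-specific: the answer depends on the binary's cspec.
        return cspec.integer_size
    return FIXED_SIZES[type_name]


generic = CompilerSpec("default", integer_size=4)
swift = CompilerSpec("swift", integer_size=8)

via_int = {"uint32_t": "int"}      # typedef created from a parsed header
via_dword = {"uint32_t": "dword"}  # typedef pointing at a fixed-size type

print(resolve_size("uint32_t", via_int, generic))  # 4
print(resolve_size("uint32_t", via_int, swift))    # 8
print(resolve_size("uint32_t", via_dword, swift))  # 4
```

In this model, repointing the typedef at `dword` is the only change needed to make `uint32_t` stable across both specs.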

From a quick look, I'm guessing your binary is Swift, which defines int as 8 bytes, but the source of the typedef was created using a 4-byte int compiler spec.

nmggithub commented 1 month ago

Honestly, that Swift theory sounds like it could be it. However, I've reversed this binary before (a previous version) and didn't have this issue. I've also reversed other Swift binaries without issue. Granted all this was also on previous versions of Ghidra.

Regardless, is there a way to tell Ghidra that int is actually 4 bytes (and also potentially fix any other base types)?

dev747368 commented 1 month ago

> Regardless, is there a way to tell Ghidra that int is actually 4 bytes (and also potentially fix any other base types)?

You can't modify the behavior of the data type called int, but you can modify the typedef to point to something else, like dword. This takes a somewhat complicated series of steps: first create a second typedef that is set up the correct way, then use the right-click Replace... action on the original and pick that second typedef.

If you hover over a data type, the tooltip that pops up should state whether its size is compiler-specific. If that is not mentioned, it is a statically sized type.

ryanmkurtz commented 1 month ago

If you do Help -> About <program>, what is the value of Compiler ID? Indeed, I made int 8 bytes for Swift programs. I wouldn't have expected that to make uint32 8 bytes as well though.

nmggithub commented 1 month ago

Compiler ID is indeed: swift. I am still caught up, though, on my ability to reverse previous versions of this binary (and also other Swift binaries) just fine. The more I think about it though, I wasn't really using structs that much in the others.

Another confusing part now is that I was trying to use the structs to define parts of memory, but the sizing was messing with it. Or, in short: there's memory in the binary that's laid out according to a typedef where int is 4 bytes. I'm honestly not sure how that's possible, but it's probably some deep compiler/linker magic.

dev747368 commented 1 month ago

I think support for Swift binaries was added fairly recently to Ghidra, so this data type size mismatch may be a new issue for those binaries vs. the same binary imported using a generic AARCH64 definition.

nmggithub commented 1 month ago

> I think support for Swift binaries was added fairly recently to Ghidra, so this data type size mismatch may be a new issue for those binaries vs. the same binary imported using a generic AARCH64 definition.

Ok yeah, this makes sense and is probably what's happening. Given that I was able to reverse these binaries just fine before under the generic AARCH64 definition, is it possible to force Ghidra to revert to that? Also, what, if anything, does the new Swift support add? This is the first time I have actually noticed it, and it's causing me issues.

dev747368 commented 1 month ago

If you are okay with re-importing the binary, you can just change the "Language" field before clicking ok. It should pop up a table of arch/compiler combos (and there is a check box at the bottom to let you force something non-recommended).

nmggithub commented 1 month ago

Nice, thank you! You mention "non-recommended", though. Is there any recommended way to use typedefs and structs from another compiler spec in binaries like this? Or are cases like this (where a binary has memory laid out based on such a typedef) rare?

dev747368 commented 1 month ago

The "non-recommended" was a reference to the ability to choose an arbitrary CPU arch/compiler during import, even if it's incorrect.

As far as recommended ways of reusing type info across arch/compiler specs, dunno.

Ghidra's existing bundled data type archives have this typedef issue, probably because they were generated via parsing .h files. How often these types are used in other type declarations will be up to the source of the imported type info.

You can easily add your own types, even with the same name, but you need to be careful about picking the correct one when using them. You can also overwrite those existing bad types with your own correct definition. (see my previous comment about using the right click, Replace... feature).

If you end up putting some effort into creating type info for your binary, you also may want to save your types into their own data type file so you can reuse it later.

dev747368 commented 1 month ago

I'm only seeing 2 cspecs that have integer_size=8: swift and golang (on 64-bit archs).

Everything else is 4 bytes, except for the obvious 16 bit platform cspecs that have a 2 byte int.
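A check like this can be scripted. The sketch below is one possible approach, not how the survey above was actually done: it walks a directory tree for `.cspec` files and reports each file's `integer_size`. It assumes the element is written as `<integer_size value="N" />` somewhere under the root element; `integer_sizes` is a hypothetical helper name, and files that omit the element or fail to parse are skipped rather than assigned a default.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def integer_sizes(root_dir):
    """Map each .cspec file name under root_dir to its integer_size value."""
    sizes = {}
    for path in sorted(Path(root_dir).rglob("*.cspec")):
        try:
            tree = ET.parse(path)
        except ET.ParseError:
            # Skip files that are not well-formed XML.
            continue
        # Assumed layout: <integer_size value="N" /> below the root element.
        node = tree.getroot().find(".//integer_size")
        if node is not None and node.get("value") is not None:
            sizes[path.name] = int(node.get("value"))
    return sizes


def eight_byte_int_specs(root_dir):
    """Return the names of cspec files whose integer_size is 8."""
    return [name for name, size in integer_sizes(root_dir).items() if size == 8]
```

Pointing `eight_byte_int_specs` at the `Ghidra/Processors` directory of a local checkout would list the specs in question, assuming the files follow the layout described.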

nmggithub commented 1 month ago

Ok so, I just want to clarify my situation as it stands:

Previously, before this Swift support was added to Ghidra, I could load this binary into Ghidra. I could then type a region of memory to a well-known struct which relied on int being 4 bytes long. It worked.

Now, this breaks down because Ghidra assumes int is 8 bytes due to the inferred compiler of the binary. However, the memory region of the binary has not changed. There is still memory that is laid out according to the struct as if int were 4 bytes instead of 8.
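The effect on a fixed memory region can be sketched with Python's `struct` module. The record below is hypothetical (two 32-bit fields and one 16-bit field, little-endian, no padding) and is not the struct from my binary. It only shows how reading the same bytes with an 8-byte field width shifts every offset and merges adjacent values.

```python
import struct

# Hypothetical record as it sits in the binary: int fields are 4 bytes.
raw = struct.pack("<IIH", 0x11111111, 0x22222222, 0x3333)

layout_4 = struct.Struct("<IIH")  # int treated as 4 bytes
layout_8 = struct.Struct("<QQH")  # int treated as 8 bytes

print(layout_4.size)  # 10
print(layout_8.size)  # 18

# With 4-byte ints the fields come back as they were written.
fields = layout_4.unpack_from(raw)

# With 8-byte ints the first field swallows the first two real fields.
merged = struct.unpack_from("<Q", raw)[0]
print(hex(merged))  # 0x2222222211111111
```

The 8-byte layout also overruns the 10 bytes that actually belong to the record, which matches the sizing problems I saw when applying the struct to memory.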

Is Ghidra then wrong for inferring that int is 8 bytes long? If this binary was indeed compiled with a compiler in which int is 8 bytes, why is that memory laid out in the binary according to the 4-byte-length layout? Was that potentially some compiler and/or linking magic?

And what would be your recommended solution here? Right now I see two options:

  1. Import as a generic AARCH64 binary, or
  2. Recreate the struct

Option 2 seems like the least invasive option, but the work is non-trivial, as the struct itself actually references several other structs, ones I would likely have to recreate as well. That makes me want to go for Option 1, but I'm not sure if that would break anything else in the program. However, given that I've apparently reversed this binary before as a generic AARCH64, I may be fine.

And, I guess as a final question (repeated from earlier): what did/does this Swift support actually do? Because right now I've only seen it mess up my setup. What are the benefits?

dev747368 commented 1 month ago

> Is Ghidra then wrong for inferring that int is 8 bytes long? If this binary was indeed compiled with a compiler in which int is 8 bytes, why is that memory laid out in the binary according to the 4-byte-length layout? Was that potentially some compiler and/or linking magic?

I'm not knowledgeable about swift, but I am about golang and it may be a good analog.

By default, a RE'd golang binary typically won't benefit from type info imported from a C .h file that might include ints and uint32_t, etc.

However, that could change if the golang binary statically links in a C library. All of a sudden, in one binary you've got non-homogeneous definitions (at a source-code level) of what an int is, and Ghidra only allows you to specify a single compiler spec for the entire binary.

re: option 1 or 2... or leave the struct alone and just change the problematic types it references (if they are all typedefs).

nmggithub commented 1 month ago

Thank you for the note. I'll keep this in mind. I think I'm gonna go with option 1 as it just brings me back to what I was doing before. Given that I think we have narrowed down the root cause and a mitigation, I'm going to close this issue. Thank you so much for your help!

nmggithub commented 1 month ago

@ryanmkurtz Just wanted to ping you here so you can look through the past conversation. It does, indeed, appear to be an issue with the Swift compiler inference. Importing as a default AARCH64 binary works fine, as it appears to just do what it did before.

> Indeed, I made int 8 bytes for Swift programs. I wouldn't have expected that to make uint32 8 bytes as well though.

That does seem to be what it's doing. If that's unintentional, you may want to take a look at it. Anyway, I'm still keeping this closed as I have found my mitigation, but it seems that there might be work to be done with regard to Swift support (but I'll leave that to you). Let me know if you have any questions for me and I can try to provide answers.

ryanmkurtz commented 1 month ago

Yes indeed, I will take a look. Swift support adds a couple of things. I'll paste this in from the "What's New" we released with Ghidra 11.1:

Initial support for binaries written in the Swift Programming Language has been added. The new support relies on the native Swift demangler being present on the user's system. Swift is automatically bundled with XCode on macOS, and can be optionally installed on Windows and Linux. See the "Demangler Swift" analyzer options for more information. Type information gathered from the demangled Swift symbol names is used to create corresponding Ghidra data types. This currently works for Swift primitives and structures, but more work needs to be done to include classes and other advanced data types. Swift-specific calling conventions are also applied to demangled Swift functions.

The primitive sizes and calling conventions are defined in the x86-64-swift.cspec and AARCH64_swift.cspec text files. At the top of these files is the <data_organization> section, where integer_size is defined: https://github.com/NationalSecurityAgency/ghidra/blob/1baf101d43379336d6a9dc0f6da803f946939a40/Ghidra/Processors/AARCH64/data/languages/AARCH64_swift.cspec#L3-L14

Why did I define integer to be 8 bytes? For no other reason than that an int in Swift is 8 bytes. I admittedly did not know about this uint32_t problem though at the time.

What memory are you laying down structures on? Are they things generated by Swift, or some other Objective C or Mach-O thing? In those contexts, I could see an 8 byte integer as being undesirable.

You have the power to modify that .cspec file and make integer 4 bytes. When you restart Ghidra, it should take effect. However, some changes to the Java code would need to take place so 8 byte longs could be used to represent Swift.Int: https://github.com/NationalSecurityAgency/ghidra/blob/1baf101d43379336d6a9dc0f6da803f946939a40/Ghidra/Features/SwiftDemangler/src/main/java/ghidra/app/util/demangler/swift/nodes/SwiftStructureNode.java#L61-L79

This is really the first kind of feedback I've received on the Swift stuff (positive or negative) since its release, so I was expecting to have to adjust things as more test cases rolled in.

As for why Ghidra shows you 4 bytes on hover instead of 8, that seems like a bug to me, personally.

nmggithub commented 1 month ago

> What memory are you laying down structures on? Are they things generated by Swift, or some other Objective C or Mach-O thing? In those contexts, I could see an 8 byte integer as being undesirable.

I'm not 100% sure what generates it, but I know it's a data structure from an old legacy feature in macOS. Probably a C file (not even Objective-C) that's just linked in with the Swift files.

> As for why Ghidra shows you 4 bytes on hover instead of 8, that seems like a bug to me, personally.

Sorry, to clarify, hover works fine. It's just that any data type dropdown (such as in the Struct Editor or the code editor) shows the three copies of uint32_t: program archive, generic_clib_64 archive, and mac_osx archive. Only the program one shows the true size of 8 bytes, but if either of the other two is selected, it still seems to choose the 8-byte one from the program archive.

> Why did I define integer to be 8 bytes? For no other reason than that an int in Swift is 8 bytes. I admittedly did not know about this uint32_t problem though at the time.

I would agree that an Int is 8 bytes on 64-bit platforms, but Swift, to my knowledge, is still designed with 32-bit systems in mind as well:

> On 32-bit platforms, Int is the same size as Int32, and on 64-bit platforms, Int is the same size as Int64.

Source: https://developer.apple.com/documentation/swift/int

I'm not sure how that affects things.