Open nmggithub opened 1 month ago
What size does uint32_t take when you use it directly in the program (no struct)?
What arch / language is your binary?
If you find the uint32_t type in the data type manager tree, where did it come from (is it linked to a data type archive, and what path does the data type live in) and what underlying type is it pointed at?
For uint32_t, there appear to be three data type archives that contain it: the program's own archive, generic_clib_64, and mac_osx.
I think I generated that third one through a header file I got from GitHub. Note that, again, this isn't happening with every binary. Just this one specific macOS binary I'm looking at.
When using the data type in the program without a struct, those three locations show in the dropdown in the Data Type Chooser Dialog. The latter two say they are 4 bytes long, but the one in the project's archive says it is 8 bytes long. If I select any of them and then mouse over the result, it says the value is 8 bytes long.
EDIT: It appears that mac_osx is actually built in. To note, I did, several versions ago, use the "Parse C Source" method to parse some additional macOS types. I was following the instructions in the README where I got them on GitHub. I'll try to find where that was.
EDIT 2: Ok it was this, I believe: https://github.com/PoomSmart/IDAObjcTypes. I am not sure if this affects anything. Again, it's only happening with this one binary.
Right. Well, the base data type that this typedef is pointing to, in conjunction with this binary's arch, is probably the cause.
Some of Ghidra's built-in data types are specific to the arch/compiler that was assigned to the binary during import. If the uint32_t typedef is pointing to one of these (e.g. the base built-in data type called int) instead of a statically sized base type (e.g. dword), and you then transport that data type from its original context to another binary whose arch/compiler spec defines int as 8 bytes instead of 4, you can run into this situation.
From a quick look, I'm guessing your binary is Swift, which defines int as 8 bytes, but the source of the typedef was created using a 4-byte int compiler spec.
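The mechanism described above can be sketched with a small Python model. This is only an illustration, not Ghidra's API: the names FIXED_SIZES, DATA_ORGS, and size_of are hypothetical, and the two spec tables carry nothing beyond the int sizes mentioned in this thread.

```python
# Toy model of typedef size resolution; this is NOT Ghidra's API.
# Statically sized base types have the same size under every compiler spec.
FIXED_SIZES = {"dword": 4, "qword": 8}

# Hypothetical per-spec tables holding only the int size discussed here.
DATA_ORGS = {
    "default": {"int": 4},
    "swift": {"int": 8},
}


def size_of(name, typedefs, compiler_spec):
    """Follow typedef links until a base type is reached, then size it."""
    while name in typedefs:
        name = typedefs[name]
    if name in FIXED_SIZES:
        return FIXED_SIZES[name]
    # Compiler-specific base type: the size depends on the binary's spec.
    return DATA_ORGS[compiler_spec][name]


# A typedef parsed from a header under a 4-byte-int spec points at "int".
from_header = {"uint32_t": "int"}
# The same typedef defined against a statically sized base type.
fixed = {"uint32_t": "dword"}

for spec in ("default", "swift"):
    print(spec,
          size_of("uint32_t", from_header, spec),
          size_of("uint32_t", fixed, spec))
```

In this model, the int-based typedef changes size with the spec, while the dword-based one stays at 4 bytes in both cases.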
Honestly, that Swift theory sounds like it could be it. However, I've reversed this binary before (a previous version) and didn't have this issue. I've also reversed other Swift binaries without issue. Granted all this was also on previous versions of Ghidra.
Regardless, is there a way to tell Ghidra that int is actually 4 bytes (and also potentially fix any other base types)?
You can't modify the behavior of the data type called int, but you can modify the typedef to point to something else, like dword (via a complicated series of steps using the right-click Replace... action to pick a second typedef that you previously created and set up the correct way).
If you hover over a data type, the tooltip that pops up should state whether its size is compiler-specific; if that isn't mentioned, it is a statically sized type.
If you do Help -> About <program>, what is the value of Compiler ID? Indeed, I made int 8 bytes for Swift programs. I wouldn't have expected that to make uint32 8 bytes as well though.
Compiler ID is indeed: swift. I am still caught up, though, on my ability to reverse previous versions of this binary (and also other Swift binaries) just fine. The more I think about it though, I wasn't really using structs that much in the others.
Another confusing part now is that I was trying to use the structs to define parts of memory, but the sizing was messing with it. Or, in short: there's memory in the binary that's laid out according to a typedef where int is 4 bytes. I'm honestly not sure how that's possible, but it's probably some deep compiler/linker magic.
I think support for swift binaries was added fairly recently to Ghidra, so this data type size mismatch may be a new issue for those binaries vs. the same binary imported using a generic AARCH64 definition.
Ok yeah, this makes sense and is probably what's happening. Given that I was able to reverse these binaries just fine before under the generic AARCH64 definition, is it possible to force Ghidra to revert to that? Also, what, if anything, does the new Swift support add? This is the first time I have actually noticed it, and it's causing me issues.
If you are okay with re-importing the binary, you can just change the "Language" field before clicking ok. It should pop up a table of arch/compiler combos (and there is a check box at the bottom to let you force something non-recommended).
Nice, thank you! You mention "non-recommended", though. Is there any recommended way to use typedefs and structs from another compiler spec in binaries like this? Or are cases like this (where a binary has memory laid out based on such a typedef) rare?
The "non-recommended" was a reference to the ability to choose an arbitrary CPU arch/compiler during import, even if it's incorrect.
As far as recommended ways of reusing type info across arch/compiler specs, dunno.
Ghidra's existing bundled data type archives have this typedef issue, probably because they were generated via parsing .h files. How often these types are used in other type declarations will be up to the source of the imported type info.
You can easily add your own types, even with the same name, but you need to be careful about picking the correct one when using them. You can also overwrite those existing bad types with your own correct definition. (see my previous comment about using the right click, Replace... feature).
If you end up putting some effort into creating type info for your binary, you also may want to save your types into their own data type file so you can reuse it later.
I'm only seeing 2 cspecs that have integer_size=8: swift and golang (on 64-bit archs).
Everything else is 4 bytes, except for the obvious 16 bit platform cspecs that have a 2 byte int.
Ok so, I just want to clarify my situation as it stands:
Previously, before this Swift support was added to Ghidra, I could load this binary into Ghidra. I could then type a region of memory to a well-known struct which relied on int being 4 bytes long. It worked.
Now, this breaks down because Ghidra assumes int is 8 bytes due to the inferred compiler of the binary. However, the memory region of the binary has not changed. There is still memory that is laid out according to the struct as if int were 4 bytes instead of 8.
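To illustrate why the wider type breaks the overlay, here is a rough sketch using Python's ctypes. The struct and its field names (flags, count, tag) are made up for the example and are not the struct from this binary; the point is only that widening the integer fields moves every later field away from where the data actually sits.

```python
import ctypes


class Narrow(ctypes.Structure):
    # Layout the binary's memory actually follows: 4-byte integer fields.
    _fields_ = [
        ("flags", ctypes.c_uint32),
        ("count", ctypes.c_uint32),
        ("tag", ctypes.c_uint16),
    ]


class Wide(ctypes.Structure):
    # Same declaration once the integer fields resolve to 8 bytes.
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("count", ctypes.c_uint64),
        ("tag", ctypes.c_uint16),
    ]


for cls in (Narrow, Wide):
    # Collect each field's byte offset within the structure.
    offsets = {name: getattr(cls, name).offset for name, _ in cls._fields_}
    print(cls.__name__, ctypes.sizeof(cls), offsets)
```

In the narrow layout, count starts at offset 4 and tag at offset 8; in the wide one they start at 8 and 16, so the struct no longer lines up with memory that was written using 4-byte fields.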
Is Ghidra then wrong for inferring that int is 8 bytes long? If this binary was indeed compiled with a compiler in which int is 8 bytes, why is that memory laid out in the binary according to the 4-byte-length layout? Was that potentially some compiler and/or linking magic?
And what would be your recommended solution here? Right now I see two options:
Option 2 seems like the least invasive option, but the work is non-trivial, as the struct itself actually references several other structs, ones I would likely have to recreate as well. That makes me want to go for Option 1, but I'm not sure if that would break anything else in the program. However, given that I've apparently reversed this binary before as a generic AARCH64, I may be fine.
And, I guess as a final question (repeated from earlier): what did/does this Swift support actually do? Because right now I've only seen it mess up my setup. What are the benefits?
I'm not knowledgeable about Swift, but I am about golang and it may be a good analog.
By default, a RE'd golang binary typically won't benefit from type info imported from a C .h file that might include ints and uint32_t, etc.
However, that could change if the golang binary statically links in a C library. All of a sudden, in one binary you've got non-homogeneous definitions (at a source-code level) of what an int is, and Ghidra only allows you to specify a single compiler spec for the entire binary.
re: option 1 or 2... or leave the struct alone and just change the problematic types it references (if they are all typedefs).
Thank you for the note. I'll keep this in mind. I think I'm gonna go with option 1 as it just brings me back to what I was doing before. Given that I think we have narrowed down the root cause and a mitigation, I'm going to close this issue. Thank you so much for your help!
@ryanmkurtz Just wanted to ping you here so you can look through the past conversation. It does, indeed, appear to be an issue with the Swift compiler inference. Importing as a default AARCH64 binary works fine, as it appears to just do what it did before.
Indeed, I made int 8 bytes for Swift programs. I wouldn't have expected that to make uint32 8 bytes as well though.
That does seem to be what it's doing. If that's unintentional, you may want to take a look at it. Anyway, I'm still keeping this closed as I have found my mitigation, but it seems that there might be work to be done on Swift support (but I'll leave that to you). Let me know if you have any questions for me and I can try to provide answers.
Yes indeed, I will take a look. Swift support adds a couple of things. I'll paste this in from the "What's New" we released with Ghidra 11.1:
Initial support for binaries written in the Swift Programming Language has been added. The new support relies on the native Swift demangler being present on the user's system. Swift is automatically bundled with XCode on macOS, and can be optionally installed on Windows and Linux. See the "Demangler Swift" analyzer options for more information. Type information gathered from the demangled Swift symbol names is used to create corresponding Ghidra data types. This currently works for Swift primitives and structures, but more work needs to be done to include classes and other advanced data types. Swift-specific calling conventions are also applied to demangled Swift functions.
The primitive sizes and calling conventions are defined in the x86-64-swift.cspec and AARCH64_swift.cspec text files. At the top of these files is the <data_organization> section, where integer_size is defined:
https://github.com/NationalSecurityAgency/ghidra/blob/1baf101d43379336d6a9dc0f6da803f946939a40/Ghidra/Processors/AARCH64/data/languages/AARCH64_swift.cspec#L3-L14
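As a rough illustration of what that section holds, the sketch below parses a small fragment shaped like a .cspec data organization block. The fragment is an assumption written for this example and is not a verbatim copy of AARCH64_swift.cspec: only integer_size=8 comes from this thread, and the other entries and the helper name read_data_organization are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment in the shape of a .cspec <data_organization> section.
# NOT copied from AARCH64_swift.cspec: only integer_size=8 reflects what this
# thread states; the remaining entries are assumed for the example.
CSPEC_FRAGMENT = """
<compiler_spec>
  <data_organization>
    <pointer_size value="8" />
    <short_size value="2" />
    <integer_size value="8" />
    <long_size value="8" />
  </data_organization>
</compiler_spec>
"""


def read_data_organization(xml_text):
    """Return {element name: int value} for entries with a value attribute."""
    root = ET.fromstring(xml_text)
    section = root.find("data_organization")
    return {
        child.tag: int(child.get("value"))
        for child in section
        if child.get("value") is not None  # skip nested, value-less entries
    }


sizes = read_data_organization(CSPEC_FRAGMENT)
print(sizes["integer_size"])
```

Reading the value this way is only a quick check of what a given spec file declares; changing it still means editing the .cspec file itself.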
Why did I define integer to be 8 bytes? For no other reason than that an int in Swift is 8 bytes. I admittedly did not know about this uint32_t problem at the time, though.
What memory are you laying down structures on? Are they things generated by Swift, or some other Objective C or Mach-O thing? In those contexts, I could see an 8 byte integer as being undesirable.
You have the power to modify that .cspec file and make integer 4 bytes. When you restart Ghidra, it should take effect. However, some changes to the Java code would need to take place so 8 byte longs could be used to represent Swift.Int:
https://github.com/NationalSecurityAgency/ghidra/blob/1baf101d43379336d6a9dc0f6da803f946939a40/Ghidra/Features/SwiftDemangler/src/main/java/ghidra/app/util/demangler/swift/nodes/SwiftStructureNode.java#L61-L79
This is really the first feedback of any kind I've received on the Swift stuff (positive or negative) since its release, so I was expecting to have to adjust things as more test cases rolled in.
As for why Ghidra shows you 4 bytes on hover instead of 8, that seems like a bug to me, personally.
What memory are you laying down structures on? Are they things generated by Swift, or some other Objective C or Mach-O thing? In those contexts, I could see an 8 byte integer as being undesirable.
I'm not 100% sure what generates it, but I know it's a data structure from an old legacy feature in macOS. Probably a C file (not even Objective-C) that's just linked in with the Swift files.
As for why Ghidra shows you 4 bytes on hover instead of 8, that seems like a bug to me, personally.
Sorry, to clarify, hover works fine. It's just that, in any data type dropdown (such as in the Struct Editor, or the code editor), it shows the three copies of uint32_t: program archive, generic_clib_64 archive, mac_osx archive; and only the program one shows the true size of 8 bytes (but if either of the other two is selected, it still seems to choose the 8-byte one from the program archive).
Why did I define integer to be 8 bytes? For no other reason than that an int in Swift is 8 bytes. I admittedly did not know about this uint32_t problem at the time, though.
I would agree that an Int is 8 bytes on 64-bit platforms, but Swift, to my knowledge, is still designed with 32-bit systems in mind as well:
On 32-bit platforms, Int is the same size as Int32, and on 64-bit platforms, Int is the same size as Int64. Source: https://developer.apple.com/documentation/swift/int
I'm not sure how that affects things.
Describe the bug This is probably just some flag or feature I am unaware of, but I am reversing a binary and trying to add a struct and I experience this behavior. This does not happen in most other binaries I've worked with.
To Reproduce Steps to reproduce the behavior:
uint32_t
Expected behavior The item is added with a size of 4 bytes.
Screenshots
Attachments N/A
Environment (please complete the following information):
Additional context I've tried playing around with alignment, but it doesn't seem to do anything. The struct in the screenshot has the alignment set to 4.