NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
51.42k stars 5.86k forks

String tidy for boundary alignment & nesting #1050

Open Hardcore-fs opened 5 years ago

Hardcore-fs commented 5 years ago

Currently strings are treated as null-terminated, but there are two other cases:

  1. Nested strings, where a string has a base address but contains other strings within it, for example when printing tables. The first string may span the full width of the table, and the nested strings are sub-parts of it that share the same null terminator. Currently Ghidra throws an error saying the sub-strings are already defined, so the base string has to be left as bytes and never gets a string label assigned.

  2. Aligned strings, where the string is followed by several null bytes that pad it to the next word alignment. This leads to a string followed by a chain of "db" entries that have to be tidied up manually.
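To make the second case concrete, here is a minimal standalone sketch in plain Python (not the Ghidra API) of how a scanner could fold the alignment padding into the string's extent so no stray "db" bytes are left behind. The function name `find_padded_strings`, the 4-byte default alignment, and the printable-ASCII filter are all assumptions made for illustration.

```python
def find_padded_strings(buf: bytes, base: int = 0, align: int = 4):
    """Return (address, text, total_size) for each null-terminated ASCII string.

    total_size covers the characters, the terminator, and any further null
    bytes up to the next `align`-byte boundary (hypothetical policy).
    """
    results = []
    i, n = 0, len(buf)
    while i < n:
        if buf[i] == 0:
            i += 1  # skip nulls that do not belong to any string
            continue
        end = buf.find(b"\x00", i)
        if end == -1:
            break  # unterminated tail: leave it alone
        text = buf[i:end]
        pad_end = end + 1  # one past the terminator
        # Round the address after the terminator up to the next boundary.
        next_aligned = (base + pad_end + align - 1) // align * align
        # Absorb padding nulls, but never run past the boundary or the buffer.
        while base + pad_end < next_aligned and pad_end < n and buf[pad_end] == 0:
            pad_end += 1
        # Only report runs made entirely of printable ASCII.
        if all(32 <= b < 127 for b in text):
            results.append((base + i, text.decode("ascii"), pad_end - i))
        i = pad_end
    return results


if __name__ == "__main__":
    blob = b"abc\x00hello\x00\x00\x00xy\x00\x00"
    for addr, text, size in find_padded_strings(blob, base=0x1000):
        print(hex(addr), repr(text), size)
```

Each reported size is what a single padded string item would need to cover; a script could use it as the length when defining the data.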

dev747368 commented 5 years ago

Regarding your first issue with strings and substrings, they are called off-cuts, and if Ghidra sees a reference to the interior of a data type like a string, it will auto-generate a label for that portion of the string.
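As a rough illustration of the off-cut idea, the sketch below classifies a reference target against a table of already-defined strings: it either hits the start of a string, lands inside one (an off-cut, with its byte offset), or falls outside all of them. This is plain Python, not Ghidra's implementation; `classify_reference` and the tuple layout are hypothetical.

```python
from bisect import bisect_right


def classify_reference(strings, target):
    """Classify `target` against `strings`, a list of (start, size) sorted by start.

    Returns ("start", start, 0), ("offcut", start, offset),
    or ("outside", None, None).
    """
    starts = [start for start, _ in strings]
    idx = bisect_right(starts, target) - 1  # last string starting at or before target
    if idx < 0:
        return ("outside", None, None)
    start, size = strings[idx]
    if target == start:
        return ("start", start, 0)
    if target < start + size:
        return ("offcut", start, target - start)
    return ("outside", None, None)


if __name__ == "__main__":
    defined = [(0x1000, 16), (0x1020, 8)]
    for ref in (0x1000, 0x1005, 0x1010, 0x1027):
        print(hex(ref), classify_reference(defined, ref))
```

With the whole table row defined as one string at the base address, references to the nested parts become off-cuts into it rather than conflicting definitions.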

dev747368 commented 5 years ago

Also, there are fixed-length string data types in Ghidra (for example, string vs. TerminatedCString). If you hover over the mnemonic, the popup will tell you whether the string data type is fixed-length or null-terminated. The fixed-length types are constrained to the exact size you specify, with trailing nulls trimmed off. Interior nulls are preserved.
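The difference between the two interpretations can be sketched on a raw byte buffer. This is plain Python rather than Ghidra code, and the helper names are made up for illustration; it only mirrors the behaviour described above (stop at the first null vs. keep interior nulls and trim trailing ones).

```python
def decode_null_terminated(raw: bytes) -> str:
    """Read up to, but not including, the first null byte."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def decode_fixed_length(raw: bytes) -> str:
    """Treat the whole buffer as the string: keep interior nulls, trim trailing ones."""
    return raw.rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    raw = b"Name\x00Value\x00\x00\x00"  # 13 bytes: two nested parts plus padding
    print(repr(decode_null_terminated(raw)))  # 'Name'
    print(repr(decode_fixed_length(raw)))     # 'Name\x00Value'
```

A fixed-length item sized to cover the whole padded region therefore handles both the nested parts and the alignment nulls in one data element.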

Hardcore-fs commented 5 years ago

Yes, it generates a label, but it leaves the string as separate bytes: it missed part 1, found part 2, missed part 3, found part 4. It's a big job to go through and clean them all up.

[image: broken]

Also, I don't think the following labels are valid, or at least they look confusing when they are sorted.

[image: invalid]

dev747368 commented 5 years ago

Yeah, those are not optimal auto-labels. In that case it should probably just fall back to something like STR_403bcdf8.

So, it seems like your biggest issue is that the analyzer or whatever that found the strings didn't see the whole string as one data element?

Hardcore-fs commented 5 years ago

The main issue is that you have to clean them up manually. If you select the unallocated area and convert it to a string, you get a warning about it overlapping, but it does it correctly. What I'm looking at has over 4,000 strings, so it gets a bit tiresome.

[image: fixed]

dev747368 commented 5 years ago

If all the strings are the same size, you could just make an array of char_arrays. That would be pretty simple. But I doubt you have it that easy. Maybe time to dip your toes into GhidraScripting?
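For the easy case where every entry has the same size, the "array of char arrays" layout amounts to slicing the region into equal records. A minimal sketch in plain Python follows, with the record size and the function name `split_fixed_records` chosen purely for illustration:

```python
def split_fixed_records(buf: bytes, record_size: int):
    """Split `buf` into equal-size records and strip each record's trailing nulls."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    if len(buf) % record_size != 0:
        raise ValueError("buffer length is not a multiple of record_size")
    return [
        buf[offset:offset + record_size].rstrip(b"\x00").decode("ascii")
        for offset in range(0, len(buf), record_size)
    ]


if __name__ == "__main__":
    table = b"one\x00\x00\x00\x00\x00" + b"two\x00\x00\x00\x00\x00" + b"three\x00\x00\x00"
    print(split_fixed_records(table, 8))
```

If the records turn out to have mixed sizes, the length check fails early instead of producing misaligned entries.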

Hardcore-fs commented 5 years ago

TBH, much of the code under analysis is garbage. It's all over the place and clearly a group effort: some strings are *, others are **, some are single characters, and others are base additions off ASCII tables. It's beyond scripting. I was using Hopper Disassembler, but its development seems to have stopped, so I'm trialing Ghidra. But the more I dig, the more anomalies I find.