Open Hardcore-fs opened 5 years ago
Regarding your first issue with strings and substrings: they are called offcuts, and if Ghidra sees a reference to the interior of a data type like a string, it will auto-generate a label for that portion of the string.
Also, there are fixed-length string data types in Ghidra (for example, string vs. TerminatedCString). If you hover over the mnemonic, the popup will tell you whether the string data type is fixed-length or null-terminated. The fixed-length types are constrained to the exact size you specify, with trailing nulls trimmed off. Interior nulls are preserved.
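To make the difference concrete, here is a minimal sketch in plain Python. It works on a raw byte buffer and does not use the Ghidra API; the function names are hypothetical and only mimic the behaviour described above (trailing nulls trimmed, interior nulls kept, versus stopping at the first null).

```python
def decode_fixed_length(raw):
    """Fixed-length view: use every byte of the field, trim trailing NULs only."""
    return raw.rstrip(b"\x00").decode("ascii")


def decode_null_terminated(raw):
    """Null-terminated view: stop at the first NUL byte."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


# An 8-byte field with one interior NUL and three trailing NULs.
field = b"AB\x00CD\x00\x00\x00"

print(repr(decode_fixed_length(field)))     # 'AB\x00CD'
print(repr(decode_null_terminated(field)))  # 'AB'
```

The same eight bytes give two different strings depending on which view is applied, which is why it is worth checking the popup before assuming a string was cut short.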
Yes, it generates a label, but it leaves the string as separate bytes. It missed part 1, found part 2, missed part 3, found part 4. It's a big job to go through and clean them all up.
Also, I don't think the following labels are valid... or they look confusing when they are sorted.
Yeah, not optimal auto-labels. In that case it should probably just fall back to something like STR_403bcdf8.
So, it seems like your biggest issue is that the analyzer or whatever that found the strings didn't see the whole string as one data element?
The main issue is that you have to manually clean them up. If you select the unallocated area and convert it to a string, you get a warning about it overlapping, but it does it correctly. What I'm looking at has over 4,000 strings, so it gets a bit tiresome.
If all the strings are the same size, you could just make an array of char_arrays. That would be pretty simple. But I doubt you have it that easy. Maybe time to dip your toes into GhidraScripting?
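As a rough illustration of the "array of char arrays" idea, here is a sketch in plain Python on a raw byte buffer (not the Ghidra API). The record size of 8 and the sample table are made-up assumptions; the helper name is hypothetical.

```python
def split_fixed_records(raw, size):
    """Split a buffer into equal-sized records and trim each one's NUL padding."""
    if size <= 0 or len(raw) % size != 0:
        raise ValueError("buffer is not a whole number of records")
    return [
        raw[i:i + size].rstrip(b"\x00").decode("ascii")
        for i in range(0, len(raw), size)
    ]


# Three 8-byte records, each NUL-padded to the full record size.
table = b"RED\x00\x00\x00\x00\x00" + b"GREEN\x00\x00\x00" + b"BLUE\x00\x00\x00\x00"

print(split_fixed_records(table, 8))  # ['RED', 'GREEN', 'BLUE']
```

This only works when every entry really has the same size; the length check raises an error as soon as the buffer does not divide evenly.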
TBH, much of the code under analysis is garbage. It's all over the place, clearly a group effort: some strings are *, others are **, some are single characters, others are base additions off ASCII tables. It's beyond scripting. I was using "hopper disassembler", but it seems development has stopped, so I'm trialing Ghidra. But the more I dig, the more anomalies I find.
Currently strings are "null"-terminated; however, there are two other cases:
Nested strings: a string might have a base address, but within the same string are other strings, for example when printing tables. The first string may be the full width of the table, and the nested strings are sub-parts of that same string, sharing the same "null" termination. Currently Ghidra throws an error saying the sub-strings are already defined, so the base string has to be left as "bytes" and never gets a "string label" assigned.
Aligned strings: the string has several "null" bytes to pad it out to the next word alignment. This leads to a string followed by a chain of "db" entries, which have to be manually tidied up.
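Both cases can be sketched in plain Python over a raw byte buffer. This is not the Ghidra API and not how Ghidra's analyzer works; it only shows one way the desired behaviour could look. The function names, the 4-byte alignment, the base address 0x1000 and the sample data are all assumptions made for the illustration.

```python
def find_strings(raw, base=0, align=4, min_len=2):
    """Return (address, text, extent) for each null-terminated ASCII string.

    extent covers the text, its terminator and any NUL padding up to the
    next align-byte boundary, so the padding is not left as loose bytes.
    """
    found = []
    i = 0
    while i < len(raw):
        if raw[i] == 0:
            i += 1
            continue
        end = raw.find(b"\x00", i)
        if end == -1:
            break  # unterminated tail: leave it alone
        text = raw[i:end]
        stop = end + 1
        # Absorb padding NULs until the next aligned address.
        while (base + stop) % align != 0 and stop < len(raw) and raw[stop] == 0:
            stop += 1
        if len(text) >= min_len and all(32 <= b < 127 for b in text):
            found.append((base + i, text.decode("ascii"), stop - i))
        i = stop
    return found


def resolve_reference(found, addr):
    """Map a reference into the middle of a string to (base, offset, substring)."""
    for start, text, _extent in found:
        offset = addr - start
        if 0 <= offset < len(text):
            return start, offset, text[offset:]
    return None


# A table header padded to a 4-byte boundary, followed by a second string.
raw = b"ID   NAME\x00\x00\x00" + b"OK\x00\x00"
strings = find_strings(raw, base=0x1000)
print(strings)
# A reference to 0x1005 lands inside the header and shares its terminator.
print(resolve_reference(strings, 0x1005))
```

Here the header stays one 12-byte item including its padding, and the reference at 0x1005 is reported as offset 5 into it ("NAME") rather than as a separate, conflicting string.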