Version 0.5 causes switch/case symbols to show up

roblabla commented 3 months ago

Hey, thanks for the great extension!

I'm currently working on using this along with objdiff in my decompilation project th06 to enable a better diffing workflow.

I have something working great with delinker version 0.4. However, in version 0.5, the delinker now contains a bunch of extra symbols that add a ton of noise, and don't seem to be of great value. In particular

Delinker 0.4:

Delinker 0.5:

Those symbols seem to be inserted automatically by ghidra during switch/case analysis.

Would it make sense perhaps to detect them and not include them in the symbol table of the generated object file?

boricj commented 3 months ago

Hi,

Given all the bugs I've fixed over the past three months (see #6 for a partial history), the COFF exporter wasn't fit for purpose until version 0.5.0 ; I would not trust any results coming from version 0.4.0.

This is not a use-case I have foreseen and I'm not sure it can be fully accommodated. This extension was developed with the idea that it's not only possible to delink executables, but that the result could also be interchangeable with the original object file as far as the linker is concerned. This means that the contents of the exported object file can differ, because there are multiple equivalent ways to encode its information.

For example, comparing ascii-table.o from the ascii-table Linux MIPS ELF test case against its exported equivalent with objdiff shows several discrepancies across multiple functions. This is because the compiler and the exporter chose different ways to refer to the same offsets within a section, in a manner that resulted in different values for the addends of the relocations ; specifically, the compiler used unnamed section-relative relocations whereas the exporter used traditional named symbols.

Link-time optimizations and relaxations are another source of possible discrepancies. One particularly tricky example on MIPS hoists instructions inside branch delay slots, requiring some migraine-inducing workarounds if those instructions happened to have HI16 relocations. I haven't seen anything remotely as drastic on x86, but I'm vaguely aware of peephole optimizations that can eliminate some indirections through the PLT or GOT at link time by rewriting instructions and relocation types.

Going back to your initial issue, if the Give dynamic symbols external visibility option was left unchecked in the exporter these dynamic symbols should be local and not global. Alternatively, the exporter could be modified to not emit local symbols and generate section-relative relocations instead, something that I'm willing to add as an option. That might be enough for your particular use-case, but I'm doubtful that perfect delinking in general can be achieved with my extension, given its policy of not using any manual, delinker-specific annotations.

roblabla commented 3 months ago

This is because the compiler and the exporter chose different ways to refer to the same offsets within a section, in a manner that resulted in different values for the addends of the relocations ; specifically, the compiler used unnamed section-relative relocations whereas the exporter used traditional named symbols.

This is fine. Objdiff can be configured to ignore relocation differences. This allows it to paper over slight differences in relocation encoding that ultimately result in the same value (at the expense of potentially not seeing actual differences "hidden" by a relocation, but that's a tradeoff we can afford to take). And anyways the diffing doesn't need to be "perfect" - we're using it as a tool to guide our re-implementation towards having the correct assembly. We're currently using diff on assembly dumps, which is much noiser due to not having the exact same memory layout (something we'll work on at a later phase of the decompilation).

Going back to your initial issue, if the Give dynamic symbols external visibility option was left unchecked in the exporter these dynamic symbols should be local and not global.

Unfortunately, I tried both with and without "Give dynamic symbols external visibility", and it does not get rid of the "switchD" symbols. When it's enabled, I additionally get symbols for LAB_ symbols, so the checkmark does do something, but it doesn't get rid of those symbols. Those are probably not considered dynamic - they actually show up in the "Symbol Tree" after all, unlike LAB_ labels.

Alternatively, the exporter could be modified to not emit local symbols and generate section-relative relocations instead, something that I'm willing to add as an option.

That's be great :eyes:

roblabla commented 3 months ago

On a similar note, labels that were renamed to give a more useful name also show up as symbols in the delinker. I feel like those should definitely not show up - just as goto labels don't show up as symbols in a normal COFF file. They also did not show up in 0.4.

I assume this is the same root problem.

boricj commented 3 months ago

Unfortunately, I tried both with and without "Give dynamic symbols external visibility", and it does not get rid of the "switchD" symbols. When it's enabled, I additionally get symbols for LAB_ symbols, so the checkmark does do something, but it doesn't get rid of those symbols. Those are probably not considered dynamic - they actually show up in the "Symbol Tree" after all, unlike LAB_ labels.

This checkbox gives dynamic symbols external visibility instead of local. I assumed that the switchD symbols were dynamic and that the checkbox was accidently checked, but I was mistaken.

I've looked into Ghidra's symbol handling and dynamic symbols are defined as not having a DBRecord. I'm not entirely sure how they work, but creating a symbol at their address will supplant them. This includes not only manually created ones (like bulletLoopContinue as shown above), but also symbols created through analysis. Those switchD labels for example are created by the decompiler switch analyzer and are therefore not dynamic, because they exist within the program database.

I didn't really needed to deal with that problem before and it turns out Ghidra doesn't have a concept of symbol visibility I can piggyback. I think the best we can do here is to add an option where symbol names matching an user-supplied regex pattern will be considered local, on top of checking for dynamic symbols.

Assuming symbol visibility is a solved problem, next step is the ability to generate section-relative relocations as stated above, so that local symbols can be trimmed from the exported object file's symbol table. That will require some refactorings first, mostly because currently relocations unapply themselves based on a predetermined named symbol and addend, so there's no mechanism to handle unnamed section-relative relocations with a different addend.

I feel like those should definitely not show up - just as goto labels don't show up as symbols in a normal COFF file. They also did not show up in 0.4.

That's probably because version 0.4.0 trimmed local symbols that were not used by any relocations. Combined with the trimming of PC-relative relocations whose source and destination addresses lie within the same contiguous selected address range, that might explain why they don't show up in that version (aside from the bugginess of the COFF exporter back then).

The local symbol trimming was lost when I recently refactored and cleaned up the generic parts of symbol handling out of the exporters, so that symbol name mangling could be properly handled with the symbol name preference feature. The way the local symbol trimming was done was rather hacky and it wasn't that useful in practice. I'd rather not reintroduce it, which is why I'm preferring section-relative relocations to accommodate your use-case.

TL;DR Implementing regex-based symbol visibility and section-relative relocations should eliminate the unwanted symbols from your exported object files. The first part should be straightforward to implement, the second part will require some refactorings to pull off.

roblabla commented 3 months ago

Rather than a regex-based solution, could we perhaps have an option to only export symbols in the executable section if they have a matching Function? This should get rid of the goto labels/switchcase, while leaving the symbols for all the functions and the data.

It's obviously not a perfect heuristic (some programs might store data in the .text section/segment), but I feel like it should work decently enough for our use-cases.

boricj commented 3 months ago

Hmm... We can check if a function contains the address of a symbol. If so, then the symbol can be made local unless it's at the entrypoint. That should get rid of all the labels within functions and I don't see a downside that would cause problems later on.

As for the switch data labels, the problem remains. I can't use your suggested heuristic because of the sheer scope of this extension. It supports two object file formats, two ISAs and to my knowledge was tested with artifacts from three different platforms (Linux, Windows, PlayStation) spanning 25 years. That's a very delicate balancing act to pull off and it will only get worse as time goes on. From experience, taking shortcuts inside this code base never ends well, it's unusually exacting.

That's why I'm looking for solutions which not only work for one use-case, but are correct for all use-cases. Unless there's a better proposal, an option for a regex that matches local symbols (by default set to switchD_.+::switchdataD_.+) is the least bad idea I can think of for symbols that aren't inside a function.

boricj commented 3 months ago

I've pushed some commits on master to add options that should give these symbols static visibility. Can you check if this catches all the symbols you want gone? Next step will be to make their relocations section-relative and trim them from the symbol table, but it'll require some refactorings first to make it happen.

roblabla commented 3 months ago

Yes, it worked perfectly! Only the more "relevant" symbols show up now.

boricj / ghidra-delinker-extension

Version 0.5 causes switch/case symbols to show up #8