Open vpachkov opened 3 years ago
Also, please take a look at PR #47786. It contains a possible implementation of the mapping symbols functionality.
cc @cherrymui @thanm since they requested OP to open an issue on the CL.
Change https://golang.org/cl/343150 mentions this issue: cmd: support mapping symbols for ARM64
What are the benefits exactly? It seems the only difference is that it makes objdump output nicer? And only for those three NOPs?
Also I think it's reasonable to use NOPs for function aligning instead of zeroing
I think that is fine (and can be done independently). Or maybe we should use a trap instruction.
The reason is that it reduces the number of mapping symbols generated in the symbol table. A "$d" symbol must be created for every transition from code (actual instructions) to data (anything that is not an actual instruction, e.g. zero padding at the end of a function). If we used NOPs for padding, the additional "$d" symbols wouldn't be required, since a NOP is a valid instruction.
What are the benefits of those symbols in the first place? Why does it matter whether it is an instruction or data?
The rationale from the ARM document says "Linkers, file decoders and other tools need to map binaries correctly", for what that is worth.
It would be interesting to see what other tools out there besides objdump actually make use of the symbols. I thought maybe they might be used in something like dynamorio or BOLT, but I can't seem to find any code there that uses them.
Hello @thanm @cherrymui. llvm-bolt project indeed uses mapping symbols, that's why we need this patch. For example, during the function disassembly stage we need to check whether a particular function offset falls inside a constant island; otherwise we will try to disassemble it as the instruction. FYI, the data offsets for functions are filled here
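To illustrate the kind of lookup described above, here is a minimal sketch (not taken from llvm-bolt) of how a tool might decide whether a function offset is code or data, given a sorted list of mapping symbols. The mapSym type, the isData helper and the offsets are hypothetical names and values chosen for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// mapSym is a hypothetical record for one mapping symbol: the offset at
// which a region starts and whether that region is data ($d) or code ($x).
type mapSym struct {
	off  uint64
	data bool
}

// isData reports whether off falls inside a data region. syms must be
// sorted by offset. Offsets before the first mapping symbol are treated
// as code, which is an assumption of this sketch.
func isData(syms []mapSym, off uint64) bool {
	// Index of the first symbol that starts strictly after off.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].off > off })
	if i == 0 {
		return false
	}
	// The region containing off is the one opened by the previous symbol.
	return syms[i-1].data
}

func main() {
	// Hypothetical layout: code at 0x00, a constant island at 0x40,
	// and code again at 0x50.
	syms := []mapSym{{0x00, false}, {0x40, true}, {0x50, false}}
	for _, off := range []uint64{0x3c, 0x40, 0x4c, 0x50} {
		fmt.Printf("%#x data=%v\n", off, isData(syms, off))
	}
}
```

Without the "$d" entry at 0x40 in the table, every offset in this layout would be classified as code, which is the failure mode described in the comment.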
Thanks @yota9, I stand corrected. My search wasn't very thorough apparently.
llvm-bolt project indeed uses mapping symbols, that's why we need this patch
Could you explain more? From "[it] uses mapping symbols" to "we need this patch" there are many steps in between. What happens if we don't have them?
try to disassemble it as the instruction
What is the problem with this? (FWIW, currently we neither support nor expect any tool post-editing a Go binary.)
If it is not an instruction, disassembly will fail. Since the data in a constant island is part of the function, we need to know exactly where the instructions are and where the data is in order to process it correctly.
As for the second part, I'm working on golang support for the llvm-bolt tool. I hope it will be open sourced soon.
See https://github.com/golang/go/issues/49031#issuecomment-945905417 about binary post-editing. Is there any other reason we want to do this? Thanks.
My opinion is that the main reason to do this is that it is part of the ARM64 ELF standard. Optimizers, linkers, debuggers, profilers and disassemblers need to map images correctly, and they rely on that standard. So, to answer your question, binary post-editing isn't the only reason for doing this. For example, setting a breakpoint at a literal pool location can crash the debugging process, since without mapping symbols a debugger will treat that area as instructions.
Related: older issue #9118.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
What did you expect to see?
The special mapping symbols appear in the symbol table. Readelf:
Objdump:
What did you see instead?
The lack of $x and $d ARM mapping symbols in the symbol table, and regular zeroed padding. Objdump:
The "Mapping symbols" chapter of ELF for the Arm® 64-bit Architecture (AArch64) requires that these special symbols be inserted into object files:
$x - At the start of a region of code containing AArch64 instructions.
$d - At the start of a region of data.
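As a rough way to check whether a given binary carries these symbols, the following sketch uses Go's standard debug/elf package to print every mapping symbol in an ELF file's symbol table. The isMappingSymbol helper is a hypothetical name introduced here. It accepts the bare "$x" and "$d" names as well as the "$x.<any>" and "$d.<any>" forms that the same chapter allows.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// isMappingSymbol reports whether name is an AArch64 mapping symbol:
// "$x" or "$d", optionally followed by ".<any>".
func isMappingSymbol(name string) bool {
	for _, p := range []string{"$x", "$d"} {
		if name == p || strings.HasPrefix(name, p+".") {
			return true
		}
	}
	return false
}

// printMappingSymbols lists the mapping symbols found in the ELF file at path.
func printMappingSymbols(path string) error {
	f, err := elf.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Symbols returns elf.ErrNoSymbols if the file has no symbol table.
	syms, err := f.Symbols()
	if err != nil {
		return err
	}
	for _, s := range syms {
		if isMappingSymbol(s.Name) {
			fmt.Printf("%#016x %s\n", s.Value, s.Name)
		}
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: mapsyms <elf-file>")
		return
	}
	if err := printMappingSymbols(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Run against a Go-built ARM64 binary that lacks the symbols, this sketch would be expected to print nothing.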
I propose adding this functionality, since it is part of a standard and is already supported by other languages.
Also I think it's reasonable to use NOPs for function aligning instead of zeroing. There was no reason to do this before, but now it is needed so that $x and $d are not generated for every function and are placed only at transitions. In other words, this is an optimization that minimizes the number of mapping symbols in the symbol table.
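The effect of the padding choice on the symbol count can be sketched as follows. This is an illustration only, not the code from the CL: the region type, the mappingSymbols helper and the offsets are hypothetical, and zero padding is modelled as data while NOP padding is modelled as code.

```go
package main

import "fmt"

// region is a hypothetical description of one contiguous chunk of the text
// section: its start offset and whether it holds data rather than instructions.
type region struct {
	off  uint64
	data bool
}

// mappingSymbols returns one "$x" or "$d" entry per code/data transition.
// A symbol is always emitted for the first region.
func mappingSymbols(regions []region) []string {
	var out []string
	for i, r := range regions {
		if i > 0 && r.data == regions[i-1].data {
			continue // same kind as the previous region: no new symbol needed
		}
		name := "$x"
		if r.data {
			name = "$d"
		}
		out = append(out, fmt.Sprintf("%s@%#x", name, r.off))
	}
	return out
}

func main() {
	// Two functions, each followed by alignment padding.
	zeroPadded := []region{{0x00, false}, {0x34, true}, {0x40, false}, {0x74, true}}
	nopPadded := []region{{0x00, false}, {0x34, false}, {0x40, false}, {0x74, false}}

	fmt.Println(len(mappingSymbols(zeroPadded)), mappingSymbols(zeroPadded))
	fmt.Println(len(mappingSymbols(nopPadded)), mappingSymbols(nopPadded))
}
```

With zero padding, every function boundary in this model adds a "$d" and a following "$x", whereas with NOP padding a single "$x" covers the whole run of functions.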