golang / go

cmd: support mapping symbols for ARM64 #47908

Open vpachkov opened 3 years ago

vpachkov commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version devel go1.18-8b471db71b Wed Aug 18 08:26:44 2021 +0000 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="off"
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/slava/Library/Caches/go-build"
GOENV="/Users/slava/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/slava/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/Users/slava/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/slava/dev/mygo/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/slava/dev/mygo/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="devel go1.18-8b471db71b Wed Aug 18 08:26:44 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="/usr/bin/clang"
CXX="/usr/bin/clang++"
CGO_ENABLED="0"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ld/pkx3km8x53qc7qb6wwwvxcr80000gn/T/go-build3739783046=/tmp/go-build -gno-record-gcc-switches"

What did you do?

$ go build

What did you expect to see?

The special mapping symbols appear in the symbol table. Readelf:

67: 0000000000018dc4     0 NOTYPE  LOCAL  DEFAULT    1 $d
68: 0000000000018df0     0 NOTYPE  LOCAL  DEFAULT    1 $x

Objdump:

   18db4:   f94007e0    ldr x0, [sp, #8]
   18db8:   f9400be1    ldr x1, [sp, #16]
   18dbc:   17fffe99    b   18820 
   18dc0:   14000000    b   18dc0 
   18dc4:   00010198    .inst   0x00010198 ; undefined
   18dc8:   000101f8    .inst   0x000101f8 ; undefined
   18dcc:   000101f0    .inst   0x000101f0 ; undefined
   18dd0:   000101e8    .inst   0x000101e8 ; undefined
   18dd4:   000101c0    .inst   0x000101c0 ; undefined
   18dd8:   000169e8    .inst   0x000169e8 ; undefined
   18ddc:   000169b8    .inst   0x000169b8 ; undefined
   18de0:   000169d0    .inst   0x000169d0 ; undefined
   18de4:   d503201f    nop
   18de8:   d503201f    nop
   18dec:   d503201f    nop

0000000000018df0 runtime.sysReserveAligned:
   18df0:   f9400b90    ldr x16, [x28, #16]
   18df4:   910003f1    mov x17, sp
   18df8:   eb10023f    cmp x17, x16

What did you see instead?

The $x and $d ARM mapping symbols are missing from the symbol table, and functions are padded with plain zeros. Objdump:

   18db4:   f94007e0    ldr x0, [sp, #8]
   18db8:   f9400be1    ldr x1, [sp, #16]
   18dbc:   17fffe99    b   18820 
   18dc0:   14000000    b   18dc0 
   18dc4:   00010198    .inst   0x00010198 ; undefined
   18dc8:   000101f8    .inst   0x000101f8 ; undefined
   18dcc:   000101f0    .inst   0x000101f0 ; undefined
   18dd0:   000101e8    .inst   0x000101e8 ; undefined
   18dd4:   000101c0    .inst   0x000101c0 ; undefined
   18dd8:   000169e8    .inst   0x000169e8 ; undefined
   18ddc:   000169b8    .inst   0x000169b8 ; undefined
   18de0:   000169d0    .inst   0x000169d0 ; undefined
    ...

0000000000018df0 runtime.sysReserveAligned:
   18df0:   f9400b90    ldr x16, [x28, #16]
   18df4:   910003f1    mov x17, sp
   18df8:   eb10023f    cmp x17, x16

The "Mapping symbols" chapter of ELF for the Arm® 64-bit Architecture (AArch64) requires that these special symbols be inserted into object files:

$x - At the start of a region of code containing AArch64 instructions.
$d - At the start of a region of data.
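
To check whether a given binary carries these symbols without going through readelf, something like the following sketch could be used. It relies only on the standard debug/elf package. The helper name mappingKind is made up for this illustration, and the handling of the suffixed form ($x.<any>, $d.<any>) reflects my reading of the spec, so treat it as an assumption.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"strings"
)

// mappingKind (hypothetical helper) reports whether name is an AArch64
// mapping symbol and, if so, whether it marks "code" ($x) or "data" ($d).
// A suffixed form such as "$x.foo" or "$d.1" is accepted as well.
func mappingKind(name string) (string, bool) {
	for prefix, kind := range map[string]string{"$x": "code", "$d": "data"} {
		if name == prefix || strings.HasPrefix(name, prefix+".") {
			return kind, true
		}
	}
	return "", false
}

// listMappingSymbols prints every mapping symbol found in the ELF symbol
// table of the file at path.
func listMappingSymbols(path string) error {
	f, err := elf.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	syms, err := f.Symbols() // fails if the file has no symbol table
	if err != nil {
		return err
	}
	for _, s := range syms {
		if kind, ok := mappingKind(s.Name); ok {
			fmt.Printf("%016x %-4s %s\n", s.Value, kind, s.Name)
		}
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: mapsyms <elf-file>")
		os.Exit(2)
	}
	if err := listMappingSymbols(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Run against a linux/arm64 binary built by the current toolchain, this should print nothing, which is the behaviour this issue is about.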

I propose adding this functionality, since it is part of the standard and is already supported by other languages.

Also I think it's reasonable to use NOPs for function aligning instead of zeroing. There was no reason to do this before, but it is now needed so that $x and $d are placed only at transitions rather than generated for every function. In other words, it is an optimization that minimizes the number of mapping symbols in the symbol table.

vpachkov commented 3 years ago

Also, please take a look at PR #47786. It contains a possible implementation of the mapping symbols functionality.

ALTree commented 3 years ago

cc @cherrymui @thanm since they requested OP to open an issue on the CL.

gopherbot commented 3 years ago

Change https://golang.org/cl/343150 mentions this issue: cmd: support mapping symbols for ARM64

cherrymui commented 3 years ago

What are the benefits exactly? It seems the only difference is it makes objdump output nicer? And only for that three NOPs?

Also I think it's reasonable to use NOPs for function aligning instead of zeroing

I think that is fine (and can be done independently). Or maybe we should use a trap instruction.

vpachkov commented 3 years ago

What are the benefits exactly? It seems the only difference is it makes objdump output nicer? And only for that three NOPs?

Also I think it's reasonable to use NOPs for function aligning instead of zeroing

I think that is fine (and can be done independently). Or maybe we should use a trap instruction.

The reason is that it lowers the number of mapping symbols generated in the symbol table. A "$d" symbol has to be created for every transition from code (actual instructions) to data (anything that is not an instruction, e.g. padding zeros at the end of a function). If we used NOPs for padding, the additional "$d" wouldn't be required, since a NOP is a valid instruction.

cherrymui commented 3 years ago

What are the benefits of those symbols in the first place? Why does it matter whether it is an instruction or data?

thanm commented 3 years ago

The rationale from the ARM document says "Linkers, file decoders and other tools need to map binaries correctly", for what that is worth.

It would be interesting to see what other tools out there besides objdump actually make use of the symbols. I thought maybe they might be used in something like dynamorio or BOLT, but I can't seem to find any code there that uses them.

yota9 commented 3 years ago

Hello @thanm @cherrymui. llvm-bolt project indeed uses mapping symbols, that's why we need this patch. For example, during the function disassembly stage we need to check whether there is a constant island at a particular function offset; otherwise we will try to disassemble it as the instruction. JFYI, the data offsets for functions are filled here

thanm commented 3 years ago

Thanks @yota9, I stand corrected. My search wasn't very thorough apparently.

cherrymui commented 3 years ago

llvm-bolt project indeed uses mapping symbols, that's why we need this patch

Could you explain more? From "[it] uses mapping symbols" to "we need this patch" there are many steps in between. What happens if we don't have them?

try to disassemble it as the instruction

What is the problem for this? (FWIW, currently, we don't support and expect any tool post-editing a Go binary.)

yota9 commented 3 years ago

What is the problem for this? (FWIW, currently, we don't support and expect any tool post-editing a Go binary.)

If it is not an instruction, the disassembler will fail on it. Since the data in a constant island is part of the function, we need to know exactly where the instructions are and where the data is in order to process the function correctly.

As for the second part, I'm working on Go support for the llvm-bolt tool. I hope it will be open-sourced soon.

cherrymui commented 3 years ago

See https://github.com/golang/go/issues/49031#issuecomment-945905417 about binary post-editing. Is there any other reason we want to do this? Thanks.

vpachkov commented 3 years ago

In my opinion, the main reason to do this is that it is part of the ARM64 ELF standard. Optimizers, linkers, debuggers, profilers and disassemblers need to map images correctly, and they rely on that standard. So, to answer your question, binary post-editing isn't the only reason for doing this. For example, setting a breakpoint at a literal pool location can crash the debugged process, because without mapping symbols a debugger will treat that area as instructions.

thanm commented 2 years ago

Related: elderly issue #9118.