avncharlie closed this issue 1 year ago.
Hi @avncharlie, thanks for the report.
Unfortunately, I was unable to reproduce the issue. I tested with the latest ddisasm unstable build (c8c7996), as well as the latest stable (1.5.7).
I got the ls binary to test from the ubuntu:22.04 Docker image on Docker Hub:
$ sha256sum ./ls.ubuntu2204
1e39354a6e481dac48375bfebb126fd96aed4e23bab3c53ed6ecf1c5e4d5736d ./ls.ubuntu2204
As a note, ddisasm version 1.6.0 (and the corresponding image on Docker Hub) is still an unstable version and has new builds pushed to it periodically. Because of this, I am not sure the ddisasm build I'm testing is the same one you're using. (We're working on changing our process so that, in the future, only stable builds will be published to specific version-number tags, and the unstable build will be available under an unstable tag.) The latest stable build is 1.5.7.
If you're still able to reproduce this on the latest unstable build, or on 1.5.7, can you attach your ls binary to the issue? It may differ subtly from the one I am testing.
Thanks for the reply!
It looks like the sha256sum you provided is actually from the Ubuntu 20.04 ls binary?
This is the sha256sum output I got from the ls binary on both my 22.04 system and the ubuntu:22.04 Docker image:
$ sha256sum ./ls.ubuntu2204
8696974df4fc39af88ee23e307139afc533064f976da82172de823c3ad66f444 ./ls.ubuntu2204
And this is the output I got from my Ubuntu 20.04 installation:
$ sha256sum /usr/bin/ls
1e39354a6e481dac48375bfebb126fd96aed4e23bab3c53ed6ecf1c5e4d5736d /usr/bin/ls
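Since the mix-up came down to comparing hashes, a small check like the following can confirm which ls build is being tested before comparing results. This is a minimal Python sketch: the function names are hypothetical, and the two hashes are simply the ones quoted in this thread. They will change whenever Ubuntu ships an updated coreutils package, so treat the lookup table as illustrative.

```python
import hashlib

# Hashes quoted in this thread (illustrative; they change with package updates).
KNOWN_LS = {
    "8696974df4fc39af88ee23e307139afc533064f976da82172de823c3ad66f444": "Ubuntu 22.04 ls",
    "1e39354a6e481dac48375bfebb126fd96aed4e23bab3c53ed6ecf1c5e4d5736d": "Ubuntu 20.04 ls",
}


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, same as `sha256sum` prints."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_ls(path: str) -> str:
    """Map a binary to one of the builds discussed here, if it matches."""
    return KNOWN_LS.get(sha256_file(path), "unknown build")
```

For example, identify_ls("./ls.ubuntu2204") reports which of the two builds a local copy is, or "unknown build" if it matches neither.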
I tried using ddisasm version 1.5.7, but this still produced a segfault in the rewritten binary.
Here is the ls binary I am working with.
You're right; somehow I mixed them up and used the Ubuntu 20.04 ls binary. Thank you for catching that.
Using the correct binary, I can now replicate the problem.
What is happening is that this binary, for some reason, has .ctors and .dtors sections rather than .init_array and .fini_array. This is an older convention, and .init_array and .fini_array are considered the more modern alternatives; so much so that some linkers rewrite .ctors and .dtors sections as .init_array and .fini_array. This rewriting by the linker results in the crash, although I am not yet precisely sure why.
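To check whether a given binary uses the older convention before reassembling it, one option is to look at its section headers (readelf -S shows the same information). Below is a minimal Python sketch of that check. It assumes a little-endian ELF64 file such as the x86-64 ls discussed here, ignores extended section numbering, and the function names are hypothetical.

```python
import struct

# Little-endian ELF64 layouts of Elf64_Ehdr and Elf64_Shdr (see <elf.h>).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")


def section_names(data: bytes) -> list[str]:
    """Return the section names of a little-endian ELF64 image, in header order."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64 files")
    fields = ELF64_EHDR.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [ELF64_SHDR.unpack_from(data, shoff + i * shentsize) for i in range(shnum)]
    # sh_offset and sh_size of the section-name string table (.shstrtab)
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    names = []
    for sh_name, *_ in headers:
        # sh_name is an offset into .shstrtab; names are NUL-terminated
        end = strtab.index(b"\0", sh_name)
        names.append(strtab[sh_name:end].decode("ascii"))
    return names


def uses_ctors_dtors(data: bytes) -> bool:
    """True if the binary has legacy .ctors or .dtors sections."""
    names = set(section_names(data))
    return ".ctors" in names or ".dtors" in names
```

For example, uses_ctors_dtors(open("ls", "rb").read()) should return True for a binary that, like the Ubuntu 22.04 ls described above, carries .ctors/.dtors sections.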
The assembly generated by ddisasm/gtirb-pprinter is correct (it retains the .ctors and .dtors sections of the original binary); the problem is how those sections are handled when re-linking. I think the only solution will be to tweak the options supplied to the linker.
One such workaround is to use the gold linker, which allows us to disable this conversion:
gcc -o ls ls.s -nostartfiles -lselinux -Wl,--no-ctors-in-init-array -fuse-ld=gold
On my end, this workaround results in a functional binary.
I think we should also consider making this behavior automatic when the gtirb-pprinter --binary option is used and .ctors and .dtors sections are detected. That would not really help you here, though, since you're invoking gcc directly.
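As a rough illustration of that automatic behavior, here is a standalone Python helper that only builds the gcc command line; it is not gtirb-pprinter code, and the helper name and parameters are hypothetical. The flags are the ones from the gold workaround above, and the default library list reflects ls's dependency on libselinux.

```python
def link_command(asm: str, out: str, has_legacy_ctors: bool,
                 libs: tuple = ("selinux",)) -> list[str]:
    """Build a gcc command line for re-linking reassembled output.

    When the original binary had .ctors/.dtors sections, switch to gold and
    ask it not to fold them into .init_array/.fini_array.
    """
    cmd = ["gcc", "-o", out, asm, "-nostartfiles"]
    cmd += [f"-l{lib}" for lib in libs]
    if has_legacy_ctors:
        cmd += ["-Wl,--no-ctors-in-init-array", "-fuse-ld=gold"]
    return cmd
```

The result can be passed straight to subprocess.run(..., check=True); with has_legacy_ctors=True it reproduces the gold command shown above.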
If gtirb-pprinter is producing correct assembly, could this be a bug in the linker invoked by gcc, i.e., the .ctors/.dtors rewriting functionality isn't working correctly?
In case it helps, I found this discussion about adding the feature of rewriting .ctors and .dtors sections to the GCC toolchain: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770
Using the gold linker produced a working binary, so I'm marking this issue as closed. Thanks!
I did a bit more testing, and you can also work around the problem with ld by specifying a custom linker script, although it's a bit more involved than gold's option. When building with gcc, you can obtain the linker script in use by building with -Wl,--verbose. Copy that to a file and remove the .init_array and .fini_array sections:
.init_array :
{
  PROVIDE_HIDDEN (__init_array_start = .);
  KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))
  KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors))
  PROVIDE_HIDDEN (__init_array_end = .);
}
.fini_array :
{
  PROVIDE_HIDDEN (__fini_array_start = .);
  KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*)))
  KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
  PROVIDE_HIDDEN (__fini_array_end = .);
}
Notice how these capture the .ctors and .dtors input sections, e.g., with SORT_BY_INIT_PRIORITY(.ctors.*).
Then, build with your new linker script:
gcc -o ls ls.s -nostartfiles -lselinux -T ./my_script.ld
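Editing the script by hand works fine. As a sketch of automating the two steps (extracting the built-in script from the -Wl,--verbose output and deleting the two output-section statements), here is a small Python helper. It assumes the usual GNU ld --verbose layout, where the script is framed by lines of '=' characters, and it uses simple brace matching rather than a real linker-script parser; the function names are hypothetical.

```python
import re


def extract_default_script(verbose_output: str) -> str:
    """Return the built-in linker script from `-Wl,--verbose` output.

    GNU ld prints the script between two lines made only of '=' characters.
    """
    parts = re.split(r"^=+\s*$", verbose_output, flags=re.MULTILINE)
    if len(parts) < 3:
        raise ValueError("no linker script found in the --verbose output")
    return parts[1]


def drop_output_sections(script: str, names=(".init_array", ".fini_array")) -> str:
    """Delete each `NAME : { ... }` output-section statement from a script."""
    for name in names:
        # Anchor at line start so that e.g. .preinit_array is not matched.
        match = re.search(r"^\s*" + re.escape(name) + r"\s*:", script, flags=re.MULTILINE)
        if match is None:
            continue
        depth = 0
        for i in range(script.index("{", match.end()), len(script)):
            if script[i] == "{":
                depth += 1
            elif script[i] == "}":
                depth -= 1
                if depth == 0:  # closing brace of this output section
                    script = script[:match.start()] + script[i + 1:]
                    break
        else:
            raise ValueError(f"unbalanced braces after {name}")
    return script
```

In practice, capture the output of the -Wl,--verbose build (e.g. subprocess.run(..., capture_output=True, text=True).stdout), pass it through both functions, and write the result to my_script.ld before running the -T command. The .ctors and .dtors output sections in the default script are left untouched, so those input sections are no longer folded into .init_array and .fini_array.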
I found this issue in the gtirb-pprinter repository, which seems to be about the same .ctors/.dtors problem: https://github.com/GrammaTech/gtirb-pprinter/issues/3#issuecomment-757014697
The solution there was to skip the .ctors and .dtors sections while pretty-printing. This produced assembly that I could assemble with gcc with no extra options:
$ gtirb-pprinter /workspace/out.gtirb --asm /workspace/out.s --skip-section .ctors --skip-section .dtors
$ gcc out.s -o out -nostartfiles -lselinux
Of course, it is worth mentioning for future readers that doing it this way results in the rewritten binary having .init_array and .fini_array sections instead of .ctors and .dtors. That may be fine, depending on the application.
However, I do think there may be some danger in it. There is a reference to .L_21000, which is in the .ctors section:
#-----------------------------------
.type FUN_170f0, @function
#-----------------------------------
FUN_170f0:
# ...
movq .L_21000(%rip),%rax
cmpq $-1,%rax
je .L_17130
pushq %rbp
movq %rsp,%rbp
pushq %rbx
leaq .L_21000(%rip),%rbx
subq $8,%rsp
# ...
.L_17118:
callq *%rax
movq -8(%rbx),%rax
subq $8,%rbx
cmpq $-1,%rax
jne .L_17118
movq -8(%rbp),%rbx
leave
retq
.byte 0x66
.byte 0x90
.L_17130:
retq
In the original binary, this function is called via the DT_INIT tag in the .dynamic section. This code is what executes the entries of the .ctors section.
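To make the listing easier to follow, here is a rough Python model of what the loop in FUN_170f0 does, with memory modeled as a list indexed by 8-byte slot. It mirrors the code shown: read the entry at .L_21000, stop if it is -1, otherwise call it and step back one slot. Treating -1 as the marker at the start of the .ctors list follows GCC's traditional crtstuff.c layout (__CTOR_LIST__ beginning with -1); that is background knowledge, not something the ddisasm output states, and the function name is hypothetical.

```python
SENTINEL = -1  # the value FUN_170f0 compares against (cmpq $-1,%rax)


def run_ctors_backwards(ctors: list, start: int) -> None:
    """Model of the FUN_170f0 loop.

    Begin at ctors[start] (.L_21000 in the listing) and call each entry while
    moving to lower addresses, stopping at the -1 sentinel.
    """
    i = start
    while ctors[i] != SENTINEL:
        ctors[i]()  # callq *%rax
        i -= 1      # movq -8(%rbx),%rax ; subq $8,%rbx
```

Note the order: the entry at the highest address runs first, and if the very first slot read is already -1 the function returns without calling anything (the je .L_17130 branch).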
If printed with --skip-section=.ctors, these references are printed as zero:
movq 0(%rip),%rax # WARNING:0: no symbol for address 0x21000
This code does not seem to be executed in the rewritten binary: something in the process of rewriting and re-linking loses the DT_INIT tag that specifies it should be executed. Thus, what you are doing works, but it might be safer to add --skip-section=.init and --skip-function=FUN_170f0 to ensure this code is elided.
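Putting the suggested options together, a hypothetical Python helper that builds the full gtirb-pprinter invocation might look like this. It only uses options that appear in this thread, in their --option=value form. FUN_170f0 is the name ddisasm assigned in this particular build of ls, so the function to skip will differ for other binaries.

```python
def pprinter_command(ir: str, asm: str, skip_sections=(), skip_functions=()) -> list[str]:
    """Build a gtirb-pprinter command line that elides the given sections/functions."""
    cmd = ["gtirb-pprinter", ir, "--asm", asm]
    cmd += [f"--skip-section={name}" for name in skip_sections]
    cmd += [f"--skip-function={name}" for name in skip_functions]
    return cmd
```

For this ls build, pprinter_command("/workspace/out.gtirb", "/workspace/out.s", [".ctors", ".dtors", ".init"], ["FUN_170f0"]) yields the command to run inside the container, after which the plain gcc command above can be used to assemble the output.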
I am unable to reassemble "ls" using ddisasm and gtirb-pprinter on Ubuntu 22.04. I am using the grammatech/ddisasm Docker image. ddisasm is version 1.6.0 and gtirb-pprinter is version 1.9.0.
When I try to (re)assemble the assembly generated by ddisasm for the "ls" utility, the rewritten binary segfaults.
Commands to reproduce:
Generate GTIRB file
$ docker run --rm -v $(pwd):/workspace grammatech/ddisasm sh -c "ddisasm /workspace/ls --ir /workspace/out.gtirb"
Generate assembly
$ docker run --rm -v $(pwd):/workspace grammatech/ddisasm sh -c "gtirb-pprinter /workspace/out.gtirb --asm /workspace/out.s"
Assemble
$ gcc -nostartfiles out.s -o out -lselinux
The resulting binary segfaults.
Looking at the stack trace from the core dump in GDB shows:
So the rewritten binary is crashing before main is reached.