dk / Prima

prima.eu.org
Other
106 stars 27 forks source link

t/Image/Transform.t aborts on memmove() in img_2d_transform() if compiled with -D_FORTIFY_SOURCE=3 and glibc-2.36.9000 #78

Closed ppisar closed 1 year ago

ppisar commented 1 year ago

When building Prima with -D_FORTIFY_SOURCE=3:

$ perl Makefile.PL OPTIMIZE='-Wp,-D_FORTIFY_SOURCE=3 -g -O2'

t/Image/Transform.t test aborts:

$ perl -Iblib/{lib,arch} t/Image/Transform.t ok 1 - rotate(90) width ok ok 2 - rotate(90) height ok ok 3 - rotate(90) data ok ok 4 - rotate(270) width ok ok 5 - rotate(270) height ok ok 6 - rotate(270) data ok ok 7 - rotate(180) width ok ok 8 - rotate(180) height ok ok 9 - rotate(180) data ok ok 10 - short: rotate(90) data ok ok 11 - short: rotate(270) data ok ok 12 - short: rotate(180) data ok ok 13 - byte: vertical mirroring ok ok 14 - byte: horizontal mirroring ok ok 15 - short: vertical mirroring ok ok 16 - short: horizontal mirroring ok ok 17 - rotation 360 seems performing ok 18 - rotation 360 is correct ok 19 - rotation 360 by transform2d seems performing ok 20 - rotation 360 transform2d is correct ok 21 - xshear(2) integral data ok 22 - xshear(2) integral mask buffer overflow detected : terminated Aborted (core dumped)

This can be reduced to:

$ cat /tmp/test.t 
#!/usr/bin/perl
use Prima::sys::Test qw(noX11);
my $p = Prima::Image->create(
        width    => 2,
        height   => 2,
        type     => im::Byte,
        data     => "\1\4..\3\6");
$p->shear(0,0.5);
$ perl -Iblib/{lib,arch} /tmp/test.t 
*** buffer overflow detected ***: terminated
Aborted (core dumped)

Valgrind reports:

$ valgrind -- perl -Iblib/{lib,arch} /tmp/test.t 
==50828== Memcheck, a memory error detector
==50828== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==50828== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==50828== Command: perl -Iblib/lib -Iblib/arch /tmp/test.t
==50828== 
==50828== Source and destination overlap in memcpy_chk(0x1ffeffd574, 0x1ffeffd580, 24)
==50828==    at 0x484C332: __memcpy_chk (vg_replace_strmem.c:1741)
==50828==    by 0x572FF3D: UnknownInlinedFun (string_fortified.h:36)
==50828==    by 0x572FF3D: img_2d_transform (rotate.c:1171)
==50828==    by 0x56DC6F1: Image_matrix_transform (conv.c:943)
==50828==    by 0x56D9BC4: Image_transform (conv.c:992)
==50828==    by 0x566FDA0: template_xs_Bool_Handle_HVPtr (thunks.tinc:3788)
==50828==    by 0x497BECF: Perl_pp_entersub (pp_hot.c:5352)
==50828==    by 0x496D84F: Perl_runops_standard (run.c:41)
==50828==    by 0x48DDFA0: UnknownInlinedFun (perl.c:2721)
==50828==    by 0x48DDFA0: perl_run (perl.c:2644)
==50828==    by 0x109349: main (perlmain.c:110)
==50828== 
**50828** *** memcpy_chk: buffer overflow detected ***: program terminated
==50828==    at 0x48470AC: VALGRIND_PRINTF_BACKTRACE (valgrind.h:6815)
==50828==    by 0x484C3BC: __memcpy_chk (vg_replace_strmem.c:1741)
==50828==    by 0x572FF3D: UnknownInlinedFun (string_fortified.h:36)
==50828==    by 0x572FF3D: img_2d_transform (rotate.c:1171)
==50828==    by 0x56DC6F1: Image_matrix_transform (conv.c:943)
==50828==    by 0x56D9BC4: Image_transform (conv.c:992)
==50828==    by 0x566FDA0: template_xs_Bool_Handle_HVPtr (thunks.tinc:3788)
==50828==    by 0x497BECF: Perl_pp_entersub (pp_hot.c:5352)
==50828==    by 0x496D84F: Perl_runops_standard (run.c:41)
==50828==    by 0x48DDFA0: UnknownInlinedFun (perl.c:2721)
==50828==    by 0x48DDFA0: perl_run (perl.c:2644)
==50828==    by 0x109349: main (perlmain.c:110)
==50828== 

It is this code:

    SKIP:
        if ( step < iop.n_steps - 1)
→           memmove( io, io + 1, (iop.n_steps - step - 1) * sizeof(ImgOp) );
        iop.n_steps--;
        step--;
        io--; 
    }

This happens both with Prima-1.67 as well as with current head (8a0c8bc3bf614f562f72fcd445d1146b66fc2292).

memove() is requires that source and target memory areas cannot overlap. Modern glibc scrutinizes the calls if a program is compiled with -D_FORTIFY_SOURCE=3 -O2 cflags and aborts the program when it detects overlapping memory areas.

Please note that I have not yet verified that this is a genuine bug in Prima and not a bug in glibc.

dk commented 1 year ago

Thank you for great anaysis! One thing bothers me though: for decades I knew that if one suspects that memory areas overlap, memmove is the safer approach than memcpy. Now you are saying the reverse. I shall check the bug later no matter what, but could you corroborate the statement about glibc.memmove's area that cannot overlap?

ppisar commented 1 year ago

You are right, memove() can overlap. I got confused by the valgrind output. Maybe glibc fortification does not play well with valgrind instrumentation, or valgrind prints a misnomer.

Here is a backtrace from GDB when Prima built with -D_FORTIFY_SOURCE=3 and glibc aborts the program:

Program received signal SIGABRT, Aborted.
0x00007ffff7ab02fc in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7ab02fc in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007ffff7a5e056 in raise () at /lib64/libc.so.6
#2  0x00007ffff7a4787c in abort () at /lib64/libc.so.6
#3  0x00007ffff7a485b3 in _IO_peekc_locked.cold () at /lib64/libc.so.6
#4  0x00007ffff7b3fceb in  () at /lib64/libc.so.6
#5  0x00007ffff7b3e516 in  () at /lib64/libc.so.6
#6  0x00007ffff753bf3e in memmove
    (__len=<optimized out>, __src=0x7fffffffb5f0, __dest=0x7fffffffb5e4, __dest=<optimized out>, __src=<optimized out>, __len=<optimized out>) at /usr/include/bits/string_fortified.h:36
#7  img_2d_transform
    (self=93824992510816, matrix=<optimized out>, fill=0x7fffffffe100 "", output=0x7fffffffdab0)
    at img/rotate.c:1171
#8  0x00007ffff74e86f2 in Image_matrix_transform
    (self=93824992510816, matrix=0x7fffffffe0d0, fill=0x7fffffffe100 "") at class/Image/conv.c:943
#9  0x00007ffff74e5bc5 in Image_transform
    (self=self@entry=93824992510816, profile=profile@entry=0x555555596358) at class/Image/conv.c:992
#10 0x00007ffff747bda1 in template_xs_Bool_Handle_HVPtr
    (cv=<optimized out>, subName=<optimized out>, func=0x7ffff74e5a10 <Image_transform>)
    at include/generic/thunks.tinc:3788
#11 0x00007ffff7d1eed0 in Perl_pp_entersub () at /lib64/libperl.so.5.36
#12 0x00007ffff7d10850 in Perl_runops_standard () at /lib64/libperl.so.5.36
#13 0x00007ffff7c80fa1 in perl_run () at /lib64/libperl.so.5.36
#14 0x000055555555534a in main ()

It points to the same memove() call at img/rotate.c:1171. There are lot of local variables. Interesting ones from frame #7:

(gdb) p step
$1 = <optimized out>
(gdb) p iop
$2 = {n_steps = 2, steps = {{cmd = 2, p1 = 1, p2 = 1}, {cmd = 1, p1 = 0.5, p2 = 0}, {cmd = 1, 
      p1 = 0.5, p2 = 0}, {cmd = 0, p1 = 0, p2 = 0}, {cmd = 0, p1 = 0, p2 = 0}}}
(gdb) p sizeof(ImgOp)
$3 = 12

Frame #6 does not show anything interesting because all arguments were optimized out. Problem with _FORTIFY_SOURCE is that it requires enabled optimizations.

ppisar commented 1 year ago

If I recompile without fortification and optimizations, valgrind does not catch anything.

dk commented 1 year ago

This is all quite mystical I must say... if I compile with these flags, nothing is happening, neither coredumps nor valgrind alerts. Is it a specific glibc version and/or architecture? I tried this on Ubuntu GLIBC 2.35-0ubuntu3.1 .

One thing though, my compiler warns during the compilation:

cc -c  -Iinclude -Iinclude/generic -I/usr/include/fribidi -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/harfbuzz -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/gtk-3.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 -I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include -I/usr/include/gtk-3.0 -I/usr/include/gio-unix-2.0 -I/usr/include/cairo -I/usr/include/pango-1.0 -I/usr/include/pango-1.0 -I/usr/include/atk-1.0 -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/x86_64-linux-gnu -I/usr/include/libmount -I/usr/include/blkid  -fopenmp -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -Wp,-D_FORTIFY_SOURCE=3 -g -O3   -DVERSION=\"1.67\" -DXS_VERSION=\"1.67\" -fPIC "-I/home/dk/perl5/perlbrew/perls/perl-5.36.0/lib/5.36.0/x86_64-linux/CORE"   img/rotate.c -o img/rotate.o
<command-line>: warning: "_FORTIFY_SOURCE" redefined

I wonder if _FORTIFY_SOURCE=3 has any effect here?

I'd hate to call it a glibc bug but it looks like one. There are no direct google hits, but there is some vague info on google that memcpy and memmove were one until some version, and I see that in the stack trace you've got there is this line

at 0x484C332: __memcpy_chk (vg_replace_strmem.c:1741)

hinting that valgrind uses its own memcpy_chk to test, well, memcpy, and not necessarily memmove.

Anyway more details needed - I'd be curious too to recompile glibc version to dig down to the real problem.

dk commented 1 year ago

( btw I just tried to play with the latest glibc 5.36 but that didn't work for me, caused coredumps for every command...)

ppisar commented 1 year ago

I observe this on current Fedora 38. It has glibc-2.36.9000, a snapshot of development glibc newer than 2.36 release.

I added few debugging printf()s and this what it prints:

$ perl -Iblib/{lib,arch} /tmp/test 
sizeof(oip)=64
sizeof(oip.steps)=60
Initial iop.steps=0x7ffce4065984 past(iop.steps)=0x7ffce40659c0 iop.n_steps=3
step=0, iop.n_steps=3, io=0x7ffce4065984
Before memove condition: step=0 iop.n_steps=3
Calling memmove(io=0x7ffce4065984, io+1=0x7ffce4065990, (iop.n_steps - step - 1) * sizeof(ImgOp)=24)
memmove() finished
Decreasing iop.n_steps, step, io
step=0, iop.n_steps=2, io=0x7ffce4065984
Before memove condition: step=0 iop.n_steps=2
Calling memmove(io=0x7ffce4065984, io+1=0x7ffce4065990, (iop.n_steps - step - 1) * sizeof(ImgOp)=12)
*** buffer overflow detected ***: terminated
Aborted (core dumped)

If did the math correctly, than none of the two memmove() calls reads or writes out of the iop structure memory. It's ridiculous that first memmove() of size 24 passes fortification, but subsequent memmove() of size 12 fails the fortification check.

The corresponding initial iop content is:

(gdb) p iop
$5 = {n_steps = 3, steps = {{cmd = 0, p1 = 0, p2 = 0}, {cmd = 2, p1 = 1, p2 = 1}, {cmd = 1, p1 = 0.5,
      p2 = 0}, {cmd = 0, p1 = 0, p2 = 0}, {cmd = 0, p1 = 0, p2 = 0}}}

I conclude it's a bug in glibc or gcc. I will try to minimize the code and report to glibc developers.

dk commented 1 year ago

This is gets more and more interesting ... let's though keep this bug open for the reference

ppisar commented 1 year ago

I reported it to glibc together with a minimal reproducer https://bugzilla.redhat.com/show_bug.cgi?id=2160682.

siddhesh commented 1 year ago

You are right, memove() can overlap. I got confused by the valgrind output. Maybe glibc fortification does not play well with valgrind instrumentation, or valgrind prints a misnomer.

Yeah, valgrind can report spurious warnings with fortified functions: https://bugs.kde.org/show_bug.cgi?id=453084

I'm trying to figure out out to fix this in gcc, the io-- is throwing it off due to a potential underflow in the first iteration. It's not even actual undefined behaviour AFAICT, so I want to try and fix it in the compiler.

In the meantime, I've suggested a workaround in the fedora bug Petr filed for the reproducer, hopefully that helps work around this in actual code too. Basically the intent is to ensure that io never crosses object boundaries, even in unreachable code, regardless of whether it is actually dereferenced.

dk commented 1 year ago

I don't see the patch you mention but i"m ok to change the code. I cannot reproduce it though so i depend here on whoever can. Replace io + 1 to iop[steps + 1] or something similar, I would guess?

siddhesh commented 1 year ago

I've sent a PR that ought to resolve this.