Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.85k stars 524 forks source link

lib/ExtUtils/t/Embed.t: failures on OpenBSD when compiling with gcc but not with clang #22125

Open jkeenan opened 3 weeks ago

jkeenan commented 3 weeks ago

For several months I have been noticing smoke-test failures under certain configurations on OpenBSD-7.4; see, e.g., report 5052514. The failures occur when configuring a perl built with these command-line switches:

$ sh ./Configure -des -Dusedevel -Dcc=gcc -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

... and then, after running make test_prep, calling:

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd - 
ld: error: undefined symbol: Perl_more_sv
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)

ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
collect2: ld returned 1 exit status

not ok 1
not ok 10 # system returned -1
Failed 10/10 subtests 

Test Summary Report
-------------------
../lib/ExtUtils/t/Embed.t (Wstat: 0 Tests: 2 Failed: 2)
  Failed tests:  1, 10
  Parse errors: Tests out of sequence.  Found (10) but expected (2)
                Bad plan.  You planned 10 tests but ran 2.
Files=1, Tests=2,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.48 cusr  0.11 csys =  0.62 CPU)
Result: FAIL

I didn't pay these much attention at first because at first the perl was being built with a compiler I had never heard of, eg++. But more recently these failures were occuring with gcc as the compiler.

I haven't been able to install an OpenBSD more recent than 6.9, but I decided to explore this problem on that version today regardless. I got the test failure reported above on blead. I then decided to see whether these failures were due to a recent code change or whether @cjg-cguevara's exploration of these config options had merely revealed a problem that was "always" there. I tested at tag v5.38.0 and got the failure reported above.

$ uname -mrs
OpenBSD 6.9 amd64

$ gcc --version | head -1
gcc (GCC) 4.2.1 20070719

$ git show | head -3
commit 76298ae68aa7796f0ffc05095b127d23f4b2de8f
Author: Ricardo Signes <rjbs@semiotic.systems>
Date:   Sat Jul 1 20:48:27 2023 -0400

$ sh ./Configure -des -Dusedevel -Dcc=gcc -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'; make test_prep

$ ./perl -Ilib -V:config_args
config_args='-des -Dusedevel -Dcc=gcc -Accflags=-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS';

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd - 
ld: error: undefined symbol: Perl_more_sv
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)

ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//ccE9t92Z.o:(S_new_SV)
collect2: ld returned 1 exit status

not ok 1
not ok 10 # system returned -1
Failed 10/10 subtests 

Test Summary Report
-------------------
../lib/ExtUtils/t/Embed.t (Wstat: 0 Tests: 2 Failed: 2)
  Failed tests:  1, 10
  Parse errors: Tests out of sequence.  Found (10) but expected (2)
                Bad plan.  You planned 10 tests but ran 2.
Files=1, Tests=2,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.48 cusr  0.11 csys =  0.62 CPU)
Result: FAIL

Since the gcc version I was testing with dates back to 2007, I decided to try with the system's default C-compiler, clang-10.

$ clang --version                                                               >
OpenBSD clang version 10.0.1 
Target: amd64-unknown-openbsd6.9
Thread model: posix
InstalledDir: /usr/bin

$ sh ./Configure -des -Dusedevel -Dcc=clang -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'; make test_prep

With this more recent clang, the failing tests PASSed.

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd -

ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
ok 9
ok 10 # system returned 0
ok
All tests successful.
Files=1, Tests=10,  1 wallclock secs ( 0.01 usr  0.02 sys +  0.55 cusr  0.36 csys =  0.94 CPU)
Result: PASS

It would be good if someone could test these config options on an up-to-date OpenBSD with both gcc and clang. We can then try to evaluate the source of the test failures. cc: @afresh1

afresh1 commented 3 weeks ago

I tried this on my -current sparc64 with both -Dcc=clang and -Dcc=gcc. I did not see this failure in either case. Let me know if you need me to try something else or need access to an OpenBSD machine for testing.

zarniwoop$ sysctl kern.version
kern.version=OpenBSD 7.5-current (GENERIC.MP) #2098: Wed Apr  3 15:08:08 MDT 2024
    deraadt@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/GENERIC.MP

zarniwoop$ gcc -v              
Reading specs from /usr/lib/gcc-lib/sparc64-unknown-openbsd7.5/4.2.1/specs
Target: sparc64-unknown-openbsd7.5
Configured with: OpenBSD/sparc64 system compiler
Thread model: posix
gcc version 4.2.1 20070719 
zarniwoop$ clang -v
OpenBSD clang version 16.0.6
Target: sparc64-unknown-openbsd7.5
Thread model: posix
InstalledDir: /usr/bin
jkeenan commented 3 weeks ago

I tried this on my -current sparc64 with both -Dcc=clang and -Dcc=gcc. I did not see this failure in either case.

And those were configured with -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS' -- correct? [snip]

gcc version 4.2.1 20070719

Is there some reason why OpenBSD's port of gcc remains at this version more than 16 years old (other than not wanting to use the GPL)? Should we even be bothering to test perls built with gcc on OpenBSD?

jkeenan commented 3 weeks ago

Noting the 2 instances of ld: error: undefined symbol ...:

$ cd t && \
gcc -o embed_test -I..  -std=gnu99 -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DDEBUG_LEAKING_SCALARS -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c -L.. -lperl   -Wl,-E  -fstack-protector-strong -L/usr/local/lib  -lperl -lpthread -lm -lutil -lc; cd -

ld: error: undefined symbol: Perl_more_sv
>>> referenced by embed_test.c
>>>               /tmp//cc2IxT9I.o:(S_new_SV)

ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//cc2IxT9I.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//cc2IxT9I.o:(S_new_SV)
collect2: ld returned 1 exit status
[openbsd69: perl] $ ack -l '(Perl_more_sv)' .              
sv_inline.h
embed.h
proto.h
sv.c
[openbsd69: perl] $ ack -l '(PL_sv_serial)' .              
sv_inline.h
embedvar.h
makedef.pl
lib/Devel/PPPort.pm
...

Further analysis needs more C-foo than I have.

jkeenan commented 3 weeks ago

$ cd t && \
gcc -o embed_test -I..  -std=gnu99 -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DDEBUG_LEAKING_SCALARS -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c -L.. -lperl   -Wl,-E  -fstack-protector-strong -L/usr/local/lib  -lperl -lpthread -lm -lutil -lc; cd -

I should add that, as the above invocation suggests, I configured with each of the two options separately and learned that the test failure is being generated by -DDEBUG_LEAKING_SCALARS. When I configure only with -DPERL_RC_STACK and build with gcc, the test PASSes.

afresh1 commented 3 weeks ago

I tried this on my -current sparc64 with both -Dcc=clang and -Dcc=gcc. I did not see this failure in either case.

And those were configured with -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS' -- correct? [snip]

Yes, I copied the incantation from the first message:

$ sh ./Configure -des -Dusedevel -Dcc=clang -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

Just changing =clang to =gcc between them.

gcc version 4.2.1 20070719

Is there some reason why OpenBSD's port of gcc remains at this version more than 16 years old (other than not wanting to use the GPL)? Should we even be bothering to test perls built with gcc on OpenBSD?

I believe we have some architectures still that clang doesn't support and we are stuck with gcc, but where possible it is becoming the system compiler and when stable, we stop installing gcc.

And yes, it is because the clang licence is more acceptable to OpenBSD. I'm fairly sure we will not bring any GPL3 code into the base system at all.

iabyn commented 3 weeks ago

On Sat, Apr 06, 2024 at 08:45:25AM -0700, James E Keenan wrote:

$ cd t; ./perl harness -v ../lib/ExtUtils/t/Embed.t; cd - ld: error: undefined symbol: Perl_more_sv

referenced by embed_test.c /tmp//ccE9t92Z.o:(S_new_SV)

ld: error: undefined symbol: PL_sv_serial

referenced by embed_test.c /tmp//ccE9t92Z.o:(S_new_SV) referenced by embed_test.c /tmp//ccE9t92Z.o:(S_new_SV)

When perl is built with -DDEBUG_LEAKING_SCALARS, it changes a few things. In particular:

1) a static function called S_new_SV() is created that the macro new_SV() calls, rather than new_SV() doing the work directly itself.

2) The variable PL_sv_serial is defined.

In the Embed.t test, it tests perl's ability to embed perl in other code. So it creates a standalone C program, embed_test.c, which is compiled and linked to the perl library perl.o.

Something is going wrong at link time: the symbols PL_sv_serial and Perl_more_sv, apparently referenced from S_new_SV, can't be found by the look of it.

Perhaps it's something like perl getting compiled with DEBUG_LEAKING_SCALARS, while embed_test.c isn't?

S_new_SV() is weird: although it's static, it's not declared as inline, even although its defined in sv_inline.h. I don't know whether this is a mistake. It was in sv.c until 5.36.0. It was moved by this commit:

commit 75acd14e43f2ffb698fc7032498f31095b56adb5 Author: Richard Leach @.***> AuthorDate: Sun Feb 6 22:52:54 2022 +0000

Make newSV_type an inline function

When a new SV is created and upgraded to a type known at compile time,
uprooting a SV head and then using the general-purpose upgrade function
(sv_upgrade) is clunky. Specifically, while uprooting a SV head is
lightweight (assuming there are unused SVs), sv_upgrade is too big to be
inlined, contains many branches that can logically be resolved at compile
time for known start & end types, and the lookup of the correct
body_details struct may add CPU cycles.

This commit tries to address that by making newSV_type an inline function
and including only the parts of sv_upgrade needed to upgrade a SVt_NULL.
When the destination type is known at compile time, a decent compiler will
inline a call to newSV_type and use the type information to throw away all
the irrelevant parts of the sv_upgrade logic.

Because of the spread of type definitions across header files, it did not
seem possible to make the necessary changed inside sv.h, and so a new
header file (sv_inline.h) was created. For the inlined function to work
outside of sv.c, many definitions from that file were moved to sv_inline.h.

Finally, in order to also benefit from this change, existing code in sv.c
that does things like this:
    SV* sv;
    new_SV(sv);
    sv_upgrade(sv, SVt_PV)
has been modified to read something like:
    SV* sv = newSV_type(SVt_PV);

-- You're only as old as you look.

jkeenan commented 3 weeks ago

Bisecting on OpenBSD-6.9 with this invocation:

perl Porting/bisect.pl \
-Dcc=gcc \
-Accflags='-DDEBUG_LEAKING_SCALARS' \
--start=4f1687891150ddeda14ad7b0716032145bc69801 \
--end=8b03aeb95ab72abdb2fa40f2d1196ce42f34708d \
--target lib/ExtUtils/t/Embed.t

... confirmed the breaking commit:

commit 75acd14e43f2ffb698fc7032498f31095b56adb5
Author:     Richard Leach <rich+perl@hyphen-dash-hyphen.info>
AuthorDate: Sun Feb 6 22:52:54 2022 +0000
Commit:     Tomasz Konojacki <me@xenu.pl>
CommitDate: Mon Mar 7 01:08:53 2022 +0100

    Make newSV_type an inline function
jkeenan commented 3 weeks ago

@richardleach ^^

richardleach commented 3 weeks ago

Thanks, I'll try to look into this later this week

richardleach commented 1 week ago

S_new_SV() is weird: although it's static, it's not declared as inline, even although its defined in sv_inline.h. I don't know whether this is a mistake. It was in sv.c until 5.36.0. It was moved by this commit:

I'm assuming this is indeed an error by omission on my part, but haven't had chance to get an OpenBSD VM running yet.

tonycoz commented 1 week ago

I had a look at this and reproduced it locally. libperl.a looks fine:

$ nm libperl.a | grep PL_sv_serial
...
00006738 B PL_sv_serial
...

So what do we end up linking against? I edited Embed.t to add -Wl,-t to the compiler invocation:

# eg++ -o embed_test -Wl,-t -I..  -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c -L.. -lperl   -Wl,-E  -fstack-protector-strong -L/usr/local/lib  -lperl -lpthread -lm -lutil -lc
ld: error: undefined symbol: PL_sv_serial
>>> referenced by embed_test.c
>>>               /tmp//ccIahN7A.o:(S_new_SV)
>>> referenced by embed_test.c
>>>               /tmp//ccIahN7A.o:(S_new_SV)
collect2: error: ld returned 1 exit status
# /usr/lib/crt0.o

# /usr/lib/crtbegin.o

# /tmp//ccIahN7A.o

# /usr/lib/libperl.so.23.0

# /usr/lib/libperl.so.23.0

# /usr/lib/libpthread.so.27.0

# /usr/lib/libutil.so.16.0

# /usr/local/lib/libestdc++.so.20.0

# /usr/lib/libm.so.10.1

# /usr/lib/libc.so.97.0

# /usr/lib/libc.so.97.0

# /usr/lib/crtend.o

not ok 1
# embed_test = ./embed_test
not ok 10 # system returned -1
Failed 10/10 subtests 

So we're linking against the system libperl, which of course wasn't built with -DDEBUG_LEAKING_SCALARS and doesn't define PL_sv_serial.

So while 75acd14e43f2ffb698fc7032498f31095b56adb5 revealed this problem, it didn't cause the problem.

tonycoz commented 1 week ago

I've tried a few approaches to this

The only way I could get it to link the correct libperl was a direct reference:

# eg++ -o embed_test -Wl,-t -Wl,--verbose -I..  -DNO_LOCALE_NUMERIC -DNO_LOCALE_COLLATE -DNO_LOCALE_MONETARY -DNO_LOCALE_TIME -DNO_LOCALE_MESSAGES -DLIBC_HANDLES_MISMATCHED_CTYPE -DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include   embed_test.c ../libperl.a   -Wl,-E  -fstack-protector-strong   -lpthread -lm -lutil -lc
# /usr/lib/crt0.o

# /usr/lib/crtbegin.o

# /tmp//cc13wQEa.o

# ../libperl.a(perl.o)

# ../libperl.a(op.o)

# ../libperl.a(universal.o)

# ../libperl.a(av.o)

...

I don't see a way to get ld to trace where it looks for libraries, so I don't see a way to trace this any further.

richardleach commented 3 days ago

@tonycoz - Thanks for figuring out what was going on here re: linking.

tonycoz commented 3 days ago

I need to look into it further, perl itself seems to link fine.

tonycoz commented 2 days ago

I need to look into it further, perl itself seems to link fine.

The perl executable links the library via direct reference:

eg++ -o perl -Wl,-E  -fstack-protector-strong -L/usr/local/lib  perlmain.o   libperl.a `cat ext.libs` -lpthread -lm -lutil -lc

So I see two issues directly related to the test here:

I can see a couple of issues not directly related to the failures: