gsteemso opened this issue 3 months ago.
Could you please attach a build log and the generated config.sh?
Please see these attachments. There should be 5. I included config.sh (I had to rename it for GitHub) and the stdout and stderr captures for each of Configure and make.
config.sh.txt
Configure.stdout.log
Configure.stderr.log
make.stdout.log
make.stderr.log
I should add that something went a bit odd a couple of days ago, such that a lot of GCC's usual stderr output has stopped appearing. I'm still trying to find anything that changed.
I have no particular expertise in this area. However, it occurs to me that since you are getting a segfault as early in the process as ./Configure, you could begin by getting a tarball of perl-5.38, configuring with the same arguments as previously, and seeing whether ./Configure completes successfully and segfault-free. That would open up the possibility of bisection.
I believe the segfault during ./Configure is probably an expected failure resulting from an unsuccessful test, because it does not seem to bother it any. The build process continues unimpeded until the big halt in what ought to be the middle of that 'make' run.
I can already tell you that 5.38.x builds successfully with the same parameters, as do all earlier versions that I tried. That 5.40.0 et seq do not build successfully when all others did before is the entire problem here.
So in principle this is bisectable, with (roughly) these steps:
1. Read perldoc perlgit to learn how to clone the repository and get a local checkout.
2. git checkout v5.38.0; sh ./Configure -des -Dusedevel [your other config options or a simplified subset thereof] && make. That should complete successfully.
3. git clean -dfxq; git checkout v5.40.0; [as above]. That should fail. Note: if you can reproduce the build failure at tag v5.40.0 with a smaller list of configuration options (e.g., just -Dusenm -Dusethreads -Duse64bitall -Accflags=-DNO_MATHOMS), that would greatly simplify our analysis.
4. Read perldoc Porting/bisect-runner.pl. The bisection program itself will be run something like this:
$ perl Porting/bisect.pl \
-D[config options] \
--start=v5.38.0 \
--end=v5.40.0 \
--test-build
bisect.pl will run in the checkout. Assuming it gets past the start and end revisions, it will start to log its progress in the ./.git subdirectory beneath the checkout. You can follow that progress in a separate terminal with something like cat BISECT_RUN; tail BISECT_LOG.
HTH!
Please try adding -Duse64bitint to the Configure command line to ensure that both the 32-bit and 64-bit builds use the same-sized UV and IV types. I suspect they differ here, causing the static assertion to fail.
I'm able to compile an -arch x86_64 -arch arm64 build, but those are both 64-bit builds, so there are no type-size mismatches.
Well, I have a few things to report.
• Adding -Duse64bitint did not help. I can’t imagine why not – as was pointed out, it ought to make things the same size internally. (Of course, even if it had worked, the resulting executable would not be transportable to lesser Macs – defeating a large part of the purpose of building a fat binary in the first place. It’s still bizarre that it didn’t work, of course.)
• It turns out that the thousands of lines of warnings about a bit shift exceeding the width of the type also occur in a successful build, so I believe the hypothesis that they are due to the size difference between compiler runs is likely correct. I have set up and am currently running the suggested bisection to find out which change made the perceived mismatch start causing actual build problems.
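For readers less familiar with that warning: GCC reports "left shift count >= width of type" when a constant shift amount is at least as large as the width of the left operand. The sketch below rests on an assumption that has not been checked against the Perl source: that the warnings come from shifting a 32-bit UV by an amount that is only valid for a 64-bit UV. It shows why such a shift is a problem and how a width-checked helper avoids it. shl32_checked and shift_fits_u32 are hypothetical names.

```c
#include <limits.h>
#include <stdint.h>

/* Number of value bits in a uint32_t (32 on every conforming platform). */
#define U32_BITS (sizeof(uint32_t) * CHAR_BIT)

/* Returns 1 if shifting a uint32_t left by n bits is well defined. */
static int shift_fits_u32(unsigned n)
{
    return n < U32_BITS;
}

/* Hypothetical helper: 1 shifted left by n bits, or 0 when n is not
 * smaller than the type width. Writing ((uint32_t)1 << 53) directly is
 * undefined behaviour in C, and GCC warns about it when n is a constant. */
static uint32_t shl32_checked(unsigned n)
{
    if (!shift_fits_u32(n))
        return 0;
    return (uint32_t)1 << n;
}
```

In a 64-bit pass the same shift by 53 would be perfectly valid on a 64-bit UV. This matches the observation that the warnings appear in every build, whether or not it later fails.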
This is a fast machine for its age but that age is 20 years. I will almost certainly not get answers from the bisection before tomorrow (Friday), and quite possibly not until Saturday.
Apparently I spoke too soon. That bisection program is genius! In 2602 seconds, it determined that the first commit to cause a failure was 1e3b3238f23137440041d8883e041e4da74876f5, dated March 13th 2024.
The command line I fed to bisect was:
../othergit/Porting/bisect.pl --test-build --target=miniperl --start=v5.38.0 --end=v5.40.0 -Dprefix=../built -Uvendorprefix= -Dperladmin=none -Duseshrplib -Duselargefiles -Dusenm --Dusethreads -Accflags='-DNO_MATHOMS -arch ppc -arch ppc64 -nostdinc -B/Developer/SDKs/MacOSX10.5.sdk/usr/include/gcc -B/Developer/SDKs/MacOSX10.5.sdk/usr/lib/gcc -isystem/Developer/SDKs/MacOSX10.5.sdk/usr/include -F/Developer/SDKs/MacOSX10.5.sdk/System/Library/Frameworks' -Aldflags='-arch ppc -arch ppc64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.5.sdk'
I hope this all means something useful to someone.
I don't see how -Duse64bitint could help. If you run Configure on a 64-bit platform, it will just see that sizeof (long) == 8 and use that, hardcoding #define IVTYPE long in config.h. Plus we have INTSIZE, LONGSIZE, SHORTSIZE all hardcoded/configured in config.h.
As far as I can tell, building for different architectures requires different configs.
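The C sketch below illustrates the point. It uses an illustrative config.h excerpt whose values are made up to resemble an LP64 Configure run; they are not copied from the attached config.sh. The hypothetical count_config_mismatches() reports how many of the baked-in sizes disagree with the target of the current compiler pass.

```c
#include <stddef.h>

/* Illustrative config.h excerpt from a single Configure run on a 64-bit
 * (LP64) host: the sizes are baked in as literal constants. */
#define INTSIZE   4
#define LONGSIZE  8
#define IVTYPE    long
#define IVSIZE    8

typedef IVTYPE IV;

/* Compare one configured size against what the compiler really uses. */
static int size_matches(size_t configured, size_t actual)
{
    return configured == actual;
}

/* Hypothetical check: number of configured sizes that are wrong for the
 * target of the current -arch pass. */
static int count_config_mismatches(void)
{
    int bad = 0;
    bad += !size_matches(INTSIZE, sizeof(int));
    bad += !size_matches(LONGSIZE, sizeof(long));
    bad += !size_matches(IVSIZE, sizeof(IV));
    return bad;
}
```

Compiled with -arch ppc64 this would return 0. Compiled with -arch ppc, where long is 4 bytes, it would return 2, because both LONGSIZE and IVSIZE are wrong for that slice.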
A number of points ...
Based on perldoc Porting/bisect-runner.pl, I think you should have written -Dusethreads (one hyphen) rather than --Dusethreads in the above. However, I can't say that that made any difference to the results.
I have not previously seen both --test-build and --target=miniperl used in one invocation of bisect.pl, so I'm slightly skeptical of that result. My understanding (which may be incorrect) is that with --test-build we need to get as far as ./perl for a PASS, whereas with --target=miniperl we only need to get to ./miniperl. Could you try a bisection using only the latter to see whether it identifies the same commit as breaking? That would be:
perl Porting/bisect.pl \
-D[your other config options] \
--start=b9b8c7d2e8567b5c6652a643b4a44af22e06f2bc \
--end=4f872e99736a2242a86b234af32d603b84956352 \
--target=miniperl
Then try:
perl Porting/bisect.pl \
-D[your other config options] \
--start=b9b8c7d2e8567b5c6652a643b4a44af22e06f2bc \
--end=4f872e99736a2242a86b234af32d603b84956352 \
--test-build
Do you get the same results?
What would also be helpful is if you could determine whether the interaction of 1e3b3238f2 with one or more of your many config options caused the build failure. Consider repeating the above bisection without any -D[config options] and see whether even that fails to build.
perl Porting/bisect.pl \
--start=b9b8c7d2e8567b5c6652a643b4a44af22e06f2bc \
--end=4f872e99736a2242a86b234af32d603b84956352 \
--target=miniperl
Then add config options one at a time, starting with those that are not Mac-specific, such as -Dusethreads.
mauke, the Mac-specific aspects of this are a bit odd by everyone else's standards, because they build for all listed architectures simultaneously -- there is only one Configure run for ALL of them combined, not one per architecture as I understand might be done, for example, under Linux. That's why we thought -Duse64bitint might have made it build successfully.
jkeenan, in order:
• You're correct: I ran it with one hyphen and accidentally made a typo when copying the line into my email.
• --test-build and --target=xxx must both be used in this case. According to the documentation, without the first, a separate test case must be specified (it gets run after the build succeeds, which is expected to happen every time, and would not here); and without the second, successfully completing the whole build is assumed (possibly the source of your misapprehension). When I tried running it with only --target=xxxx, it refused to run at all and merely gave me back the usage instructions (I hadn't specified a test case).
• You were absolutely correct that I specified more options than were required. I am rerunning the bisection with no -D options at all except -Dprefix=xxxx (to prevent it overwriting my system Perl), and only those -A options listed as necessary for a Mac-style multi-arch build (both of them, unfortunately; but if you look closely you'll see that nearly all of the given components do nothing except tell the compiler where the system libraries are).
I should add that I had to restrict the test builds to go only as far as building miniperl, because various library modules require an assortment of trivial patches to build correctly under specific versions of Perl. Luckily the failure occurs during that early phase, so I did not need to muck about trying to tell it to apply a patch only during builds of certain versions.
> they build for all listed architectures simultaneously -- there is only one Configure run for ALL of them combined, not one per architecture

The only way that could work is if IVTYPE = int64_t and UVTYPE = uint64_t, but as far as I can see there is no way to force Configure to choose those.
I won't pretend I understand how it works. The Mac compiler that was current at that time, and which I am using now, was a modified version of GCC 4.2.1. It could be given any number of “-arch xxxx” parameters. It would then repeat each compilation with all of the other parameters the same but a distinct target platform, and stitch the results into a universal binary using a tool with the amusing name “lipo” (because it was often used to slim a fat binary down to a single-architecture slice). The five platforms then current were:
• “ppc” (32-bit PowerPC)
• “i386” (32-bit x86; they were all lumped together as “i386” even though the cross-compiler, for example, had a prefix containing “686”)
• “arm” (32-bit ARM as used in early iPhones)
• “ppc64” (64-bit PowerPC, which consisted solely of the IBM PowerPC 970)
• “x86_64” (exactly what it says)
Of those, only two were even able to handle 64-bit data in a single operation; yet Perl has historically been able to compile with 32- and 64-bit values (IVs, UVs, etc.) simultaneously. At first glance I'd have assumed it just compiled everything with 32-bit NVs, but Configure does in fact appear to treat 64-bit platforms as 64-bit. I have no idea how it works, but it did work up until the same commit I named earlier, as “bisect” has once again informed me.
Reverting that one commit against blead had no effect.
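One way a single header could serve every slice is to choose sizes per compiler pass rather than once per Configure run. The sketch below assumes only ILP32 and LP64 targets, which covers the ppc, ppc64, i386 and x86_64 slices described above. It relies on the __LP64__ macro that GCC and Clang predefine on LP64 targets. The header layout and the my_IV/my_UV names are hypothetical; this is not something Configure generates.

```c
#include <stdint.h>

/* Pick sizes per -arch pass instead of hardcoding one Configure result.
 * Assumes every target is either ILP32 or LP64. */
#if defined(__LP64__)
#  define LONGSIZE 8
#  define PTRSIZE  8
#else
#  define LONGSIZE 4
#  define PTRSIZE  4
#endif

/* Fixed-width types give IV/UV the same width in every slice, which is
 * the IVTYPE = int64_t / UVTYPE = uint64_t idea raised in the thread. */
typedef int64_t  my_IV;
typedef uint64_t my_UV;
#define MY_IVSIZE 8

/* Returns 1 if every macro above agrees with the current compiler pass. */
static int layout_is_consistent(void)
{
    return sizeof(long) == LONGSIZE
        && sizeof(void *) == PTRSIZE
        && sizeof(my_IV) == MY_IVSIZE
        && sizeof(my_UV) == MY_IVSIZE;
}
```

The trade-off is that a 64-bit IV on the 32-bit slices costs extra instructions there, and other configured values would need the same per-pass treatment. NV precision figures are one example.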
I remember building multiarch with i386 and x86_64 in one binary.
There were definitely some config issues; Configure has a darwin-specific check to ensure alignbytes is at least 8 on darwin.
I never looked too hard at it.
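A per-platform alignment check is needed because the alignment of double inside a struct varies between ABIs. The minimal sketch below probes it with offsetof. The clamp to 8 only mirrors the darwin rule mentioned in the comment; alignbytes_for_darwin() is a hypothetical helper, not Configure's actual logic.

```c
#include <stddef.h>

/* A double placed after a single char lands at the alignment the compiler
 * actually uses for double inside structs on this target. */
struct align_probe {
    char   c;
    double d;
};

static size_t double_struct_alignment(void)
{
    return offsetof(struct align_probe, d);
}

/* Hypothetical clamp: never report less than 8, whatever the probe says. */
static size_t alignbytes_for_darwin(size_t probed)
{
    return probed < 8 ? 8 : probed;
}
```

On 32-bit x86 Linux, for example, the probe typically returns 4, while on x86_64 it returns 8. Any value probed once and then shared across slices can therefore be wrong for some of them.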
I suspect this case isn't so much a new bug, but the static assert detecting an old bug.
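Some context on the "negative bit-field width" error in the original report may help here. Compilers that lack C11 _Static_assert, such as the GCC 4.2.1 used here, commonly get static assertions through a bit-field trick. The sketch below is a generic version of that idiom. MY_STATIC_ASSERT is a hypothetical name, and Perl's own macros may differ in detail.

```c
#include <limits.h>

/* Pre-C11 compile-time assertion: when COND is false, the bit-field width
 * becomes -1 and compilation stops with a negative-width error. */
#define MY_STATIC_ASSERT(COND, NAME) \
    typedef struct { unsigned int NAME : (COND) ? 1 : -1; } NAME##_check_t

/* True on every target, so this line compiles everywhere. */
MY_STATIC_ASSERT(sizeof(long) * CHAR_BIT >= 32, long_has_32_bits);

/* A condition like the one below would compile in an LP64 pass but stop a
 * 32-bit -arch pass with a negative bit-field width error:
 *
 *   MY_STATIC_ASSERT(sizeof(long) == 8, long_is_64_bits);
 */
```

In a multi-arch build, each -arch pass evaluates the condition independently. A single failing slice is enough to stop the whole compile, even if the other slices pass.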
I'm not actually certain there is a true bug, here. Is it plausible that the assert is framed in such a way that it gets a false positive from the disparity in word sizes between built-for architectures?
> I'm not actually certain there is a true bug, here.

If that particular assertion fails, the code that follows won't be valid: it may lose precision when converting from an NV to an IV but report an exact conversion.
Let me rephrase my supposition. I'm aware that's the purpose of the assert. The reason the assert is failing is that, on a 32-bit build, the total size of the NV is smaller than the (for a 64-bit build) reported size that is transferrable with full accuracy. (I think I got that straight, there are something like six different figures involved for three different quantities, or thereabouts.) The same figures being used for both 32- and 64-bit builds – which happen simultaneously in a universal binary – are, I believe, causing a "false positive" (false negative?) assertion failure.
The assert should be passing during the 64-bit build pass and incorrectly failing during the 32-bit build pass.
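Here is a worked illustration of that reasoning, under a stated assumption. I have not checked what the assertion added in 1e3b323 actually compares. The sketch assumes it checks Configure's NV_PRESERVES_UV_BITS figure against the width of UV.

A double holds integers exactly up to DBL_MANT_DIG bits (53 for IEEE 754). A 64-bit UV therefore yields 53 preserved bits, while a 32-bit UV keeps all 32. Reusing the 64-bit figure in a 32-bit pass produces an impossible claim of 53 preserved bits out of 32. A shift such as (UV)1 << 53 on a 32-bit UV could also account for the shift-width warnings. The function names are hypothetical.

```c
#include <float.h>

/* How many low-order bits of a UV that is uv_bits wide can be stored
 * exactly in a double (NV): all of them if the UV is narrower than the
 * double's significand, otherwise DBL_MANT_DIG. */
static unsigned nv_preserves_uv_bits(unsigned uv_bits)
{
    const unsigned mant = (unsigned)DBL_MANT_DIG;
    return uv_bits < mant ? uv_bits : mant;
}

/* Hypothetical form of the failing check: a configured preserved-bits
 * figure can never exceed the width of the UV it describes. */
static int preserved_bits_plausible(unsigned configured_bits, unsigned uv_bits)
{
    return configured_bits <= uv_bits;
}
```

On IEEE 754 hardware, nv_preserves_uv_bits(64) is 53 and nv_preserves_uv_bits(32) is 32. preserved_bits_plausible(53, 64) holds, but preserved_bits_plausible(53, 32) does not. That is the "passes in the 64-bit pass, fails in the 32-bit pass" pattern described above.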
Description
All Perls which I have built thus far will correctly build as NeXT-style fat (multi-architecture) binaries when the directions in README.macosx are followed – except the newest ones, beginning with version 5.40.0, which fail messily at the make step (apparently by applying the variable-size expectations appropriate to 64-bit sub-builds to the corresponding variables in 32-bit sub-builds). It generates thousands of lines of compiler warnings that a bit shift has exceeded the width of the data type, culminating in a fatal error involving a negative bit-field width.

Steps to Reproduce
Follow the instructions in README.macosx. Run ./Configure, then make. You can’t run make test and make install because it errors out during make.
The ./Configure line I used was:

Expected behavior
Perl should be constructed and installed as per usual, with all compiled code built as fat binaries in the manner normal for Macs.
Perl configuration
The configuration cannot be extracted because Perl never finishes building, but the corresponding one for a pure ppc64 build looks like this:
(It should be noted that, in a pure ppc64 build done without the aid of a package manager, -Alddlflags=xxxxx must also be set by hand on the ./Configure command line, because extensions' shared-library makefiles fail to propagate the compiler flags that tell it which variant of the CPU architecture to target. Without that change, the individual .o files are still built correctly, but their coalescence into library .bundles is botched.)