Closed Quuxplusone closed 4 years ago
Attached tc_corrprop.ll
(344 bytes, text/plain): reduced testcase
I think this was indeed what 4878aa3 was supposed to fix. Jonas, do you know why none of the public SystemZ bots caught the issue?
_This bug has been marked as a duplicate of bug 44949_
(In reply to Florian Hahn from comment #1)
> I think this was indeed what 4878aa3 was supposed to fix. Jonas, do you know
> why none of the public SystemZ bots caught the issue?
>
> *** This bug has been marked as a duplicate of bug 44949 ***
Ah, I see. Great that you fixed it :-)
I guess this didn't show up on SystemZ until the postgresql-12 package was
built (via clang bootstrapping) and llvm-lto crashed. Is there any
configuration in particular you have in mind?
I tried to cherry-pick your patch (4878aa3) onto llvm10-rc5, but it did not
apply cleanly. I think the Ubuntu release that ships with llvm-10 needs this
patch, so I wonder what to do... Could you perhaps help with the merge, since
you have worked on it?
There's already a bug to port it to 10.0.1 (PR45272) with a backported patch:
https://reviews.llvm.org/D76596.
> I guess this didn't show up on SystemZ until the postgresql-12 package was
> built (via clang bootstrapping) and llvm-lto crashed. Is there any
> configuration in particular you have in mind?
Are there ppc/s390 public bootstrap bots? If there are, I am curious why they
did not catch the problem. The issue was raised just a few days before the
final release, and because the public ppc/s390 bots seemed fine, it was judged
not to be too critical.
(In reply to Florian Hahn from comment #3)
> There's already a bug to port it to 10.0.1 (PR45272) with a backported
> patch https://reviews.llvm.org/D76596.
>
> > I guess this didn't show up on SystemZ until the postgresql-12 package was
> built (via clang bootstrapping) and llvm-lto crashed. Is there any
> configuration in particular you have in mind?
>
> Are there ppc/s390 public bootstrap bots? If there are, I am curious why
> they did not catch the problem. The issue was raised just a few days before
> the final release and because the public ppc/s390 bots seemed fine it was
> thought not to be too critical.
I see a "multistage" buildbot, but looking at the logs I can't see
-DCLANG_ENABLE_BOOTSTRAP=ON being passed to cmake. I thought there was supposed
to be a bot for expensive checks and another one for bootstrapping, but I am
not sure...
This is, however, not a SystemZ backend bug, so it could have shown up on any
machine, right? I just checked, and at least a bootstrapped build of that
commit seems to pass the SystemZ CodeGen tests. So to me it looks like rc5
builds just fine with a bootstrap, but llvm-lto crashes on a particular input.
In this case that input was an Ubuntu package, which is not part of any testing.
But it's a good point - I will try to set up some more testing involving a
bootstrapped clang shortly.
We are running a multistage build bot:
http://lab.llvm.org:8011/builders/clang-s390x-linux-multistage
which does indeed perform a two-stage clang build.
However, the second-stage clang is then only used to run the unit test suite,
and that may not be enough to expose the bug. As I understand it, the symptom
is a crash in the bootstrapped clang executable, but that crash doesn't always
happen, only when compiling certain input files. (In the original bug report,
this was some file in the postgres package.)
We also have a separate LNT build bot that runs all of test-suite, but it uses
a regular stage-one build rather than a bootstrapped clang. Maybe we should
combine the two? (Of course, it's not certain that even building all of
test-suite would have exposed the same bug as building postgres did...)
(In reply to Ulrich Weigand from comment #5)
> ... a crash in the bootstrapped clang executable ...
Sorry, I mis-remembered. Jonas is correct, it was llvm-lto not clang.
(In reply to Ulrich Weigand from comment #5)
> We are running a multistage build bot:
> http://lab.llvm.org:8011/builders/clang-s390x-linux-multistage
> which does indeed perform a two-stage clang build.
>
> However, the second-stage clang is then only used to run the unit test
> suite, and that may not be enough to expose the bug. As I understand, the
> symptom is a crash in the bootstrapped clang executable, but that crash
> doesn't happen always, but only when compiling certain input files. (In the
> original bug report, this was some file in the postgres package.)
>
I think this issue would have shown up on a 3-stage bootstrap with LTO.
PR45272 mentioned that something in BitcodeReader, which is used during LTO,
got mis-optimized. Running a 3-stage bootstrap would ensure that the stage-2
compiler is run on a larger codebase than just the unit tests, but such a setup
takes quite a long time to run :(