Quuxplusone / LLVMBugzillaTest


ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll is flaky #50033

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link: PR51064
Status: NEW
Importance: P enhancement
Reported by: Nico Weber (nicolasweber@gmx.de)
Reported on: 2021-07-12 08:47:04 -0700
Last modified on: 2021-11-22 18:31:18 -0800
Version: trunk
Hardware: PC Windows NT
CC: 1101.debian@gmail.com, jroelofs@jroelofs.com, lhames@gmail.com, llvm-bugs@lists.llvm.org
Attachments: multiple-compile-threads-basic.crash (4237 bytes, text/plain)

I saw it fail twice recently, but had never seen it fail before, so this might
be a new problem.

http://45.33.8.238/mac/33411/step_11.txt , at rev f192616ce983, 2021 Jun 12

https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8842029280494205536/+/u/package_clang/stdout?format=raw , at rev d5c0b9c84886, 2021 Jun 11

  Some tests will be skipped and the --timeout command line argument will not work.
 -- Testing: 72841 tests, 12 workers --
 Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60..
 FAIL: LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll (49812 of 72841)
 ******************** TEST 'LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll' FAILED ********************
 Script:
 --
 : 'RUN: at line 1';   /opt/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /opt/s/w/ir/cache/builder/src/third_party/llvm/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll | /opt/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/bin/FileCheck /opt/s/w/ir/cache/builder/src/third_party/llvm/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
 --
 Exit Code: 2

 Command Output (stderr):
 --
 FileCheck error: '<stdin>' is empty.
 FileCheck command line:  /opt/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/bin/FileCheck /opt/s/w/ir/cache/builder/src/third_party/llvm/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll

Quuxplusone commented 3 years ago

Another example: http://45.33.8.238/mac/33435/step_11.txt

Quuxplusone commented 3 years ago

Thanks Nico. I should be able to look at this tomorrow or the next day (AEST).

If it's causing major disruption in the meantime, I think it's reasonable to disable the test; this isn't heavily relied on at the moment.

If you do disable the test, could you note the commit hash for that in this bug?

Quuxplusone commented 3 years ago

Here's another failure: http://45.33.8.238/mac/33879/step_11.txt

It doesn't fail all that often (at most once every few days), so if you'll be looking into it soon, I think it's fine to keep the test enabled until then.

Quuxplusone commented 3 years ago

Another failure today: http://45.33.8.238/mac/34774/step_11.txt

(I paste the ones I happen to notice; chances are it fails more frequently than I post comments :) )

Quuxplusone commented 3 years ago

And another: http://45.33.8.238/mac/35260/step_11.txt

Quuxplusone commented 3 years ago

Another: http://45.33.8.238/mac/35299/step_11.txt

Quuxplusone commented 3 years ago

Another: https://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/23660/console

Quuxplusone commented 3 years ago

lhames, given that this fails on other bots too, do you think it's time to disable the test for now?

Quuxplusone commented 2 years ago

Attached multiple-compile-threads-basic.crash (4237 bytes, text/plain): crash log from local test failure

Quuxplusone commented 2 years ago

Got it!

The stub manager is being torn down before the worker threads, but they might (if the scheduling is unlucky) still be using it.

I'll fix up the teardown order and it should fix this.

-- Lang.
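A minimal C++ sketch of the teardown-order fix described above, assuming the shared object and the worker threads are both members of one owner. `StubTable`, `WorkerPool`, `Engine`, and `runAndTearDown` are hypothetical names for illustration, not ORC classes or APIs; the sketch only relies on the C++ rule that non-static data members are destroyed in reverse declaration order.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared resource (e.g. a stub manager)
// that worker threads keep using while they run.
struct StubTable {
  std::mutex M;
  std::map<std::string, int> Stubs;
  void add(const std::string &Name, int Addr) {
    std::lock_guard<std::mutex> Lock(M);
    Stubs[Name] = Addr;
  }
};

// Owns the worker threads and joins them all when destroyed.
struct WorkerPool {
  std::vector<std::thread> Threads;
  ~WorkerPool() {
    for (auto &T : Threads)
      if (T.joinable())
        T.join();
  }
};

// Members are destroyed in reverse declaration order, so declaring
// Workers *after* Stubs means every worker is joined before Stubs is
// torn down. Swapping the two declarations reopens the window in which
// a still-running worker touches a destroyed StubTable.
struct Engine {
  StubTable Stubs;
  WorkerPool Workers;

  void compileAsync(int N, std::atomic<int> &Done) {
    for (int I = 0; I < N; ++I)
      Workers.Threads.emplace_back([this, I, &Done] {
        Stubs.add("fn" + std::to_string(I), I);
        ++Done;
      });
  }
};

// Runs N jobs and reports how many had finished once ~Engine returned.
int runAndTearDown(int N) {
  std::atomic<int> Done{0};
  {
    Engine E;
    E.compileAsync(N, Done);
  } // ~Engine: ~WorkerPool joins the threads first, then ~StubTable runs.
  // Every job must have completed before the shared table was destroyed.
  assert(Done.load() == N);
  return Done.load();
}
```

`runAndTearDown(4)` returns 4, because all workers are joined before the table is destroyed. Joining explicitly in the owner's destructor would work as well; relying on declaration order needs no extra code but is easy to break by reordering members.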

Quuxplusone commented 2 years ago

Oh, scratch that. The shutdown order is probably worth looking into, but I don't think it's the source of this issue. It looks like the locking operations were dropped from CompileOnDemandLayer during one of the rewrites. I think adding them back in will fix this.
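To illustrate the kind of locking being described, here is a hedged C++ sketch of a lazily populated table shared by several compile threads. `LazyTable` and `runConcurrentLookups` are hypothetical and do not reflect CompileOnDemandLayer's actual code or API; the sketch only shows a mutex guarding the lookup-or-compile step so that concurrent threads neither corrupt the shared map nor compile the same symbol twice.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical lazy-compilation table: the first caller of a symbol
// "compiles" it and later callers reuse the cached result.
class LazyTable {
public:
  explicit LazyTable(std::function<int(const std::string &)> CompileFn)
      : Compile(std::move(CompileFn)) {}

  int getOrCompile(const std::string &Name) {
    std::lock_guard<std::mutex> Lock(M); // all map access goes through M
    auto It = Compiled.find(Name);
    if (It != Compiled.end())
      return It->second;
    int Addr = Compile(Name); // runs under the lock, at most once per name
    ++NumCompiles;
    Compiled.emplace(Name, Addr);
    return Addr;
  }

  int compileCount() {
    std::lock_guard<std::mutex> Lock(M);
    return NumCompiles;
  }

private:
  std::mutex M; // guards Compiled and NumCompiles
  std::map<std::string, int> Compiled;
  int NumCompiles = 0;
  std::function<int(const std::string &)> Compile;
};

// Several threads all request the same NumSymbols symbols.
int runConcurrentLookups(int NumThreads, int NumSymbols) {
  LazyTable T([](const std::string &Name) {
    return static_cast<int>(Name.size()); // dummy "address"
  });
  std::vector<std::thread> Threads;
  for (int I = 0; I < NumThreads; ++I)
    Threads.emplace_back([&T, NumSymbols] {
      for (int S = 0; S < NumSymbols; ++S)
        T.getOrCompile("fn" + std::to_string(S));
    });
  for (auto &Th : Threads)
    Th.join();
  return T.compileCount();
}
```

`runConcurrentLookups(4, 10)` returns 10: four threads request the same ten symbols, but each one is compiled exactly once. Holding the lock across `Compile` keeps the sketch simple at the cost of serializing compilation; compiling different symbols in parallel would need finer-grained synchronization.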

Quuxplusone commented 2 years ago

I wasn't able to reproduce this locally a second time. I've committed 1ea8d12510b9e1b208a7541c86e1b02a9a3db0e2, which may fix this.

Please let me know if there are any more bot failures. The advantage of this failing more frequently (and on more bots) is that we can at least gain some confidence that the problem has been fixed if/when the failures disappear.
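A rough way to quantify that confidence, assuming (hypothetically) that while the bug is present the test fails independently on each day with some fixed probability $p$:

$$
P(\text{no failures in } n \text{ days} \mid \text{bug still present}) = (1-p)^{n}
$$

For example, with a hypothetical $p = 1/3$ (roughly one failure every three days), $(2/3)^{14} \approx 0.0034$, so two clean weeks would be strong evidence of a fix. The estimate only holds if nothing else changes the failure rate in the meantime.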

Quuxplusone commented 2 years ago

Nope, that wasn't it -- I just saw the same crash at llvm::orc::LocalIndirectStubsManager::~LocalIndirectStubsManager() + 28 (IndirectionUtils.h:363).

Quuxplusone commented 2 years ago

rbp: 0x00008003e07cbb60 rsp: 0x00007ffee07cbb50

RBP looks bogus -- I guess that's why we're not getting a backtrace for thread 0.
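One way to see why RBP looks bogus, assuming the usual x86-64 48-bit virtual addressing (4-level paging): an address is canonical only if bits 63 through 47 are all equal, and user-space stack addresses have them all zero. The rsp value above passes that check, but the rbp value has bit 47 set while bits 63..48 are clear, so it is not a valid address at all, which would be consistent with a frame-pointer walk failing to produce a backtrace. `isCanonical48` and `isUserSpace48` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <cstdint>

// With 48-bit virtual addresses, bits 63..47 must all be copies of bit 47.
bool isCanonical48(std::uint64_t Addr) {
  std::uint64_t Upper = Addr >> 47; // the 17 bits 63..47
  return Upper == 0 || Upper == 0x1FFFF;
}

// User-space addresses live in the lower canonical half (bits 63..47 zero).
bool isUserSpace48(std::uint64_t Addr) {
  return (Addr >> 47) == 0;
}
```

Applied to the registers above, `isCanonical48(0x00007ffee07cbb50)` (rsp) is true, while `isCanonical48(0x00008003e07cbb60)` (rbp) is false.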

Quuxplusone commented 2 years ago

Nico -- Have you seen any more failures due to this?

I haven't seen any locally, but I'm also 99.9% certain it hasn't been fixed. I think that moving other tests from XFAIL (which crash, and so bring along the load of the crash reporter) to UNSUPPORTED might have made this less likely to trigger.

Quuxplusone commented 2 years ago

I haven't seen this in a few weeks.