Quuxplusone opened 3 years ago
Another example: http://45.33.8.238/mac/33435/step_11.txt
Thanks Nico. I should be able to look at this tomorrow or the next day (AEST).
If it's causing major disruption in the meantime, I think it's reasonable to disable the test; it isn't heavily relied on at the moment.
If you do disable the test could you note the commit hash for that in this bug?
Here's another failure: http://45.33.8.238/mac/33879/step_11.txt
It doesn't fail all that often, every few days tops, so if you'll look into it soon I think it's fine to keep the test enabled until then.
Another failure today: http://45.33.8.238/mac/34774/step_11.txt
(I paste the ones I happen to notice; chances are it fails more frequently than I post comments :) )
And another: http://45.33.8.238/mac/35260/step_11.txt
lhames, given that this fails on other bots too, do you think it's time to disable the test for now?
Attached multiple-compile-threads-basic.crash (4237 bytes, text/plain): crash log from local test failure
Got it!
The stub manager is being torn down before the worker threads, but they might (if the scheduling is unlucky) still be using it.
I'll fix up the teardown order and it should fix this.
-- Lang.
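To illustrate the teardown-order hazard in general terms, here is a minimal Python sketch. `StubManager` and `run_and_shut_down` are hypothetical stand-ins, not LLVM or ORC APIs. The point is the ordering: every worker that might still touch the shared object is joined before that object is torn down, so no worker can use it after teardown.

```python
import threading


class StubManager:
    """Hypothetical shared resource standing in for a stubs manager (not LLVM's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self.stubs = {}

    def create_stub(self, name):
        with self._lock:
            if self._closed:
                raise RuntimeError("stub manager used after teardown")
            self.stubs[name] = len(self.stubs)

    def close(self):
        with self._lock:
            self._closed = True


def run_and_shut_down(num_workers=4, stubs_per_worker=100):
    manager = StubManager()
    errors = []

    def worker(idx):
        try:
            for i in range(stubs_per_worker):
                manager.create_stub(f"w{idx}_s{i}")
        except RuntimeError as e:
            errors.append(e)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    # Teardown order: join every worker *before* tearing down what they share.
    # Closing the manager first would let an unluckily scheduled worker hit it.
    for t in threads:
        t.join()
    manager.close()
    return manager, errors


manager, errors = run_and_shut_down()
print(len(manager.stubs), errors)  # 400 []
```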
Oh, scratch that. The shutdown order probably is worth looking into, but I don't think it's the source of this issue. It looks like the locking operations were dropped from CompileOnDemandLayer during one of the rewrites. I think adding them back in will fix this.
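As a rough illustration of why dropped locking matters in a lazy-compilation layer, here is a minimal Python sketch. `OnDemandCompiler` is hypothetical and is not the ORC `CompileOnDemandLayer` interface. A single mutex guards the check-then-insert on shared state so that concurrent requests for the same function compile it exactly once.

```python
import threading


class OnDemandCompiler:
    """Hypothetical lazy-compilation table; not the ORC CompileOnDemandLayer API."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._lock = threading.Lock()  # guards _compiled and compile_count
        self._compiled = {}
        self.compile_count = 0

    def get(self, name):
        # Hold the lock across check-then-insert: without it, two threads racing
        # on the same name could both compile it and mutate the table concurrently.
        with self._lock:
            if name not in self._compiled:
                self._compiled[name] = self._compile_fn(name)
                self.compile_count += 1
            return self._compiled[name]


compiler = OnDemandCompiler(lambda name: f"<code for {name}>")
threads = [threading.Thread(target=compiler.get, args=(f"fn{i % 3}",)) for i in range(12)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(compiler.compile_count)  # 3
```

Holding one lock while compiling is the simplest correct scheme but serializes all compilation; a real JIT would likely use finer-grained locking, which is exactly where a lock can go missing during a rewrite.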
I wasn't able to reproduce this locally a second time. I've committed 1ea8d12510b9e1b208a7541c86e1b02a9a3db0e2, which may fix this.
Please let me know if there are any more bot failures. The advantage of this failing more frequently (and on more bots) is that we can at least gain some confidence that the problem has been fixed if/when the failures disappear.
Nope, that wasn't it -- I just saw the same crash at llvm::orc::LocalIndirectStubsManager
rbp: 0x00008003e07cbb60 rsp: 0x00007ffee07cbb50
RBP looks bogus -- I guess that's why we're not getting a backtrace for thread 0.
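A quick way to see why that RBP value looks bogus, assuming 48-bit virtual addressing (four-level paging): on x86-64 a canonical address has bits 63 through 47 all equal. `0x00008003e07cbb60` has bit 47 set while bits 63..48 are clear, so it cannot be a valid stack address. A frame-pointer unwinder that tries to follow it would have nothing sensible to read. The helper below is a hypothetical sketch, not part of any debugger API.

```python
def is_canonical_48(addr: int) -> bool:
    """True if a 64-bit addr is canonical under 48-bit x86-64 virtual addressing."""
    upper = addr >> 47  # bits 63..47 (17 bits) of a 64-bit value
    return upper == 0 or upper == 0x1FFFF


rbp = 0x00008003E07CBB60
rsp = 0x00007FFEE07CBB50
print(is_canonical_48(rbp))  # False: bit 47 set, bits 63..48 clear
print(is_canonical_48(rsp))  # True
```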
Nico, have you seen any more failures due to this?
I haven't seen any locally, but I'm also 99.9% certain it hasn't been fixed. I think that moving other tests from XFAIL (where they crash and put extra load on the crash reporter) to UNSUPPORTED might have made this less likely to trigger.
I haven't seen this in a few weeks.