benswift closed this issue 4 years ago.
@digego, when you built it on your VM the other day, what was the (micro)arch? Did it have AVX512?
Sorry mate, deleted the VM after use.
(Reply sent by email to Ben Swift's comment of Tue, Apr 14, 2020, 10:37 AM.)
All good, it was just to double-check anyway.
OK, so it does look like one of the attrs is the issue [glances suspiciously at AVX512].
Using the same test file as in the other issue:
;; these all compile ok
(bind-func works_1
  (lambda (inverted:i1)
    (let ((rising (if inverted 1 0)))
      (lambda ()
        rising))))

(bind-func works_2
  (lambda (inverted:i1)
    (let ((rising (if inverted #t #f)))
      rising)))

(bind-func works_3
  (lambda (inverted:i1)
    (lambda ()
      (if inverted #t #f))))

;; this is broken
(bind-func broken
  (lambda (inverted:i1)
    (let ((rising (if inverted #t #f)))
      (lambda ()
        rising))))
Now, on my beefy Xeon-y box, here are the results:
The broken function crashes as before, but with --attr=none on startup, it works fine.

I think that I can try toggling individual attrs using that CLI option, so that might be the next step. Anyway, updating LLVM will solve all our problems and give us all ponies.
It's looking more likely that AVX512 on older LLVM is the culprit.
Will put together a workaround.
OK, well it looks like this is “fixed” (worked around) in commit 3599d8b484253f8b29eb8681e0cbb8e8b24a4181.
We’ll have a proper fix (and be able to use AVX512) when we update LLVM.
The new cross-platform automated testing is super-cool.
However, I noticed that ubuntu-latest was giving intermittent timeouts. Looking at the logs, it was hanging while trying to load a pre-compiled libs/core/instruments.xtm, the same place I detailed here.

That's weird, I thought. Then I looked at the logs more closely: because the VMs are drawn from a pool of runners, they're not identical, hardware-wise.
Here's the Extempore startup banner from a test run which succeeded:
and here's the startup banner from one which failed:
Notice that the success is on a Broadwell (CPU: broadwell) and the failure is on a Knights Landing (CPU: knl). Also, the box I was having trouble with the other day is also CPU: knl. Interestingly, that same box dual-boots Windows, and it works fine there.

It could be coincidence, and I really need to go back and look at the LLVM debugging output listed in that other issue. However, the AVX512 attrs are certainly suspicious (turned on for knl, off for broadwell).
This is a bummer, firstly because it's broken, and secondly because it means that our tests will randomly fail depending on the hardware they're assigned to (which we have no control over).
Bummer.
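Since the failures seem to track the runner's CPU (AVX512 attrs on for knl, off for broadwell), one option is to have each CI job log whether its assigned runner advertises AVX-512 before the tests start, so timeouts can be matched against hardware. The following is a minimal Python sketch, assuming an x86 Linux runner that exposes /proc/cpuinfo (as ubuntu-latest does). The function names are hypothetical and not part of Extempore's existing test setup.

```python
import sys
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Uses the first 'flags' line, which is normally identical for every
    logical CPU on x86 Linux. Returns an empty set if there is none.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def avx512_flags(flags):
    """Return the AVX-512 feature flags (avx512f, avx512cd, ...) in sorted order."""
    return sorted(f for f in flags if f.startswith("avx512"))


def main():
    cpuinfo = Path("/proc/cpuinfo")
    if not cpuinfo.exists():
        # Not Linux (e.g. macOS or Windows runners): nothing to report.
        print("no /proc/cpuinfo; cannot check for AVX-512")
        return 0
    found = avx512_flags(cpu_flags(cpuinfo.read_text()))
    if found:
        print("runner CPU advertises AVX-512:", " ".join(found))
    else:
        print("runner CPU does not advertise AVX-512")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Running this as the first step of the ubuntu-latest job only records the hardware; it does not avoid the crash. That still needs the workaround or the LLVM update discussed above.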