Closed. henryiii closed this pull request 4 years ago.
Hi! This is the friendly automated conda-forge-linting service.
I just wanted to let you know that I linted all conda-recipes in your PR (`recipe`) and found it was in an excellent condition.
Can you also include https://github.com/conda-forge/numba-feedstock/pull/38/files#diff-074883f7d957e193a6bcdd3c93e34b76 and rerender so that this also solves #38 ?
Sure, wasn't sure it would be picked up in the migration properly if I did.
The migration will continue once #38 has been closed for some time.
For linux, `numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_vectorize_tbb` is timing out. Is tbb still optional?
And for windows, llvmlite doesn't support 2.7, I believe.
> For linux, `numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_vectorize_tbb` is timing out. Is tbb still optional? And for windows, llvmlite doesn't support 2.7 I believe.
According to the documentation, it is optional: https://github.com/numba/numba#dependencies
It is actually better described here: http://numba.pydata.org/numba-doc/latest/user/installing.html?highlight=tbb#dependency-list
`numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_jit_tbb` timed out. `test_exercise_code_path_with_lifted_loop` (`numba.tests.test_annotations.TestAnnotation`) fails (on all 3 OS's; it also warns about switching to Python mode a couple of times). The others pass.
Why does build.sh use `$TESTS_TO_RUN` when it is not defined/empty? Not related, just curious.
> Is tbb still optional?
Yes, but we want to have it in the tests to make sure it works if installed.
- Python 3.6, 3.7 Linux: `numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_jit_tbb` timed out.
This one is odd and probably needs some more investigation...
- Python 3.8: `test_exercise_code_path_with_lifted_loop` (`numba.tests.test_annotations.TestAnnotation`) fails (on all 3 OS's; it also warns about switching to Python mode a couple of times).
This seems to be a failure that hadn't been caught by Numba's CI due to some unfortunate circumstances. I'll open a PR for Numba; not to fix it, but to make them aware of this.
- Python 2.7 on Windows fails to solve. IMHO, we can ignore this for now. If there is demand to get this working, someone can tackle it on the llvmlite-feedstock (possibly with help from someone from Anaconda, since the `defaults` and `numba` channels have builds for Windows and PY2).
> Why does build.sh use `$TESTS_TO_RUN` when it is not defined/empty? Not related, just curious.
This comes from https://github.com/numba/numba/commit/fed89e36e5579c3bb0b7921452e11cfc5751c99f: "[...] if the env var is not set it defaults to all tests."
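That fallback can be sketched in Python (a hedged stand-in for the shell logic in build.sh; the fallback value `"numba.tests"` is Numba's top-level test package, and the function name is illustrative):

```python
import os

def tests_to_run(env=os.environ):
    # If TESTS_TO_RUN is unset or empty, fall back to the full test suite.
    # This mirrors the idea behind build.sh's use of $TESTS_TO_RUN.
    return env.get("TESTS_TO_RUN") or "numba.tests"

print(tests_to_run({}))  # no var set: the full suite
print(tests_to_run({"TESTS_TO_RUN": "numba.tests.test_annotations"}))
```

In shell, the equivalent idiom would be a `${TESTS_TO_RUN:-...}` default expansion.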
I couldn't reproduce the tbb-related timeouts locally. Maybe it was just a hiccup (though it's strange that it happened on two runs..) or something specific to the CI machine? Let's see if it happens again.
The previous timeouts were indeed only hiccups, apparently. This seems to be good to go and we can merge, IMHO, albeit with the following shortcomings:

1. Annotations might not work as expected on Python 3.8.
2. Windows build for Python 2.7 fails.

re 1.: This is an upstream issue and possibly (probably?!) not so important that it warrants blocking Numba builds for Python 3.8 on conda-forge.
re 2.: We can either merge as is and just rerun the CI once Python 2.7 builds of llvm 0.31 are available on Windows, or we can add `skip: True  # [win and py=27]`. I'm fine with merging as-is.
(NB: numba 0.48 will likely remove Python 2.7 support, which can mean one of two things: 1. We could go ahead and drop those builds already for 0.47 because that ship has sailed anyway. 2. But there might be increased demand for the "last Numba version that supports Python 2.7". ¯\_(ツ)_/¯ )
> The previous timeouts were indeed only hiccups, apparently.
The threading backend tests are compute heavy (they run many forking/threading/multiprocessing things along with the compiler, plus threaded execution, all at the same time); Numba does not run them on public CI (`--exclude-tags='long_running'`): https://github.com/numba/numba/blob/88e3dad3d43a59c79e6295f72fc344d77f13330c/buildscripts/incremental/test.sh#L75
> This seems to be good to go and we can merge, IMHO, albeit with the following shortcomings:
> 1. Annotations might not work as expected on Python 3.8.
The failure is because a loop didn't lift while the test was asserting that it did; annotations should still work fine. This is the sample code, with the annotations call added; it runs fine and reports correctly:
```python
from numba import jit

def bar(x):
    return x

@jit
def foo(x):
    h = 0.
    for k in range(x):
        h = h + k
    if x:
        h = h - bar(x)
    return h

foo(10)
cres = foo.overloads[foo.signatures[0]]
ta = cres.type_annotation
with open("annotated.html", 'wt') as f:
    ta.html_annotate(f)
```
Will fix for 0.48.0.
> 2. Windows build for Python 2.7 fails.
> re 1.: This is an upstream issue and possibly (probably?!) not so important that it warrants blocking Numba builds for Python 3.8 on conda-forge.
> re 2.: We can either merge as is and just rerun the CI once Python 2.7 builds of llvm 0.31 are available on Windows, or we can add `skip: True  # [win and py=27]`. I'm fine with merging as-is.
> (NB: numba 0.48 will likely remove Python 2.7 support, which can mean one of two things: 1. We could go ahead and drop those builds already for 0.47 because that ship has sailed anyway. 2. But there might be increased demand for the "last Numba version that supports Python 2.7". ¯\_(ツ)_/¯ )
> For linux, `numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_vectorize_tbb` is timing out. Is tbb still optional? And for windows, llvmlite doesn't support 2.7 I believe.
TBB is optional; however, if you build Numba without tbb-devel, then obviously at run time it cannot use TBB regardless of whether a tbb package is present. Lack of TBB would lead to some parallel execution features being unavailable on certain platforms.
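The run-time side of that can be sketched with the usual optional-import pattern (a hedged illustration only; Numba's actual threading-layer detection is more involved, and the flag and list names here are made up):

```python
# Probe for an optional dependency and degrade gracefully if it is missing.
# "tbb" is the optional package discussed in this thread.
try:
    import tbb  # noqa: F401  (optional; enables the TBB threading layer)
    HAVE_TBB = True
except ImportError:
    HAVE_TBB = False

# Hypothetical feature list: only advertise the TBB layer when importable.
available_layers = ["workqueue"] + (["tbb"] if HAVE_TBB else [])
print(available_layers)
```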
I assume this https://github.com/conda-forge/llvmlite-feedstock/pull/13 is where Python 2.7 got removed, as a result of not being able to resolve these build problems: https://ci.appveyor.com/project/conda-forge/llvmlite-feedstock/builds/22085156/job/7vrdo4g30l9wwesn ? The errors there look like they could be from mixed tooling. Is conda-forge's LLVM perhaps built with VS2017 and then llvmlite with VS2015?
> The threading backend tests are compute heavy
Yeah, I figured the CI might've had a general slowdown and/or didn't handle the concurrency well. Locally, the two reported tests ran in about 2.5 and 3.5 seconds, with all tests from `numba.tests.test_parallel_backend.TestSpecificBackend` taking about 95.5 seconds in total (i.e., far below the 600 (540) second timeout for a single test).
> annotations should work fine still.
Great, thanks for taking a look at it!
> Is perhaps conda-forge LLVM built with VS2017 and then llvmlite with VS2015?
I have very limited knowledge on that topic. If he's available, @isuruf would know, I guess.
llvmdev has

```yaml
- vs2017_{{ target_platform }}  # [win]
```

and llvmlite has

```yaml
- {{ compiler('c') }}    # [unix or (win and py==27)]
- {{ compiler('cxx') }}  # [unix or (win and py==27)]
- vs2017_win-64          # [win64 and py>27]
- vs2017_win-32          # [win32 and py>27]
```

Would the simple thing (just matching llvmdev) work here, perhaps?
@henryiii, you may want to take a look at those two:
Those seem to additionally add vs2015_runtime to the host (not build) requirements. As to why, and what the details are, I can't say. But I'd just try to match those as a start.
I think we should drop Windows + 2.7 for now, then add it back if llvmlite gets a Windows 2.7 build in the future - I don't want to block this for 2.7 on Windows in 2020. We shouldn't even need to bump the build number, I believe (if that's all that changes).
At least one test needs to be restarted to avoid the timeout. Is there a way to increase that timeout?
The timeout seems to be 600 seconds / 10 minutes?
https://github.com/numba/numba/blob/0.47.0/numba/testing/main.py#L699-L700
Edit: I apparently can't read; this is listed above.
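For illustration, a per-test timeout like that 600-second limit can be enforced around a subprocess (a hedged sketch; Numba's runner in numba/testing/main.py implements this differently, and the function name here is made up):

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout=600.0):
    """Run cmd; return its exit code, or None if it exceeded the timeout."""
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None  # the test would be reported as timed out

# A quick command finishes; a slow one trips a deliberately short timeout.
print(run_with_timeout([sys.executable, "-c", "pass"]))
print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"],
                       timeout=0.5))
```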
`numba-feedstock (linux linux_python3.7)` needs to be restarted.
Thanks, restarted.
@henryiii @mbargull, would either or both of you be interested in being maintainers here? 🙂
> would either or both of you be interested in being maintainers here?
Sure, I'd be okay with it.
Great! Could you please add yourself to the maintainers' list in the recipe?
Let's see what @mbargull says, then I'll add either myself or both of us.
I'm especially interested in the result of his PR https://github.com/conda-forge/llvmlite-feedstock/pull/26, looks like that is building for Windows + 2.7.
Edit: Looks like that works, but would require a CFEP to be written in order to be merged (VS 17 + Python 2.7 is not allowed). So this cannot be provided for Windows + 2.7 unless it can be built (it == LLVM) with MSVC 2008 - which I assume is not likely.
Given the time the tests take on my local run, I'm really baffled that we are hitting that timeout... I'll add a workaround to increase the timeout. And yes, given my interest in the project, it would make sense to add me as a maintainer here. Will do.
Note that Numba's CI does not run the tests that are timing out (`--exclude-tags='long_running'`, see https://github.com/numba/numba/blob/709967b7a6ce6c036c70f5eb99b06a8ef8c7172d/buildscripts/incremental/test.sh#L73-L75), because they have prohibitively long run times on public CI hardware. The Numba build farm does run these tests.
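Tag-based exclusion like `--exclude-tags='long_running'` can be sketched with the standard unittest machinery (an illustrative toy; Numba's runner implements its own tagging, and the `tag`/`exclude_tags` helpers here are made up):

```python
import unittest

def tag(*names):
    """Attach a set of tag names to a test method."""
    def decorate(fn):
        fn.tags = frozenset(names)
        return fn
    return decorate

class TestParallel(unittest.TestCase):
    @tag("long_running")
    def test_heavy(self):
        pass  # stands in for a compute-heavy threading-backend test

    def test_light(self):
        pass

def exclude_tags(suite, excluded):
    """Return a new suite without tests carrying the excluded tag."""
    kept = []
    for test in suite:
        method = getattr(test, test._testMethodName)
        if excluded not in getattr(method, "tags", frozenset()):
            kept.append(test)
    return unittest.TestSuite(kept)

suite = unittest.TestLoader().loadTestsFromTestCase(TestParallel)
filtered = exclude_tags(suite, "long_running")
print(filtered.countTestCases())  # only test_light remains
```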
There's a bug that we (Numba core devs) have internally named "Frosty Thompson", after the docker container that first got stuck with it. Frosty Thompsons are basically cases where something goes wrong in TBB and it "gets stuck"; we've been trying to pin it down for a while. It seems to happen most often on heavily loaded systems, and always on linux. The place it "gets stuck" is here: https://github.com/intel/tbb/blob/18070344d755ece04d169e6cc40775cae9288cee/src/tbbmalloc/backend.cpp#L301-L327; essentially, the loop invokes longer and longer pauses (and eventually ends up yielding) whilst waiting for a condition to change that never does. Needless to say, we're trying to debug and get a reproducer for this. If longer timeouts still don't "fix" the failing test case(s), I'd be very suspicious that you've hit the first case of a Frosty Thompson outside the Numba build farm.

It should also be noted that the test cases in `numba.tests.test_parallel_backend` deliberately place a huge amount of strain on a system: there's concurrent compilation and execution (both single- and multi-threaded Numba code, some of which has nested parallelism), plus Python-based fork/forkserver/spawn/threading, all happening at the same time!
The way in which 3.6/3.7/3.8 are all failing, different TBB tests timing out, suggests Frosty Thompson. CC @seibert.
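The "longer and longer pauses" loop described above is the classic exponential-backoff spin-wait pattern. A hedged Python sketch of the idea (names are illustrative; the real TBB code uses CPU pause instructions and sched_yield, and in the Frosty Thompson failure the awaited condition simply never changes, so the real loop, which has no timeout, spins forever):

```python
import time

def backoff_wait(condition, initial_pause=1e-6, max_pause=1e-3, timeout=1.0):
    """Spin with exponentially growing pauses until condition() holds.

    Returns True if the condition became true, False on timeout. The
    timeout exists only so this sketch terminates; the loop described
    in the thread has none.
    """
    pause, waited = initial_pause, 0.0
    while not condition():
        if waited >= timeout:
            return False
        time.sleep(pause)  # stands in for pause instructions / sched_yield
        waited += pause
        pause = min(pause * 2, max_pause)
    return True
```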
> If longer timeouts still don't "fix" the failing test case(s) I'd be very suspicious that you've hit the first case of a Frosty Thompson outside the Numba build farm.
I was thinking an additional 50% / 5 minute timeout increase should really suffice in case things were only "slow". But I was also suspicious that some locking or contention issue might be at hand, and wondered whether we should just use `--exclude-tags='long_running'` here, too.
"Frosty Thompson" it is then :D. Thanks for your clarifications!
I've no idea if you can shell into the build containers, but if you can, Frosty Thompson symptoms include: if gdb is attached to each of the child processes of the TBB tests, one will be stuck in the loop described above, most likely with a backtrace starting at a sched_yield syscall.

https://github.com/conda-forge/llvmlite-feedstock/pull/26 is being blocked and I can't tell what will come of it. Merging this as is, i.e. without a Py2.7 build on Windows. Thanks everyone.
Checklist
- Reset the build number to 0 (if the version changed)
- Re-rendered with the latest conda-smithy (use the phrase `@conda-forge-admin, please rerender` in a comment in this PR for automated rerendering)

Closes #39
Also closes #38 by including the commit from that PR.
Uses latest llvmlite.