conda-forge / numba-feedstock

A conda-smithy repository for numba.
BSD 3-Clause "New" or "Revised" License
0 stars 27 forks source link

Version 0.47 with llvmlite 0.31 #41

Closed henryiii closed 4 years ago

henryiii commented 4 years ago

Checklist

Closes #39

Also closes #38 by including the commit from that PR.

Uses latest llvmlite.

conda-forge-linter commented 4 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

xhochy commented 4 years ago

Can you also include https://github.com/conda-forge/numba-feedstock/pull/38/files#diff-074883f7d957e193a6bcdd3c93e34b76 and rerender so that this also solves #38 ?

henryiii commented 4 years ago

Sure, wasn't sure it would be picked up in the migration properly if I did.

xhochy commented 4 years ago

The migration will continue once #38 is closed for some time.

henryiii commented 4 years ago

For linux, numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrect_vectorize_tbb is timing out. Is tbb still optional?

And for windows, llvmlite doesn't support 2.7 I believe.

marcelotrevisani commented 4 years ago

For linux, numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrect_vectorize_tbb is timing out. Is tbb still optional?

And for windows, llvmlite doesn't support 2.7 I believe.

According to the documentation it is optional https://github.com/numba/numba#dependencies

marcelotrevisani commented 4 years ago

It is better described here actually http://numba.pydata.org/numba-doc/latest/user/installing.html?highlight=tbb#dependency-list

henryiii commented 4 years ago

The others pass.

henryiii commented 4 years ago

Why does build.sh use $TESTS_TO_RUN when it is not defined/empty? Not related, just curious.

mbargull commented 4 years ago

Is tbb still optional?

Yes, but we want to have it in the tests to make sure it works if installed.

  • Python 3.6, 3.7 Linux: numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrent_jit_tbb timed out

This one is odd and probably needs some more investigation...

  • Python 3.8: test_exercise_code_path_with_lifted_loop (numba.tests.test_annotations.TestAnnotation) fails (all 3 OS's) (also warns about switching to Python mode a couple of times)

This seems to be a failure that hadn't been caught by Numba's CI due to some unfortunate circumstances. I'll open a PR for Numba; not to fix it, but to make them aware of this.

  • Python 2.7 on Windows fails to solve

IMHO, we can ignore this for now. If there is demand to get this working, someone can tackle this on the llvmlite-feedstock (possibly with help from someone from Anaconda since the defaults and numba channels have builds for Windows and PY2).

Why does build.sh use $TESTS_TO_RUN when it is not defined/empty? Not related, just curious.

This comes from https://github.com/numba/numba/commit/fed89e36e5579c3bb0b7921452e11cfc5751c99f:

[...] if the env var is not set it defaults to all tests.

mbargull commented 4 years ago

I couldn't reproduce the tbb-related timeouts locally. Maybe it was just a hiccup (though, strange it happens on two runs..) or something specific to the CI machine? Let's see if it happens again.

mbargull commented 4 years ago

The previous timeouts were indeed only hiccups, apparently. This seems to be good to go and we can merge, IMHO, albeit with the following shortcomings:

  1. Annotations might not work as expected on Python 3.8.
  2. Windows build for Python 2.7 fails.

re 1.: This is an upstream issue and possibly (probably?!) not that important that it warrants blocking Numba builds for Python 3.8 on conda-forge.

re 2.: We can either merge as is and just rerun the CI once Python 2.7 builds of llvm 0.31 are available on Windows, or we can add skip: True # [win and py=27]. I'm fine with merging as-is. (NB: numba 0.48 will likely remove Python 2.7 support, which can mean one of two things: 1. We could go ahead and drop those builds already for 0.47 because the "ship is sailed" anyway. 2. But there might be increased demand for the "last Numba version that supports Python 2.7". ¯\_(ツ)_/¯ )

stuartarchibald commented 4 years ago

The previous timeouts were indeed only hiccups, apparently.

The threading backend tests are compute heavy (runs many forking/threading/multiprocessing things along with the compiler and also threaded execution, all at the same time), Numba does not run them on public CI https://github.com/numba/numba/blob/88e3dad3d43a59c79e6295f72fc344d77f13330c/buildscripts/incremental/test.sh#L75 (--exclude-tags='long_running').

This seems to be good to go and we can merge, IMHO, albeit with the following shortcomings:

1. Annotations might not work as expected on Python 3.8.

The fail is because a loop didn't lift and the test was asserting that a loop did, annotations should work fine still. This is the sample code, with the annotations call added, runs fine and reports correctly:

from numba import jit

def bar(x):
    return x

@jit
def foo(x):
    h = 0.
    for k in range(x):
        h = h + k
    if x:
        h = h - bar(x)
    return h

foo(10)

cres = foo.overloads[foo.signatures[0]]
ta = cres.type_annotation
with open("annotated.html", 'wt') as f:
    ta.html_annotate(f)

will fix for 0.48.0.

2. Windows build for Python 2.7 fails.

re 1.: This is an upstream issue and possibly (probably?!) not that important that it warrants blocking Numba builds for Python 3.8 on conda-forge.

re 2.: We can either merge as is and just rerun the CI once Python 2.7 builds of llvm 0.31 are available on Windows, or we can add skip: True # [win and py=27]. I'm fine with merging as-is. (NB: numba 0.48 will likely remove Python 2.7 support, which can mean one of two things: 1. We could go ahead and drop those builds already for 0.47 because the "ship is sailed" anyway. 2. But there might be increased demand for the "last Numba version that supports Python 2.7". ¯_(ツ)_/¯ )

stuartarchibald commented 4 years ago

For linux, numba.tests.test_parallel_backend.TestSpecificBackend.test_random_concurrect_vectorize_tbb is timing out. Is tbb still optional?

And for windows, llvmlite doesn't support 2.7 I believe.

TBB is optional, however if you build Numba without tbb-devel then obviously at run time it cannot use TBB regardless of whether a tbb package is present. Lack of TBB would lead to some parallel execution features being unavailable on certain platforms.

stuartarchibald commented 4 years ago

I assume this https://github.com/conda-forge/llvmlite-feedstock/pull/13 is where Python 2.7 got removed as a result of not being able to resolve these build problems: https://ci.appveyor.com/project/conda-forge/llvmlite-feedstock/builds/22085156/job/7vrdo4g30l9wwesn ? The errors there look like they could be from mixed tooling? Is perhaps conda-forge LLVM built with VS2017 and then llvmlite with VS2015?

mbargull commented 4 years ago

The threading backend tests are compute heavy

Yeah, I figured the CI might've had a general slowdown and/or didn't handle the concurrency well. Locally, the two reported tests ran in about 2.5 and 3.5 seconds with all tests from numba.tests.test_parallel_backend.TestSpecificBackend taking about 95.5 seconds in total (i.e., far lower than the 600 (540) second timeout for a single test).

annotations should work fine still.

Great, thanks for taking a look at it!

Is perhaps conda-forge LLVM built with VS2017 and then llvmlite with VS2015?

I have very limited knowledge on that topic. If he's available, @isuruf would know, I guess.

henryiii commented 4 years ago

llvmdev has - vs2017_{{ target_platform }} # [win], and llvmlite has

    - {{ compiler('c') }}    # [unix or (win and py==27)]
    - {{ compiler('cxx') }}  # [unix or (win and py==27)]
    - vs2017_win-64          # [win64 and py>27]
    - vs2017_win-32          # [win32 and py>27]

Would the simple thing (just matching llvmdev) work here, perhaps?

mbargull commented 4 years ago

@henryiii, you may want to take a look at those two:

Those seem to additionally add vs2015_runtime to the host (not build) requirements. As to why and what the details are, I can't say. But I'd just try to match those as a start.

henryiii commented 4 years ago

I think we should drop Windows + 2.7 for now, then add it back if llvmlite gets a Windows 2.7 build in the future - I don't want to block this for 2.7 on Windows in 2020. We shouldn't even need to bump the build number, I believe (if that's all that changes).

henryiii commented 4 years ago

At least one test needs to be restarted to avoid the timeout. Is there a way to increase that timeout?

henryiii commented 4 years ago

The timeout seems to be 600 seconds / 10 minutes?

https://github.com/numba/numba/blob/0.47.0/numba/testing/main.py#L699-L700

Edit: I apparently can't read, this is listed above.

henryiii commented 4 years ago

numba-feedstock (linux linux_python3.7) needs to be restarted.

jakirkham commented 4 years ago

Thanks restarted.

jakirkham commented 4 years ago

@henryiii @mbargull, would either or both of you be interested in being maintainers here? 🙂

henryiii commented 4 years ago

would either or both of you be interested in being maintainers here?

Sure, I'd be okay with it.

jakirkham commented 4 years ago

Great! Could you please add yourself to the maintainers' list in the recipe?

henryiii commented 4 years ago

Let's see what @mbargull says, then I'll add either myself or both of us.

I'm especially interested in the result of his PR https://github.com/conda-forge/llvmlite-feedstock/pull/26, looks like that is building for Windows + 2.7.

Edit: Looks like that works, but would require a CFEP to be written in order to be merged (VS 17 + Python 2.7 is not allowed). So this cannot be provided for Windows + 2.7 unless it can be built (it == LLVM) with MSVC 2008 - which I assume is not likely.

mbargull commented 4 years ago

Given the time the tests take on my local run, I'm really baffled that we are hitting that timeout... I'll add a workaround to increase the timeout. And yes, given my interest in the project, it would make sense to add me as a maintainer here. Will do.

stuartarchibald commented 4 years ago

Note that Numba CI does not run the tests that are timing out (--exclude-tags='long_running') https://github.com/numba/numba/blob/709967b7a6ce6c036c70f5eb99b06a8ef8c7172d/buildscripts/incremental/test.sh#L73-L75 because they have prohibitively long run times on public CI hardware. The Numba build farm does run these tests.

stuartarchibald commented 4 years ago

There's a bug, that we (Numba core devs) have internally named "Frosty Thompson" after the docker container that first got stuck with it. Frosty Thompson's are basically where something goes wrong in TBB and it "gets stuck", we've been trying to pin it down for a while. It seems to happen most often on heavily loaded systems and always on linux. The place it "gets stuck" is here: https://github.com/intel/tbb/blob/18070344d755ece04d169e6cc40775cae9288cee/src/tbbmalloc/backend.cpp#L301-L327, essentially the loop invokes longer and longer pauses (and eventually ends up yielding) whilst waiting for a condition to change that never does. Needless to say we're trying to debug and get a reproducer for this. If longer timeouts still don't "fix" the failing test case(s) I'd be very suspicious that you've hit the first case of a Frosty Thompson outside the Numba build farm. It should also be noted that the test cases in numba.tests.test_parallel_backend deliberately place a huge amount of strain on a system, there's concurrent compilation and execution (both single and multithreaded Numba code some of which has nested parallelism), Python based fork/forkserver/spawn/threading all happening at the same time!

stuartarchibald commented 4 years ago

The way in which 3.6/3.7/3.8 are all failing, different TBB tests timing out, suggests Frosty Thompson. CC @seibert.

mbargull commented 4 years ago

If longer timeouts still don't "fix" the failing test case(s) I'd be very suspicious that you've hit the first case of a Frosty Thompson outside the Numba build farm.

I was thinking an additional 50% / 5 minutes timeout increase should really suffice in case things were only "slow". But I was also suspicious if some potential locking or contention issue may be at hand and whether we should just --exclude-tags='long_running' here, too.

"Frosty Thompson" it is then :D. Thanks for your clarifications!

stuartarchibald commented 4 years ago

I've no idea if you can shell into the build containers, but if you can, Frosty Thompson symptoms include:

mbargull commented 4 years ago

https://github.com/conda-forge/llvmlite-feedstock/pull/26 is being blocked and I can't tell what will come of it. Merging this as is, i.e. without Py2.7 build on Windows. Thanks everyone.