conda-forge / numba-feedstock

A conda-smithy repository for numba.
BSD 3-Clause "New" or "Revised" License

Support TBB on non-Linux, skip PyPy #73

Closed: henryiii closed this 3 years ago

henryiii commented 3 years ago

Adds an explicit skip for PyPy, tests out including TBB (closes #43), and updates the test line for TBB (closes #63). Also fixes the homepage link; GitHub will turn off the old redirects at some point.

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

henryiii commented 3 years ago

@conda-forge-admin, please rerender

henryiii commented 3 years ago

I think this is working. Can I get another pair of eyes or two?

henryiii commented 3 years ago

@stuartarchibald might be interested in verifying, too?

henryiii commented 3 years ago

The Drone aarch64 builds are randomly hanging. I'm going to restore the aarch64 limiting for now; macOS and Windows seem to be just fine. (This might be a Drone issue, or it might be TBB-related.)

henryiii commented 3 years ago

@conda-forge-admin, please rerender

henryiii commented 3 years ago

Ahh, looking at the history, it seems Drone has been randomly hanging for a while; it doesn't appear to have anything to do with the addition of TBB. I'll revert to the original, TBB-everywhere version.

henryiii commented 3 years ago

Yes, I think @jakirkham's analysis is correct. The higher versions of Python sometimes take 55 mins instead of 20 mins, which still lets them pass; 3.6 and 3.7 seem most vulnerable to failing with the 1-hour timeout. Is there some other infrastructure we can move to here? @chrisburr, do you know if there's an alternative to Drone like there is for Travis?

Another idea could be to skip running the tests on 3.6 and 3.7 for Drone, I think that frees up several minutes.

jakirkham commented 3 years ago

For context, I think Henry is referring to the discussion in issue https://github.com/conda-forge/numba-feedstock/issues/74

Am not aware of an alternative for aarch64 hardware; we can just restart on master. We could use emulated builds on Azure, but this may be slower, and am a little hesitant to use emulation with something as sophisticated as Numba. Alternatively, maybe we skip some tests on aarch64.

cc @isuruf (in case you have other thoughts)

henryiii commented 3 years ago

We would only need to skip for py<38, since the newer Pythons seem a bit faster: the failures are almost always 3.6 and 3.7, and I can see 3.8 and 3.9 sometimes running "slow" (about 55 mins), but they usually slip in under the limit.

Currently, it seems to make it through only 1/4 of the time, or maybe less. I've restarted this about 6 times; it always fails on 3.6 or 3.7, and at least once it failed on both.

isuruf commented 3 years ago

You can try cross compilation and running tests using QEMU, or full emulation. Those are the two choices right now. Skipping a Python version wouldn't work, because timing depends on which machine gets assigned in Drone CI (they have different types of machines, and some are slower), and even if this version passes just under 55 mins, a new version might not.

jakirkham commented 3 years ago

Alternatively maybe we skip some tests on aarch64

Just want to highlight this option again. For example do we need to run all of the TBB tests there?

Edit: Or could we just run them on Python 3.8, given that seems to work reliably and these tests are not likely to depend much on the Python version?

henryiii commented 3 years ago

even if this version passes just under 55 mins

It's possible Drone will update all their machines by the time the next version is released? Guessing at the future of both the CI and the changes to Numba isn't very reliable; we can always change this in the future if it fails on a new version.

If I understand correctly, we are currently only running 20% of the tests on aarch64; maybe a first step would be to reduce that fraction? I wonder if that's why it's variable (3.6 and 3.7 do pass sometimes): it depends on which 20% it selects. Maybe there's a long test or two that could always be excluded?

henryiii commented 3 years ago

https://github.com/conda-forge/numba-feedstock/blob/9796deb403ad0986e959f44f9c62d8e81d961f3c/recipe/run_test.sh#L40-L41

stuartarchibald commented 3 years ago

even if this version passes just under 55 mins

It's possible Drone will update all their machines by the time the next version is released? Guessing at the future of both the CI and the changes to Numba isn't very reliable; we can always change this in the future if it fails on a new version.

If I understand correctly, we are currently only running 20% of the tests on aarch64; maybe a first step would be to reduce that fraction? I wonder if that's why it's variable (3.6 and 3.7 do pass sometimes): it depends on which 20% it selects. Maybe there's a long test or two that could always be excluded?

IIRC for a given environment the selection will be consistent across runs.

stuartarchibald commented 3 years ago

Alternatively maybe we skip some tests on aarch64

Just want to highlight this option again. For example do we need to run all of the TBB tests there?

I think the really heavy threading layer specific tests should already be skipped by virtue of --exclude-tags='long_running' on the test runner line.

Another option, instead of picking a percentage of "random" tests, is to "slice" through the test suite with the slice syntax as used here: https://github.com/numba/numba/blob/5f7bcce048b9c0a7fd02376ca8a2e3574625c99f/buildscripts/incremental/test.sh#L84

-j "$TEST_START_INDEX,None,$TEST_COUNT" is interpreted as

run_these_tests = list_of_all_tests[slice($TEST_START_INDEX,None,$TEST_COUNT)]

and in the case of Numba's own CI, there are TEST_COUNT CI build combinations, each starting at a linearly increasing index from 0 to TEST_COUNT - 1, i.e. striding through the test suite running every TEST_COUNT^th test starting from index TEST_START_INDEX. This is not great, but realistically it's the only way public CI can cope with Numba's test suite.
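The slicing semantics described above can be sketched in plain Python. Note this is an illustrative model only: `select_tests` and the test names here are hypothetical, not Numba's actual runner (which is `python -m numba.runtests`).

```python
# Illustrative sketch of the "-j start,None,count" test slicing used to
# partition Numba's suite across CI builds. select_tests is a hypothetical
# helper, not part of numba.runtests.

def select_tests(all_tests, start_index, count):
    """Return every count-th test starting at start_index,
    i.e. all_tests[start_index:None:count]."""
    return all_tests[slice(start_index, None, count)]

tests = [f"test_{i:02d}" for i in range(10)]

# With TEST_COUNT = 3, three CI builds (TEST_START_INDEX = 0, 1, 2)
# partition the suite so every test runs exactly once overall.
shards = [select_tests(tests, i, 3) for i in range(3)]
for i, shard in enumerate(shards):
    print(i, shard)

# The shards cover the whole suite with no duplicates.
covered = sorted(t for shard in shards for t in shard)
assert covered == tests
```

Unlike a random percentage, this deterministic striding guarantees the union of all CI builds exercises every test exactly once.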

More info here: https://numba.readthedocs.io/en/stable/developer/contributing.html#continuous-integration-testing

henryiii commented 3 years ago

Running 10% of the tests passed on the first try. It cut the "long" time for Python 3.7 to roughly 42 minutes, and 3.9 to 39 minutes. Python 3.6 and 3.8 ran on the newer hardware, taking 12.5 and under 12 minutes, respectively. Almost tempted to try 15%. I think I will.

IIRC for a given environment the selection will be consistent across runs.

It could very well be; this was running at 59 minutes when it was not timing out, so it's easily possible the pass/fail differences were CI-load related. If the selections are consistent, I think it's fine as long as they pass with room to spare.

henryiii commented 3 years ago

Haha, of course, that's the one time in the last 10 runs or so that py36 and py37 both get a fast node. 🤦 18 mins for 3.6 and 3.7, and 17 mins for 3.8. Looks like 3.9 got a slow node and took 53 minutes; guessing around 55 mins for 3.6 and 3.7 on a slow node. I think that's pretty good, so let's go with this for now; we can always refer back here and modify if things get more challenging in the future (or less challenging, too).

Thanks everyone!

jakirkham commented 3 years ago

Sounds great! Thanks for working on this Henry 😄