conda-forge / cupy-feedstock

A conda-smithy repository for cupy.
BSD 3-Clause "New" or "Revised" License

Rebuild for nccl_2_8_4_1 + Fix migrator conflict #102

Closed: regro-cf-autotick-bot closed this PR 3 years ago

regro-cf-autotick-bot commented 3 years ago

This PR has been triggered in an effort to update nccl_2_8_4_1.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.

This package has the following downstream children:

And potentially more.

If this PR was opened in error or needs to be updated, please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can comment `@conda-forge-admin, please rerun bot` on the PR to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. If you would like a local version of this bot, you might consider using rever. Rever is a tool for automating software releases and forms the backbone of the bot's conda-forge PRing capability. Rever is both conda (conda install -c conda-forge rever) and pip (pip install rever) installable. Finally, feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/autotick-bot/actions/runs/601037069; please use this URL for debugging.

conda-forge-linter commented 3 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

leofang commented 3 years ago

This doesn't seem right. Will take a look tonight. Also, we'll need to do a manual migration for the rc branch.

leofang commented 3 years ago

@conda-forge-admin, please rerender

leofang commented 3 years ago

OK most of the build matrix is restored, but CUDA 9.2 is still missing, likely because after https://github.com/conda-forge/nccl-feedstock/pull/32 we no longer build NCCL for CUDA 9.2.

@jakirkham Is that an oversight, or is it because NCCL 2.8.x no longer supports CUDA 9.2? If it is still supported, then for all CUDA libraries (cuDNN, NCCL, cuTENSOR, etc.) I expect to see builds for all CUDA versions, starting from 9.2. Downstream libraries are free to cut the build matrix (based on the migrator), but "infrastructure" libraries should stay available for as long as possible, just like cudatoolkit. In this case I can send a PR to nccl-feedstock to spin up 9.2 again.
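For readers following along: keeping an "infrastructure" library building across the full CUDA range means listing every CUDA version in the feedstock-local `conda_build_config.yaml`. A rough sketch (the version list and key names are illustrative; the actual nccl-feedstock CBC may differ):

```yaml
# Hypothetical feedstock-local conda_build_config.yaml overriding the
# global pinning so the library builds for the full CUDA matrix.
# The versions below mirror the range discussed in this thread, but are
# illustrative rather than copied from nccl-feedstock.
cuda_compiler_version:
  - 9.2
  - 10.0
  - 10.1
  - 10.2
  - 11.0
  - 11.1
  - 11.2
```

Without such a file, the feedstock falls back to whatever CUDA versions the global conda-forge pinning (plus any active migrators) specifies.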

leofang commented 3 years ago

Is that an oversight, or because NCCL 2.8.x no longer supports CUDA 9.2?

OK, at least the official releases (https://developer.nvidia.com/nccl/nccl-legacy-downloads) cut 9.2 out quite some time ago. Here is the last supported version list for older CUDAs (which are also set in CuPy's CI):

I will try pinning to these version pairs.

leofang commented 3 years ago

...Now that I think about it carefully, it's odd that only 9.2 is cut. The latest NCCL doesn't build for 10.0/10.1 either, yet here they are.

jakirkham commented 3 years ago

It seems to have been produced

(screenshot, 2021-02-25)

leofang commented 3 years ago

Yeah, it was out before the CUDA 11.1/11.2 migration that cut the build matrix.

jakirkham commented 3 years ago

Yep we lucked out in terms of PR ordering

Anyways it seems like they still include the CUDA 9 gencodes. So I don't think they dropped it (unless that is an oversight), but they may have stopped building binaries

leofang commented 3 years ago

I am debugging CBC locally and noticed something fishy. How come we generated the combination of cos6 + CUDA 11.0 here...It's invalid.

EDIT: I meant "cos6" + CUDA 11.0.

leofang commented 3 years ago

I can't make sense of what's wrong. We see the following symptoms:

These are unrelated to this NCCL migration PR. I tested a simple rerender on the current master and it happens there too. Apparently, after the local CBC is applied, some other migrators follow and mess things up, but I can't tell which one is causing trouble. Any advice @conda-forge/core?

leofang commented 3 years ago

OK I figured it out. Isuru's advice applies once again here: DO NOT USE (CUDA) MIGRATORS if managing the CBC by hand. In this case removing cuda110.yaml restores a sane state. Will fix it shortly.
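For context, the migrator files that a rerender drops into a feedstock live under `.ci_support/migrations/`. The fix described above amounts to deleting the offending migrator file and rerendering; the sketch below mocks up that layout in a scratch directory (`/tmp/feedstock-demo` is purely illustrative):

```shell
# Sketch: recreate the relevant feedstock layout in a scratch directory,
# then drop the CUDA 11.0 migrator so it no longer fights the
# hand-written conda_build_config.yaml.
mkdir -p /tmp/feedstock-demo/.ci_support/migrations
touch /tmp/feedstock-demo/.ci_support/migrations/cuda110.yaml

# In a real feedstock this would be `git rm` followed by a rerender
# (e.g. commenting "@conda-forge-admin, please rerender" on the PR);
# here we just remove the file.
rm /tmp/feedstock-demo/.ci_support/migrations/cuda110.yaml

# The migrations directory no longer contains cuda110.yaml.
ls /tmp/feedstock-demo/.ci_support/migrations
```

On the next rerender, conda-smithy regenerates the `.ci_support` configs without applying the removed migrator.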

kkraus14 commented 3 years ago

OK I figured it out. Isuru's advice applies once again here: DO NOT USE (CUDA) MIGRATORS if managing the CBC by hand. In this case removing cuda110.yaml restores a sane state. Will fix it shortly.

@leofang one note: the nccl package doesn't have a cbc file, so it's currently only being built for the CUDA versions in the global pinning. We may need to add the versions that are needed by CuPy and other packages.

leofang commented 3 years ago

one note: the nccl package doesn't have a cbc file, so it's currently only being built for the CUDA versions in the global pinning. We may need to add the versions that are needed by CuPy and other packages.

Thanks, Keith! I think John found that we have the version needed for CUDA 9.2 (https://github.com/conda-forge/cupy-feedstock/pull/102#issuecomment-786408949). Let's see if the CI is happy with it or not. 🙂

kkraus14 commented 3 years ago

That was an older build number which suffices for now, but once a new NCCL version is released and migrators are issued it will cause issues.

If you look at _2 builds here: https://anaconda.org/conda-forge/nccl/files?version=2.8.4.1 you'll see there's only 11.2, 11.1, 11.0, and 10.2 builds.

leofang commented 3 years ago

We may need to add the versions that are needed by CuPy and other packages.

That was an older build number which suffices for now, but once a new NCCL version is released and migrators are issued it will cause issues.

Yeah I see what you're saying. I think we have two solutions when this happens:

  1. Just drop CUDA 9.2/10.0/10.1 support
  2. Take my "infrastructure project" viewpoint and build the versions needed

If the CI is happy I can handle option 2 later; otherwise, I will send PRs to the nccl/cudnn feedstocks to fix them first. (cuTENSOR is fine because it simply doesn't support older CUDA versions.)

kkraus14 commented 3 years ago

I believe cuDNN dropped support for older than CUDA 10.2 as well.

leofang commented 3 years ago

...right, my head is scrambled now 🤯 So in that case we should manually pin an older cudnn in the recipe here.

leofang commented 3 years ago

So in that case we should manually pin an older cudnn in the recipe here.

I think for cudnn it's alright, because we zip it with cuda versions and pin it loosely.
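The zipping described here might look roughly like the following in `conda_build_config.yaml` (version pairs are illustrative, not the feedstock's actual pins):

```yaml
# Sketch: zip cudnn with the CUDA version so each CUDA build gets a
# matching cuDNN. Entries at the same index in zipped keys vary together.
# The version pairs below are illustrative only.
zip_keys:
  -
    - cuda_compiler_version
    - cudnn
cuda_compiler_version:
  - 10.2
  - 11.0
cudnn:
  - 7
  - 8
```

With the pin kept loose in `meta.yaml` (for example via `{{ pin_compatible('cudnn', max_pin='x') }}`, which constrains only the major version at runtime), a cuDNN patch bump doesn't force a rebuild here.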

leofang commented 3 years ago

Looks like we'll need a migrator for cuDNN 8.1 + CUDA 10.2, but this can be done separately.

leofang commented 3 years ago

Merging this to unblock the migrator and version updates. Will handle any necessary changes in follow-up PRs.

jakirkham commented 3 years ago

Thanks for working on this, Leo, and thanks for assisting, Keith! 😄