conda-forge / cupy-feedstock

A conda-smithy repository for cupy.
BSD 3-Clause "New" or "Revised" License

Rebuild for cudnn 9 -- i.e. drop cudnn altogether #286

Closed regro-cf-autotick-bot closed 3 days ago

regro-cf-autotick-bot commented 1 month ago

This PR has been triggered in an effort to update cudnn9.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.


If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/10628718403 - please use this URL for debugging.

conda-forge-webservices[bot] commented 1 month ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

jakirkham commented 1 month ago

Work likely would be needed upstream. That said, it looks more likely this will be dropped as a dependency than upgraded.

xref: https://github.com/cupy/cupy/issues/8215

hmaarrfk commented 3 weeks ago

@conda-forge-admin please restart cis

hmaarrfk commented 6 days ago

Given that the intention is to drop cuDNN altogether, might it be acceptable to just "not use cudnn" at build time, as suggested in:

https://github.com/cupy/cupy/issues/8633#issuecomment-2389460062
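
For illustration, "not using cuDNN" at build time would amount to leaving cuDNN out of the recipe's requirements so CuPy's build simply detects it as unavailable. A minimal sketch, with purely illustrative dependency names rather than the feedstock's actual lists:

# recipe/meta.yaml (sketch, not the real recipe)
requirements:
  host:
    - python
    - cuda-version 12.*      # illustrative CUDA pin
    # - cudnn                # intentionally omitted: no cuDNN bindings get built
  run:
    - python
    - numpy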

hmaarrfk commented 6 days ago

@conda-forge-admin please rerender

conda-forge-admin commented 6 days ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub actions workflow below for more details. You can also ping conda-forge/core for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11227339497.

jakirkham commented 6 days ago

Originally the thinking was to support cuDNN until it was dropped

This is possible with cuDNN 8, but not cuDNN 9

Edit: It's worth noting cuDNN is already an optional dependency

xref: https://github.com/conda-forge/cupy-feedstock/pull/266

hmaarrfk commented 6 days ago

If this build passes, we could build cuDNN 8 support with a higher build number.

Perhaps we can also wait a few weeks for a new "version" upstream, to more clearly communicate the "change of a feature set" at conda-forge.

jakirkham commented 6 days ago

Yeah was inclined to wait for CuPy 14

Though please let us know if there's more context behind your interest here (like some bug we need to be aware of)

hmaarrfk commented 6 days ago

The only thing I'm concerned about with "14" or the "next version" is that it often slips, and "old cupy" can "keep your packages on old versions", which makes it difficult to stay on the edge.

The CuPy 14 release tracker seems stuck in April: https://github.com/cupy/cupy/issues/8269 (again, nothing wrong with that; it just is what it looks like).

jakirkham commented 6 days ago

Made a note in that issue: https://github.com/cupy/cupy/issues/8269#issuecomment-2398568635

Despite indications to the contrary, folks are hard at work on that release. The biggest item being NumPy 2 compatibility (meaning making CuPy look and act more like NumPy 2 in its API), which will involve quite a bit of work

hmaarrfk commented 6 days ago

Despite indications to the contrary, folks are hard at work on that release

I apologize. I have a hard time writing things on GitHub; sometimes typing doesn't convey the complete meaning.

Even in my own project I still pin to "1.26" in releases but test against "2.0" while the remaining bugs are identified.

I guess my main concern is that cupy is now one of the few remaining packages that hasn't migrated to cuDNN 9: https://conda-forge.org/status/migration/?name=cudnn9

Soon it will become increasingly hard to install alongside pytorch.

jakirkham commented 6 days ago

Ok thanks for clarifying the issue. Can see how this would cause difficulties

How about we have one build with cuDNN 8 and one build without cuDNN altogether?

That way users still needing the cuDNN bits can get them, while users who don't need cuDNN can pick up the CuPy build without it. I should add that the solver should also be able to pick up the cuDNN-free build when cuDNN 9 is in the environment.
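
For illustration, one way to express "one build with cuDNN 8 and one without" is an extra variant axis plus a guarded host dependency. The key name and selector below are hypothetical, not what the feedstock actually uses:

# recipe/conda_build_config.yaml (hypothetical sketch)
cudnn:
  - "8"        # variant linked against cuDNN 8
  - "none"     # cuDNN-free variant

# recipe/meta.yaml (hypothetical sketch)
requirements:
  host:
    - cudnn 8.*        # [cudnn == "8"]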

hmaarrfk commented 6 days ago

I'm honestly happy to take my own advice, recompile this for myself, and upload it to my own channel. I can make the requested changes, but having build-number variants is generally confusing and leads to solver flip-flopping.

hmaarrfk commented 5 days ago

I guess part of my team hit this on their dev machines.

It would be great if I could get this to rerender, but it seems to be failing:

onda-forge/migrations/python312.yaml,/home/mark/.cache/conda-smithy/share/conda-forge/migrations/cudnn9.yaml
Traceback (most recent call last):
  File "/home/mark/miniforge3/bin/conda-smithy", line 10, in <module>
    sys.exit(main())
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/cli.py", line 758, in main
    args.subcommand_func(args)
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/cli.py", line 599, in __call__
    self._call(args, tmpdir)
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/cli.py", line 604, in _call
    configure_feedstock.main(
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/configure_feedstock.py", line 2792, in main
    render_azure(env, config, forge_dir, return_metadata=True)
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/configure_feedstock.py", line 1868, in render_azure
    return _render_ci_provider(
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/configure_feedstock.py", line 1023, in _render_ci_provider
    migrated_combined_variant_spec = migrate_combined_spec(
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/configure_feedstock.py", line 854, in migrate_combined_spec
    combined_spec = variant_add(combined_spec, migration)
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/variant_algebra.py", line 289, in variant_add
    return VARIANT_OP[operation](v1, v2)
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/variant_algebra.py", line 176, in op_variant_key_add
    new_keys = variant_key_set_union(
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/variant_algebra.py", line 114, in variant_key_set_union
    return sorted(out_v, key=partial(_version_order, ordering=ordering))
  File "/home/mark/miniforge3/lib/python3.10/site-packages/conda_smithy/variant_algebra.py", line 65, in _version_order
    return ordering.index(v)
ValueError: '12.0' is not in list

hmaarrfk commented 5 days ago

@conda-forge-admin please rerender

hmaarrfk commented 5 days ago

I guess having a pinned dependency that disappears is likely not supported by rerendering...

conda-forge-admin commented 5 days ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub actions workflow below for more details. You can also ping conda-forge/core for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11259072711.

leofang commented 4 days ago

Sorry for the late reply. I am supportive of just dropping cuDNN support altogether to simplify things. I don't think anyone is using it through CuPy.

I feel shipping two builds (doubling the support matrix) is a bit overwhelming. If backward compatibility is a concern, perhaps we can add cuDNN 9 as a run_constrained entry for the time being?

run_constrained:
  - cudnn >=9

This way:

hmaarrfk commented 4 days ago

If backward compatibility is a concern

I do not believe I use cuDNN, so it is not a concern to me. I feel like the original addition was more for uniformity than anything else: https://github.com/conda-forge/cupy-feedstock/pull/266#issuecomment-2070441035

However, neither I nor the bot can rerender...

@conda-forge-admin please rerender

conda-forge-admin commented 4 days ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub actions workflow below for more details. You can also ping conda-forge/core for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11275434758.

leofang commented 4 days ago

I can look into the rerender error later today, occupied atm 😥

hmaarrfk commented 4 days ago

@conda-forge-admin please rerender

conda-forge-admin commented 4 days ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the GitHub actions workflow below for more details. You can also ping conda-forge/core for further assistance or you can try rerendering locally.

The following suggestions might help debug any issues:

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11278006184.

hmaarrfk commented 4 days ago

The problem likely stems from the closing of the CUDA 12.0 migration and the fact that the cuda migrator is vendored.

https://github.com/regro-cf-autotick-bot/cupy-feedstock/commit/ac4c2d79a83de8d989882ebc818dfacda9c3e0c8
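
For reference, the ValueError above comes from conda-smithy ordering variant values against a list that no longer contains "12.0" once the closed CUDA 12.0 migration is dropped. The kind of tweak involved is making sure the vendored migrator's ordering block still lists every value in use, roughly like this (keys and versions are illustrative, not the exact contents of the vendored file):

# .ci_support/migrations/<vendored cuda migrator>.yaml (illustrative excerpt)
__migrator:
  ordering:
    cuda_compiler_version:
      - None
      - "11.8"
      - "12.0"     # the value the traceback reports as missing from the ordering
      - "12.6"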

hmaarrfk commented 4 days ago

@conda-forge-admin please rerender

jakirkham commented 3 days ago

@conda-forge-admin , please re-render

conda-forge-admin commented 3 days ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/11286055117.

jakirkham commented 3 days ago

Fixed the re-rendering issues in PR: https://github.com/conda-forge/cupy-feedstock/pull/289

Have merged that into this PR and resolved conflicts

Note that we are using the vendored migrator to add the latest CUDA builds. Some minor tweaks to that migrator were all that was needed to get things to work

leofang commented 3 days ago

I can look into the rerender error later today, occupied atm 😥

Fixed the re-rendering issues in PR: #289

Looks like John took care of it for me (thx!)

Note that we are using the vendored migrator to add the latest CUDA builds.

From this and other PRs I understand you two have different opinions when it comes to resolving this kind of awkward rerendering issue (@jakirkham prefers to modify the vendored migrator, @hmaarrfk prefers to craft and maintain an explicit cbc.yaml file). I lean toward the latter, but only slightly, mainly because having to manually modify anything under .ci_support is kind of hard to teach/debug. I am not the one doing the heavy lifting here, so I am happy to take whatever solution you guys decide to land on (and thx again!) 😄
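
For readers unfamiliar with the two approaches being contrasted: the "explicit cbc.yaml" route means pinning the variant axes directly in the recipe instead of relying on migration files, along the lines of (values purely illustrative):

# recipe/conda_build_config.yaml (illustrative)
cuda_compiler_version:
  - None       # CPU-only build
  - "11.8"
  - "12.6"

whereas the vendored-migrator route keeps the same information in a migration file under .ci_support/migrations/ that rerendering applies automatically.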

jakirkham commented 3 days ago

For clarity the migrator was already handling the addition of CUDA 12.x. All I did was fix what was already used here

If we would like to discuss changes to that process, I would suggest that we raise a new issue to discuss it separately

AIUI this PR is focused on cuDNN handling. Let's keep this focused on that

hmaarrfk commented 3 days ago

So @jakirkham, compiling with 12.6 still keeps it compatible with 12.0?

leofang commented 3 days ago

compiling with 12.6 still keeps it compatible with 12.0????

Yes, CuPy is taking advantage of CUDA minor version compatibility, so that we don't build packages per CUDA minor release. https://docs.nvidia.com/deploy/cuda-compatibility/#minor-version-compatibility
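
Concretely, minor version compatibility is typically expressed through the run requirement on cuda-version, so a package compiled against CUDA 12.6 can still install into any CUDA 12.x environment. A sketch of such a constraint (the exact pin used by the feedstock may differ):

# recipe/meta.yaml (sketch)
requirements:
  run:
    - cuda-version >=12.0,<13.0a0   # built with 12.6, resolvable on any 12.x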

hmaarrfk commented 3 days ago

Great, thanks. Merge?

leofang commented 3 days ago

For clarity the migrator was already handling the addition of CUDA 12.x. All I did was fix what was already used here

If we would like to discuss changes to that process, I would suggest that we raise a new issue to discuss it separately

AIUI this PR is focused on cuDNN handling. Let's keep this focused on that

@jakirkham Sorry if I am mistaken, but I am sensing some defensiveness here, and I feel obliged to explain (not trying to argue) why I raised the concern, which I believe is relevant to general OSS development and to this particular cuDNN-handling effort.

Throughout this PR, what I see is:

I think it is perfectly fine for a project maintainer to have a preference for certain approaches, but I would appreciate it if at least a reason or comment were given as to why the contributor's change is not favored. That's why I thought it was good to point this out, at least for the record.

Since Mark is happy, I am happy too and I approved the PR 🙂 Thanks @hmaarrfk @jakirkham!

hmaarrfk commented 2 days ago

I’m ok with letting this go.

I mostly don’t like editing .ci_support since that is often left for bot generated outputs.

If CuPy is able to gain performance with 12.6, then I think we should try to extend this beyond this feedstock.

Let’s spend our time documenting that and implementing it

PS: I also feel like I was too liberal in using the bot's PR for a feedstock I don't maintain. I should have just created my own.