conda-forge / cupy-feedstock

A conda-smithy repository for cupy.
BSD 3-Clause "New" or "Revised" License

[DO NOT MERGE] Build cupy dev artifacts that allow testing with NumPy 2 #272

Closed seberg closed 1 month ago

seberg commented 4 months ago

This (tries to) build artifacts based on Evgeni's PR to CuPy, which should include most of the fixes needed to run with NumPy 2, and thus allow downstream projects that also need CuPy to test with NumPy 2.

This includes most/all necessary Python fixes as well as fixes to make promotion align with NumPy (NEP 50).

See also:


(Note that I'll presumably churn CI a bit before this actually works...)

conda-forge-webservices[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

seberg commented 4 months ago

@conda-forge-admin , please re-render

jakirkham commented 4 months ago

@conda-forge-admin , please re-render

conda-forge-webservices[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

conda-forge-webservices[bot] commented 4 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

jakirkham commented 4 months ago

Disabled more tests so that we can get the artifacts built for local testing

More details on using the artifacts from CI are in this issue: https://github.com/conda-forge/conda-forge.github.io/issues/1424

jakirkham commented 3 months ago

Now that there are CuPy 13.2.0 packages with NumPy 2.0 support ( https://github.com/conda-forge/cupy-feedstock/pull/275 ), is this still needed? Or can we close this out?

jakirkham commented 2 months ago

@conda-forge-admin , please re-render

conda-forge-webservices[bot] commented 2 months ago

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict. Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

seberg commented 2 months ago

@jakirkham I'll create a new PR today to test for the upcoming 13.x release. We can probably close this one (maybe I'll update it, too). I am not sure it is too relevant for anyone; from the RAPIDS side, I think testing with 13.x is more interesting currently, and the promotion changes seem too subtle to create serious problems/test failures.

EDIT: Ah, I see you already started working on switching to 13.x.dev, thanks!

conda-forge-webservices[bot] commented 2 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

jakirkham commented 2 months ago

@jakirkham I'll create a new PR today to test for the upcoming 13.x release

Ha! Was just about to ask for your help checking that we have the right things in here. Since you are already looking 😉

jakirkham commented 2 months ago

Should add: it would be good to share these artifacts with the folks working on CuPy + SciPy compatibility. They ran into a few issues as well and could benefit from fresh binaries with the patches already applied, for testing (and for seeing whether anything else remains).

Here are a couple issues that I came across recently:

Though you likely know better than me on how best to coordinate with them 🙂

jakirkham commented 2 months ago

Also, we may want to restart whenever this PR ( https://github.com/cupy/cupy/pull/8439 ) lands (and anything else I've forgotten about). We can just close and reopen the PR.

seberg commented 2 months ago

Thanks, looks like this should work. Hopefully the last CuPy PR will auto-merge; without it, cuml/cudf/cucim sanity testing might not make much sense.

Will adapt my cudf hack once that is done and share it over at SciPy (also I suspect that I can use cph extract or maybe even file:// without extracting, so will try that):

```bash
set -eu

echo "fetching artifact"
mkdir -p conda
cd conda
curl --output artifact.zip
unzip artifact.zip
rm artifact.zip
mv conda_artifacts_* conda_artifact
cd conda_artifact
unzip *.zip
rm *.zip
cd ../..
ARTIFACT_PATH=`pwd`/conda/conda_artifact/build_artifacts
# Use `file://$ARTIFACT_PATH` as channel.
```
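Once extracted, the directory can also be wired in persistently as a local channel rather than passed on each command line. A minimal `.condarc` sketch, assuming `build_artifacts` contains a valid channel layout (platform subdirectories with `repodata.json`, e.g. as produced by `conda index`); the path is a placeholder:

```yaml
# .condarc sketch: search the extracted artifact channel before conda-forge
channels:
  - file:///path/to/conda/conda_artifact/build_artifacts
  - conda-forge
```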
seberg commented 2 months ago

Close/reopen to retrigger with the random seed fix (will take a while for new artifacts to be available).

jakirkham commented 2 months ago

Weird, Windows isn't building. Am seeing errors like this on CI:

Exception: Your CUDA environment is invalid. Please check above error log.

It can't find DLPack, which is odd, as the git submodule is cloned earlier:

    dlpack    : No
      -> Include files not found: ['cupy/_dlpack/dlpack.h']
      -> Check your CFLAGS environment variable.
```
Cloning into '/d/bld/cupy-split_1721905523284/work'...
done.
checkout: 'v13'
Switched to a new branch 'v13'
branch 'v13' set up to track 'origin/v13'.
Submodule 'third_party/cccl' (https://github.com/cupy/cccl.git) registered for path 'third_party/cccl'
Submodule 'third_party/dlpack' (https://github.com/dmlc/dlpack.git) registered for path 'third_party/dlpack'
Submodule 'third_party/jitify' (https://github.com/NVIDIA/jitify.git) registered for path 'third_party/jitify'
Cloning into '/d/bld/cupy-split_1721905523284/work/third_party/cccl'...
Cloning into '/d/bld/cupy-split_1721905523284/work/third_party/dlpack'...
Cloning into '/d/bld/cupy-split_1721905523284/work/third_party/jitify'...
Submodule path 'third_party/cccl': checked out '79ed0e96e35112d171e43f13fa7f324eff7f3de0'
Submodule path 'third_party/dlpack': checked out '365b823cedb281cd0240ca601aba9b78771f91a3'
Submodule path 'third_party/jitify': checked out '1a0ca0e837405506f3b8f7883bacb71c20d86d96'
```

Edit: Looks like upstream issue ( https://github.com/cupy/cupy/issues/7989 )

seberg commented 1 month ago

Closing/reopening once more, so we have a newer CuPy 13.3.0 pre-release build available for testing (I'll probably run at least one sanity check with these, e.g. with cuml).

seberg commented 1 month ago

Hmmm, using this (after installing cudatoolkit-dev), the cuml test suite is looking pretty good, but I do see these failures (all in doctests):

```
=========================================================== FAILURES ============================================================
_______________________________________________ test_docstring[NearestNeighbors] ________________________________________________

docstring =

    @pytest.mark.parametrize(
        "docstring",
        _find_doctests_in_obj(cuml),
        ids=lambda docstring: docstring.name,
    )
    def test_docstring(docstring):
        # We ignore differences in whitespace in the doctest output, and enable
        # the use of an ellipsis "..." to match any string in the doctest
        # output. An ellipsis is useful for, e.g., memory addresses or
        # imprecise floating point values.
        if docstring.name == "Handle":
            pytest.skip("Docstring is tested in RAFT.")
        optionflags = doctest.ELLIPSIS | doctest.NORMALIZE_WHITESPACE
        runner = doctest.DocTestRunner(optionflags=optionflags)
        # These global names are pre-defined and can be used in doctests
        # without first importing them.
        globals = dict(cudf=cudf, np=np, cuml=cuml)
        docstring.globs = globals
        # Capture stdout and include failing outputs in the traceback.
        doctest_stdout = io.StringIO()
        with contextlib.redirect_stdout(doctest_stdout):
            runner.run(docstring)
            results = runner.summarize()
        try:
>           assert not results.failed, (
                f"{results.failed} of {results.attempted} doctests failed for "
                f"{docstring.name}:\n{doctest_stdout.getvalue()}"
            )
E           AssertionError: 1 of 9 doctests failed for NearestNeighbors:
E           **********************************************************************
E           File "/nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/neighbors/nearest_neighbors.cpython-311-x86_64-linux-gnu.so", line ?, in NearestNeighbors
E           Failed example:
E               print(indices)
E           Expected:
E                  0  1  2
E               0  0  3  1
E               1  1  3  0
E               2  2  4  0
E               3  3  0  1
E               4  4  2  0
E           Got:
E                  0  1  2
E               0  0  4  3
E               1  1  2  0
E               2  2  1  0
E               3  3  0  4
E               4  4  0  3
E           **********************************************************************
E           1 items had failures:
E              1 of   9 in NearestNeighbors
E           ***Test Failed*** 1 failures.
E
E           assert not 1
E            +  where 1 = TestResults(failed=1, attempted=9).failed

test_doctest.py:120: AssertionError
______________________________________________ test_docstring[make_classification] ______________________________________________

docstring =

    @pytest.mark.parametrize(
        "docstring",
        _find_doctests_in_obj(cuml),
        ids=lambda docstring: docstring.name,
    )
    def test_docstring(docstring):
        ...
        try:
>           assert not results.failed, (
                f"{results.failed} of {results.attempted} doctests failed for "
                f"{docstring.name}:\n{doctest_stdout.getvalue()}"
            )
E           AssertionError: 1 of 3 doctests failed for make_classification:
E           **********************************************************************
E           File "/nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/datasets/classification.py", line 114, in make_classification
E           Failed example:
E               print(y)
E           Expected:
E               [1 0 1 1 1 1 1 1 1 0]
E           Got:
E               [1 1 1 1 1 1 1 0 1 0]
E           **********************************************************************
E           1 items had failures:
E              1 of   3 in make_classification
E           ***Test Failed*** 1 failures.
E
E           assert not 1
E            +  where 1 = TestResults(failed=1, attempted=3).failed

test_doctest.py:120: AssertionError
_____________________________________________ test_docstring[MulticlassClassifier] ______________________________________________

docstring =

    @pytest.mark.parametrize(
        "docstring",
        _find_doctests_in_obj(cuml),
        ids=lambda docstring: docstring.name,
    )
    def test_docstring(docstring):
        ...
        try:
>           assert not results.failed, (
                f"{results.failed} of {results.attempted} doctests failed for "
                f"{docstring.name}:\n{doctest_stdout.getvalue()}"
            )
E           AssertionError: 1 of 7 doctests failed for MulticlassClassifier:
E           **********************************************************************
E           File "/nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/multiclass/multiclass.py", line 65, in MulticlassClassifier
E           Failed example:
E               cls.predict(X)
E           Expected:
E               array([1, 1, 1, 1, 1, 1, 2, 1, 1, 2])
E           Got:
E               array([2, 2, 0, 2, 2, 1, 0, 1, 0, 1])
E           **********************************************************************
E           1 items had failures:
E              1 of   7 in MulticlassClassifier
E           ***Test Failed*** 1 failures.
E
E           assert not 1
E            +  where 1 = TestResults(failed=1, attempted=7).failed

test_doctest.py:120: AssertionError
______________________________________________ test_docstring[OneVsOneClassifier] _______________________________________________

docstring =

    @pytest.mark.parametrize(
        "docstring",
        _find_doctests_in_obj(cuml),
        ids=lambda docstring: docstring.name,
    )
    def test_docstring(docstring):
        ...
        try:
>           assert not results.failed, (
                f"{results.failed} of {results.attempted} doctests failed for "
                f"{docstring.name}:\n{doctest_stdout.getvalue()}"
            )
E           AssertionError: 1 of 7 doctests failed for OneVsOneClassifier:
E           **********************************************************************
E           File "/nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/multiclass/multiclass.py", line 305, in OneVsOneClassifier
E           Failed example:
E               cls.predict(X)
E           Expected:
E               array([1, 1, 1, 1, 1, 1, 2, 1, 1, 2])
E           Got:
E               array([2, 2, 0, 2, 2, 1, 0, 1, 0, 1])
E           **********************************************************************
E           1 items had failures:
E              1 of   7 in OneVsOneClassifier
E           ***Test Failed*** 1 failures.
E
E           assert not 1
E            +  where 1 = TestResults(failed=1, attempted=7).failed

test_doctest.py:120: AssertionError
______________________________________________ test_docstring[OneVsRestClassifier] ______________________________________________

docstring =

    @pytest.mark.parametrize(
        "docstring",
        _find_doctests_in_obj(cuml),
        ids=lambda docstring: docstring.name,
    )
    def test_docstring(docstring):
        ...
        try:
>           assert not results.failed, (
                f"{results.failed} of {results.attempted} doctests failed for "
                f"{docstring.name}:\n{doctest_stdout.getvalue()}"
            )
E           AssertionError: 1 of 7 doctests failed for OneVsRestClassifier:
E           **********************************************************************
E           File "/nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/multiclass/multiclass.py", line 230, in OneVsRestClassifier
E           Failed example:
E               cls.predict(X)
E           Expected:
E               array([1, 1, 1, 1, 1, 1, 2, 1, 1, 2])
E           Got:
E               array([2, 2, 0, 2, 2, 1, 0, 1, 0, 1])
E           **********************************************************************
E           1 items had failures:
E              1 of   7 in OneVsRestClassifier
E           ***Test Failed*** 1 failures.
E
E           assert not 1
E            +  where 1 = TestResults(failed=1, attempted=7).failed

test_doctest.py:120: AssertionError
======================================================= warnings summary ========================================================
cuml/tests/test_doctest.py::test_docstring[KernelDensity]
cuml/tests/test_doctest.py::test_docstring[KernelRidge]
cuml/tests/test_doctest.py::test_docstring[PorterStemmer]
  /nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
    warn(NumbaPerformanceWarning(msg))

cuml/tests/test_doctest.py::test_docstring[KernelExplainer]
cuml/tests/test_doctest.py::test_docstring[LinearRegression]
cuml/tests/test_doctest.py::test_docstring[make_regression]
  /nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/internals/api_decorators.py:382: UserWarning: Starting from version 23.08, the new 'copy_X' parameter defaults to 'True', ensuring a copy of X is created after passing it to fit(), preventing any changes to the input, but with increased memory usage. This represents a change in behavior from previous versions. With `copy_X=False` a copy might still be created if necessary. Explicitly set 'copy_X' to either True or False to suppress this warning.
    return init_func(self, *args, **filtered_kwargs)

cuml/tests/test_doctest.py::test_docstring[RandomForestRegressor]
cuml/tests/test_doctest.py::test_docstring[TreeExplainer]
  /nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/internals/api_decorators.py:188: UserWarning: The number of bins, `n_bins` is greater than the number of samples used for training. Changing `n_bins` to number of training samples.
    ret = func(*args, **kwargs)

cuml/tests/test_doctest.py::test_docstring[TreeExplainer]
  /nvme/1/sebastianb/miniforge3/envs/all_cuda-125_arch-x86_64/lib/python3.11/site-packages/cuml/internals/api_decorators.py:188: UserWarning: To use pickling first train using float32 data to fit the estimator
    ret = func(*args, **kwargs)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================== short test summary info ====================================================
FAILED test_doctest.py::test_docstring[NearestNeighbors] - AssertionError: 1 of 9 doctests failed for NearestNeighbors:
FAILED test_doctest.py::test_docstring[make_classification] - AssertionError: 1 of 3 doctests failed for make_classification:
FAILED test_doctest.py::test_docstring[MulticlassClassifier] - AssertionError: 1 of 7 doctests failed for MulticlassClassifier:
FAILED test_doctest.py::test_docstring[OneVsOneClassifier] - AssertionError: 1 of 7 doctests failed for OneVsOneClassifier:
FAILED test_doctest.py::test_docstring[OneVsRestClassifier] - AssertionError: 1 of 7 doctests failed for OneVsRestClassifier:
===================================== 5 failed, 68 passed, 1 skipped, 9 warnings in 18.17s ======================================
```
seberg commented 1 month ago

OK, those failures must be related to https://github.com/cupy/cupy/pull/8483, which changes the random stream of random.choice, and that is used when making classification problems (also make_blobs).

I don't think this is worrying; the CuPy team has to decide whether they mind modifying the stream like this (even if it is a huge improvement). For cuML, it seems fine to adjust/skip the tests as necessary.
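For illustration (a sketch using NumPy on the CPU, not CuPy's actual implementation): two generators seeded identically stay in lockstep only while they consume the underlying stream in exactly the same way, which is why an algorithm change inside `choice` invalidates doctests that pin exact outputs downstream:

```python
import numpy as np

# Two generators with the same seed produce identical results
# as long as they draw from the stream identically.
rng_a = np.random.default_rng(1234)
rng_b = np.random.default_rng(1234)
first = rng_a.choice(100, size=5, replace=False)
second = rng_b.choice(100, size=5, replace=False)
assert (first == second).all()  # identical draw pattern, identical output

# A single extra draw (as a changed choice() algorithm might perform
# internally) shifts every value generated afterwards, which is what
# breaks doctests with hard-coded expected output.
rng_c = np.random.default_rng(1234)
rng_c.random()  # the "algorithm change": one extra consumption of the stream
shifted = rng_c.choice(100, size=5, replace=False)
print("aligned:", first.tolist())
print("shifted:", shifted.tolist())
```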

jakirkham commented 1 month ago

@conda-forge-admin , please re-render

seberg commented 1 month ago

Yeah, not sure, but it seemed to me that cudatoolkit-dev is what made it work in the end... (I guess just checking whether it includes vector_types.h might make sense.)

jakirkham commented 1 month ago

So cudatoolkit-dev is a thin wrapper around the CUDA Toolkit runfile installer. That will install the full CTK onto the system in /usr/local/cuda.

One could get an equivalent result with Conda packages by running conda install cuda=<x.y>, which also installs the full CTK (except for the driver).

While both of these will technically work, we don't want users to need to resort to this. After all, we have the full CTK in conda packages, so just adding the right ones as dependencies of cupy should resolve the issue.


As to vector_types.h, I looked inside cuda-cudart-dev_linux-64 (as one example) and the headers do appear to be there:

```bash
ls targets/x86_64-linux/include
builtin_types.h              device_double_functions.h
channel_descriptor.h         device_functions.h
common_functions.h           device_launch_parameters.h
cooperative_groups           device_types.h
cooperative_groups.h         driver_functions.h
cuComplex.h                  driver_types.h
cuda.h                       host_config.h
cudaEGL.h                    host_defines.h
cudaEGLTypedefs.h            library_types.h
cudaGL.h                     math_constants.h
cudaGLTypedefs.h             math_functions.h
cudaProfilerTypedefs.h       mma.h
cudaTypedefs.h               nvfunctional
cudaVDPAU.h                  sm_20_atomic_functions.h
cudaVDPAUTypedefs.h          sm_20_atomic_functions.hpp
cuda_awbarrier.h             sm_20_intrinsics.h
cuda_awbarrier_helpers.h     sm_20_intrinsics.hpp
cuda_awbarrier_primitives.h  sm_30_intrinsics.h
cuda_bf16.h                  sm_30_intrinsics.hpp
cuda_bf16.hpp                sm_32_atomic_functions.h
cuda_device_runtime_api.h    sm_32_atomic_functions.hpp
cuda_egl_interop.h           sm_32_intrinsics.h
cuda_fp16.h                  sm_32_intrinsics.hpp
cuda_fp16.hpp                sm_35_atomic_functions.h
cuda_fp8.h                   sm_35_intrinsics.h
cuda_fp8.hpp                 sm_60_atomic_functions.h
cuda_gl_interop.h            sm_60_atomic_functions.hpp
cuda_occupancy.h             sm_61_intrinsics.h
cuda_pipeline.h              sm_61_intrinsics.hpp
cuda_pipeline_helpers.h      surface_functions.h
cuda_pipeline_primitives.h   surface_indirect_functions.h
cuda_runtime.h               surface_types.h
cuda_runtime_api.h           texture_fetch_functions.h
cuda_surface_types.h         texture_indirect_functions.h
cuda_texture_types.h         texture_types.h
cuda_vdpau_interop.h         vector_functions.h
cudart_platform.h            vector_functions.hpp
device_atomic_functions.h    vector_types.h
device_atomic_functions.hpp
```

However, CuPy doesn't know how to look inside the targets structure at runtime (Leo recently added logic for this at build time). So I think we need a patch for conda packages similar to the one made for wheels: https://github.com/cupy/cupy/pull/8489
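A minimal sketch of the kind of runtime lookup such a patch needs (the function name and exact logic here are assumptions for illustration, not CuPy's actual code): conda's CUDA packages place headers under `$CONDA_PREFIX/targets/<arch>-<os>/include` rather than a flat `include/`, so both layouts have to be searched:

```python
import glob
import os
import sys


def find_cuda_include_dirs(prefix=None):
    """Return candidate CUDA include directories inside a conda prefix.

    Hypothetical helper: checks the flat include/ layout first, then the
    per-target layout used by packages such as cuda-cudart-dev
    (e.g. targets/x86_64-linux/include).
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX", sys.prefix)
    candidates = [os.path.join(prefix, "include")]
    candidates += sorted(glob.glob(os.path.join(prefix, "targets", "*", "include")))
    return [d for d in candidates if os.path.isdir(d)]
```

A real fix would live in CuPy's CUDA-path discovery and also handle the Windows layout; this only shows the extra `targets/*` search that conda's packaging requires.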

jakirkham commented 1 month ago

Upstream v13 now includes these fixes that we need:

So switched back to building with upstream's v13

jakirkham commented 1 month ago

Going to start a fresh CI build for the latest v13 branch.

jakirkham commented 1 month ago

@conda-forge-admin , please restart CI

jakirkham commented 1 month ago

@conda-forge-admin , please restart CI

jakirkham commented 1 month ago

Looks like that is working! 🥳

Will retest with the latest changes to make sure that is still the case

jakirkham commented 1 month ago

@conda-forge-admin , please restart CI

seberg commented 1 month ago

Closing this, now that gh-282 is merged. If we need 14.0 pre-releases again, probably best to start fresh I think.

leofang commented 1 month ago

Thanks a lot for all the work, @seberg @jakirkham!