conda-forge / cupy-feedstock

A conda-smithy repository for cupy.
BSD 3-Clause "New" or "Revised" License

Switch to cross-compilation for aarch64/ppc64le #197

Closed conda-forge-admin closed 1 year ago

conda-forge-admin commented 1 year ago

Hi! This is the friendly automated conda-forge-webservice.

I've rerendered the recipe as instructed in #196.

Here's a checklist to do before merging.

Fixes #196

conda-forge-webservices[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

leofang commented 1 year ago

@conda-forge-admin, please rerender

github-actions[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the latest rerendering GitHub actions workflow run for errors. You can also ping conda-forge/core for further assistance or try re-rendering locally.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/cupy-feedstock/actions/runs/4684900434.

leofang commented 1 year ago

The rerendering error is weird. I tried a few different changes locally but cannot get rerendering to work. It seems that whenever {{ compiler('cuda') }} is present, rerendering goes wrong.
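For context on what the solver later sees: conda-build's compiler() Jinja function expands into a package spec whose name carries the target platform, not the build platform. The sketch below is a simplified model, not conda-build's implementation. The helper name and the default-compiler table are assumptions. With the variant values from the CI variant file named in the log that follows, it reproduces the compiler specs listed in the solver error.

```python
from __future__ import annotations

# Hypothetical defaults table: language -> (variant key, default compiler name).
DEFAULT_COMPILERS = {
    "c": ("c_compiler", "gcc"),
    "cxx": ("cxx_compiler", "gxx"),
    "cuda": ("cuda_compiler", "nvcc"),
}


def model_compiler_spec(language: str, variant: dict[str, str]) -> str:
    """Simplified model of compiler(language): '<name>_<target_platform> <version>.*'."""
    key, default_name = DEFAULT_COMPILERS[language]
    name = variant.get(key, default_name)
    # The name is suffixed with the TARGET platform, even though the package
    # is requested for the build environment.
    spec = f"{name}_{variant['target_platform']}"
    version = variant.get(f"{key}_version")
    if version is not None:
        spec += f" {version}.*"
    return spec


# Variant values taken from the CI variant file name in the log
# (c/cxx compiler version 10, CUDA compiler version 11.2).
variant = {
    "target_platform": "linux-aarch64",
    "c_compiler_version": "10",
    "cxx_compiler_version": "10",
    "cuda_compiler_version": "11.2",
}
for lang in ("c", "cxx", "cuda"):
    print(model_compiler_spec(lang, variant))
```

The model also shows why the CUDA compiler matters here: the nvcc spec for the target platform has to be solvable in the build environment, just like the gcc/gxx cross compilers.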

leofang commented 1 year ago

No idea why mamba won't look at the linux-aarch64 platform; nvcc_linux-aarch64 is clearly there:

+ conda mambabuild /home/conda/recipe_root -m /home/conda/feedstock_root/.ci_support/linux_aarch64_c_compiler_version10cuda_compiler_version11.2cxx_compiler_version10python3.10.____cpython.yaml --suppress-variables --no-test --clobber-file /home/conda/feedstock_root/.ci_support/clobber_linux_aarch64_c_compiler_version10cuda_compiler_version11.2cxx_compiler_version10python3.10.____cpython.yaml
Updating build index: /home/conda/feedstock_root/build_artifacts

No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.21
WARNING:conda_build.metadata:No numpy version specified in conda_build_config.yaml.  Falling back to default numpy value of 1.21
Adding in variants from internal_defaults
INFO:conda_build.variants:Adding in variants from internal_defaults
Adding in variants from /home/conda/recipe_root/conda_build_config.yaml
INFO:conda_build.variants:Adding in variants from /home/conda/recipe_root/conda_build_config.yaml
Adding in variants from /home/conda/feedstock_root/.ci_support/linux_aarch64_c_compiler_version10cuda_compiler_version11.2cxx_compiler_version10python3.10.____cpython.yaml
INFO:conda_build.variants:Adding in variants from /home/conda/feedstock_root/.ci_support/linux_aarch64_c_compiler_version10cuda_compiler_version11.2cxx_compiler_version10python3.10.____cpython.yaml
Attempting to finalize metadata for cupy
INFO:conda_build.metadata:Attempting to finalize metadata for cupy
conda-forge/noarch                                          Using cache
conda-forge/linux-aarch64                            8.3MB @   3.6MB/s  2.9s
Reloading output folder: /home/conda/feedstock_root/build_artifacts
file:///home/conda/feedstock_root/build_artifact..  ??.?MB @  ??.?MB/s 0 failed  0.0s
file:///home/conda/feedstock_root/build_artifact.. 127.0 B @   5.5MB/s  0.0s
conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
Reloading output folder: /home/conda/feedstock_root/build_artifacts
file:///home/conda/feedstock_root/build_artifact..  ??.?MB @  ??.?MB/s 0 failed  0.0s
file:///home/conda/feedstock_root/build_artifact.. 127.0 B @   5.3MB/s  0.0s
Mamba failed to solve:
 - nvcc_linux-aarch64 11.2.*
 - sysroot_linux-aarch64 2.17.*
 - python 3.10.* *_cpython
 - cross-python_linux-aarch64
 - gxx_linux-aarch64 10.*
 - gcc_linux-aarch64 10.*
 - cython

with channels:

The reported errors are:
- Encountered problems while solving:
-   - nothing provides requested nvcc_linux-aarch64 11.2.*
- 

Leaving build/test directories:
  Work:
 /home/conda/feedstock_root/build_artifacts/work 
  Test:
 /home/conda/feedstock_root/build_artifacts/test_tmp 
Leaving build/test environments:
  Test:
source activate  /home/conda/feedstock_root/build_artifacts/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_ 
  Build:
source activate  /home/conda/feedstock_root/build_artifacts/_build_env 

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 141, in mamba_get_install_actions
    solution = solver.solve_for_action(_specs, prefix)
  File "/opt/conda/lib/python3.10/site-packages/boa/core/solver.py", line 230, in solve_for_action
    t = self.solve(specs)
  File "/opt/conda/lib/python3.10/site-packages/boa/core/solver.py", line 220, in solve
    raise RuntimeError("Solver could not find solution." + error_string)
RuntimeError: Solver could not find solution.Mamba failed to solve:
 - nvcc_linux-aarch64 11.2.*
 - sysroot_linux-aarch64 2.17.*
 - python 3.10.* *_cpython
 - cross-python_linux-aarch64
 - gxx_linux-aarch64 10.*
 - gcc_linux-aarch64 10.*
 - cython

with channels:

The reported errors are:
- Encountered problems while solving:
-   - nothing provides requested nvcc_linux-aarch64 11.2.*
- 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/conda-mambabuild", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 256, in main
    call_conda_build(action, config)
  File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 228, in call_conda_build
    result = api.build(
  File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 180, in build
    return build_tree(
  File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3078, in build_tree
    packages_from_this = build(metadata, stats,
  File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2038, in build
    output_metas = expand_outputs([(m, need_source_download, need_reparse_in_env)])
  File "/opt/conda/lib/python3.10/site-packages/conda_build/render.py", line 787, in expand_outputs
    for (output_dict, m) in deepcopy(_m).get_output_metadata_set(permit_unsatisfiable_variants=False):
  File "/opt/conda/lib/python3.10/site-packages/conda_build/metadata.py", line 2524, in get_output_metadata_set
    conda_packages = finalize_outputs_pass(
  File "/opt/conda/lib/python3.10/site-packages/conda_build/metadata.py", line 884, in finalize_outputs_pass
    fm = finalize_metadata(
  File "/opt/conda/lib/python3.10/site-packages/conda_build/render.py", line 547, in finalize_metadata
    build_unsat, host_unsat = add_upstream_pins(m,
  File "/opt/conda/lib/python3.10/site-packages/conda_build/render.py", line 387, in add_upstream_pins
    build_deps, build_unsat, extra_run_specs_from_build = _read_upstream_pin_files(m, 'build',
  File "/opt/conda/lib/python3.10/site-packages/conda_build/render.py", line 374, in _read_upstream_pin_files
    deps, actions, unsat = get_env_dependencies(m, env, m.config.variant,
  File "/opt/conda/lib/python3.10/site-packages/conda_build/render.py", line 131, in get_env_dependencies
    actions = environ.get_install_actions(tmpdir, tuple(dependencies), env,
  File "/opt/conda/lib/python3.10/site-packages/boa/cli/mambabuild.py", line 150, in mamba_get_install_actions
    raise err
conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {MatchSpec("nvcc_linux-aarch64=11.2")}
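
The key facts are on the last line of the log: the environment being solved is for platform linux-64 (the build platform), and the unsatisfiable spec is nvcc_linux-aarch64=11.2. A small hypothetical helper (not part of conda-build or boa) that pulls both out of such a line:

```python
from __future__ import annotations

import re

# Final line of the log above, copied verbatim.
LAST_LINE = (
    "conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable "
    'dependencies for platform linux-64: {MatchSpec("nvcc_linux-aarch64=11.2")}'
)

_UNSAT = re.compile(
    r"Unsatisfiable dependencies for platform (?P<platform>[\w-]+): \{(?P<specs>.*)\}"
)
_SPEC = re.compile(r'MatchSpec\("([^"]+)"\)')


def parse_unsat(line: str) -> tuple[str, list[str]]:
    """Return (platform being solved, list of unsatisfiable specs)."""
    match = _UNSAT.search(line)
    if match is None:
        raise ValueError("not an 'Unsatisfiable dependencies' line")
    return match.group("platform"), _SPEC.findall(match.group("specs"))


solved_for, specs = parse_unsat(LAST_LINE)
print(solved_for, specs)  # linux-64 ['nvcc_linux-aarch64=11.2']
```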

leofang commented 1 year ago

@conda-forge-admin, please rerender

github-actions[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/cupy-feedstock/actions/runs/4696142986.

leofang commented 1 year ago

@conda-forge-admin, please restart ci

leofang commented 1 year ago

It seems the build_and_run() function in CuPy's build system does not support cross-compilation. Not sure if there is an easy patch: https://github.com/cupy/cupy/blob/d6252bed4a16aa606681ed97a4fafba2d728dfdb/install/cupy_builder/install_build.py#L717-L748
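As its name suggests, build_and_run() compiles a small probe program and then executes it. A probe produced by the aarch64/ppc64le cross compiler cannot run natively on the x86-64 build machine. The sketch below uses hypothetical helpers (not CuPy's code) to show one way to tell whether a compiled binary can run on the current host: compare the ELF e_machine field with platform.machine().

```python
from __future__ import annotations

import platform
import struct

# e_machine values from the ELF specification (<elf.h>).
EM_PPC64 = 21
EM_X86_64 = 62
EM_AARCH64 = 183

# platform.machine() names on Linux -> expected e_machine.
HOST_TO_EM = {"x86_64": EM_X86_64, "aarch64": EM_AARCH64, "ppc64le": EM_PPC64}


def elf_machine(header: bytes) -> int:
    """Read e_machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return machine


def can_run_natively(header: bytes, host: str | None = None) -> bool:
    """True if a binary with this header matches the host CPU (no emulation)."""
    host = host or platform.machine()
    return HOST_TO_EM.get(host) == elf_machine(header)


def probe_can_run(path: str) -> bool:
    """Check a compiled probe executable on disk."""
    with open(path, "rb") as f:
        return can_run_natively(f.read(20))
```

Under cross-compilation this check would fail for every probe, which is why the probes have to be built with the build machine's own compiler instead.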

leofang commented 1 year ago

@conda-forge-admin, please rerender

github-actions[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/cupy-feedstock/actions/runs/4702805171.

leofang commented 1 year ago

> It seems the build_and_run function in CuPy's build system does not allow cross compilation. Not sure if there exists any easy patch. https://github.com/cupy/cupy/blob/d6252bed4a16aa606681ed97a4fafba2d728dfdb/install/cupy_builder/install_build.py#L717-L748

@conda-forge/cupy I think everything works now. I've turned on artifact uploading and will ask the RAPIDS folks for help with testing next week. While waiting for CI to finish, let me explain my findings.

We need to patch build_and_run() as explained above. It ultimately needs to run in the build environment (in this case, linux-64), not the target environment (aarch64 or ppc64le). So we need to use the right (native) compiler and not let CF's cross-python kick in and modify the compiler/linker flags; this was the first place where I struggled the most. I ended up creating a fresh ccompiler object and using it only inside preconfigure_modules().
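A minimal model of the "fresh compiler" idea. It assumes the cross toolchain is advertised through CC/CXX/CFLAGS/LDFLAGS and the native one through CC_FOR_BUILD/CXX_FOR_BUILD; the latter is an assumption, so adjust it to your environment. CompilerConfig is a hypothetical stand-in for a distutils ccompiler object, and none of this is CuPy's actual patch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerConfig:
    """Hypothetical stand-in for a distutils-style ccompiler object."""

    cc: str
    cxx: str
    cflags: tuple[str, ...] = ()
    ldflags: tuple[str, ...] = ()


def cross_compiler(env: dict[str, str]) -> CompilerConfig:
    """Honor the environment, i.e. whatever cross-python/activation exported."""
    return CompilerConfig(
        cc=env.get("CC", "cc"),
        cxx=env.get("CXX", "c++"),
        cflags=tuple(env.get("CFLAGS", "").split()),
        ldflags=tuple(env.get("LDFLAGS", "").split()),
    )


def fresh_native_compiler(env: dict[str, str]) -> CompilerConfig:
    """Build a compiler for the build machine, ignoring target-specific flags.

    Assumption: the native toolchain is exposed as CC_FOR_BUILD/CXX_FOR_BUILD;
    otherwise fall back to cc/c++ on PATH. CFLAGS/LDFLAGS are deliberately
    not read, since they are tuned for the target.
    """
    return CompilerConfig(
        cc=env.get("CC_FOR_BUILD", "cc"),
        cxx=env.get("CXX_FOR_BUILD", "c++"),
    )


# Illustrative values only.
env = {
    "CC": "aarch64-conda-linux-gnu-cc",
    "CXX": "aarch64-conda-linux-gnu-c++",
    "CFLAGS": "-O2 -fPIC",
    "CC_FOR_BUILD": "x86_64-conda-linux-gnu-cc",
    "CXX_FOR_BUILD": "x86_64-conda-linux-gnu-c++",
}
print(cross_compiler(env).cc, "->", fresh_native_compiler(env).cc)
```

The design point is that the native object is constructed from scratch rather than derived from the cross one, so no target flags can leak into it.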

We also need to patch build_shlib(), but only because both functions are called inside preconfigure_modules() and see the same ccompiler object. We don't have to use the same ccompiler, but I prefer to keep the patch simple. This was the other place where I struggled with obscure error messages.
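Why patching one probe drags in the other can be sketched with hypothetical, heavily simplified stand-ins for build_and_run()/build_shlib() that only report which compiler they were handed. Because this toy preconfigure_modules() passes one compiler object to every probe, handing it the native compiler switches both probes at once, while the extension build keeps the cross compiler.

```python
from __future__ import annotations

from typing import Callable

Probe = Callable[[str], str]


def stub_build_and_run(compiler: str) -> str:
    """Hypothetical stand-in: would compile and execute a probe program."""
    return f"build_and_run used {compiler}"


def stub_build_shlib(compiler: str) -> str:
    """Hypothetical stand-in: would build a shared library for a probe."""
    return f"build_shlib used {compiler}"


def preconfigure_modules(compiler: str, probes: list[Probe]) -> list[str]:
    # Every probe receives the SAME compiler object, so changing it for one
    # probe changes it for all of them.
    return [probe(compiler) for probe in probes]


def build_extensions(compiler: str, modules: list[str]) -> list[str]:
    # Extension modules are still built with the cross compiler.
    return [f"{module} built with {compiler}" for module in modules]


native, cross = "x86_64-conda-linux-gnu-cc", "aarch64-conda-linux-gnu-cc"
for line in preconfigure_modules(native, [stub_build_and_run, stub_build_shlib]):
    print(line)
for line in build_extensions(cross, ["example_ext"]):
    print(line)
```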

We still need to use the CUDA 11.2 Docker image to provide headers/stub libraries for the "build" (linux-64), or native, compiler. There's no need to change $CUDA_PATH (which points to /usr/local/cuda in the container). By doing so, the build and host environments see a matching CUDA environment.
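The directories a native probe would need from the container can be derived from $CUDA_PATH alone. This sketch assumes the standard Linux CUDA toolkit layout (include/, lib64/, and lib64/stubs/ for the libcuda.so stub); cuda_build_dirs is a hypothetical helper, not part of CuPy's build system.

```python
from __future__ import annotations

import os
import posixpath


def cuda_build_dirs(env: dict[str, str] | None = None) -> dict[str, str]:
    """Hypothetical helper: CUDA dirs a native probe would compile/link against.

    Assumes the standard Linux toolkit layout under $CUDA_PATH, which is
    /usr/local/cuda in the CI container.
    """
    env = dict(os.environ) if env is None else env
    root = env.get("CUDA_PATH", "/usr/local/cuda")
    return {
        "include": posixpath.join(root, "include"),
        "lib": posixpath.join(root, "lib64"),
        # Stub libcuda.so: lets probes link without a GPU driver installed.
        "stubs": posixpath.join(root, "lib64", "stubs"),
    }


print(cuda_build_dirs({"CUDA_PATH": "/usr/local/cuda"}))
```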

Finally, when the Python extension modules are built, the "host" (aarch64/ppc64le), or cross, compiler kicks in, and cross-python simply does the right thing for us (I hope!)🤞
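One cheap sanity check that cross-python did its job is to look at the suffixes of the built extension modules, since CPython on Linux tags them with a GNU multiarch triplet. This is a hypothetical helper covering only the three conda subdirs below, and the file names in the example are illustrative. A filename is only a hint; the ELF header is what actually determines the architecture.

```python
from __future__ import annotations

# GNU multiarch triplets that CPython uses in EXT_SUFFIX on Linux,
# keyed by conda subdir.
TRIPLETS = {
    "linux-64": "x86_64-linux-gnu",
    "linux-aarch64": "aarch64-linux-gnu",
    "linux-ppc64le": "powerpc64le-linux-gnu",
}


def expected_ext_suffix(target_platform: str, py_nodot: str) -> str:
    """e.g. ('linux-aarch64', '310') -> '.cpython-310-aarch64-linux-gnu.so'."""
    return f".cpython-{py_nodot}-{TRIPLETS[target_platform]}.so"


def mismatched_extensions(
    paths: list[str], target_platform: str, py_nodot: str
) -> list[str]:
    """Return extension-module paths whose suffix does not match the target."""
    suffix = expected_ext_suffix(target_platform, py_nodot)
    return [p for p in paths if ".cpython-" in p and not p.endswith(suffix)]


built = [
    "pkg/good.cpython-310-aarch64-linux-gnu.so",
    "pkg/bad.cpython-310-x86_64-linux-gnu.so",
    "pkg/libhelper.so",  # not an extension module, so it is ignored
]
print(mismatched_extensions(built, "linux-aarch64", "310"))
```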

The caveat of this PR is that CF is currently set up to support cross-compiling only with CUDA 11.2 (and CUDA 12 once that work is done), so CUDA 11.0/11.1 users need to update. (Though v12.0.0 build number 0 already supports CUDA 11.0/11.1.) But given the relatively low download counts for CUDA 11.0/11.1 + aarch64 (and we already disabled them for ppc64le), and that RAPIDS is on the latest CUDA 11.x, I don't think this is a concern.

Since building CuPy for aarch64/ppc64le under QEMU emulation is awfully time-consuming, and the GPU CI runner only offers the x86-64 platform, I strongly prefer to get this PR merged once we confirm it works. It also saves free CI resources, which benefits the whole community.

leofang commented 1 year ago

@conda-forge-admin, please rerender

leofang commented 1 year ago

> I've turned on artifact uploading, will ask RAPIDS folks for help testing next week.

I forgot that I have access to aarch64 nodes. I just tested locally and everything looks fine. I'll merge this by EOB Monday to allow time for feedback.

leofang commented 1 year ago

@conda-forge-admin, please rerender

leofang commented 1 year ago

Changed the target branch to cuda-11, since I'll be preparing the CUDA 12 work (#199) and this PR is independent of it. Set automerge.

github-actions[bot] commented 1 year ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/cupy-feedstock/actions/runs/4728001858.

github-actions[bot] commented 1 year ago

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

Thus the PR was passing and merged! Have a great day!