conda-forge / conda-smithy

The tool for managing conda-forge feedstocks.
https://conda-forge.org/
BSD 3-Clause "New" or "Revised" License

BUG: conda-smithy v3.31 mismatches cuda_compiler & image, breaks zip_keys #1847

Closed by h-vetinari 6 months ago

h-vetinari commented 6 months ago

I'm getting the following rerender in https://github.com/conda-forge/arrow-cpp-feedstock/pull/1325, which mismatches the cuda_compiler with the image, and then predictably fails.

--- a/.azure-pipelines/azure-pipelines-linux.yml
+++ b/.azure-pipelines/azure-pipelines-linux.yml
@@ -11,7 +11,7 @@ jobs:
       linux_64_cuda_compiler_version11.2:                          # CUDA
         CONFIG: linux_64_cuda_compiler_version11.2
         UPLOAD_PACKAGES: 'True'
-        DOCKER_IMAGE: quay.io/condaforge/linux-anvil-cuda:11.2     # CUDA
+        DOCKER_IMAGE: quay.io/condaforge/linux-anvil-cos7-x86_64   # !! no CUDA !!
       linux_64_cuda_compiler_versionNone:
         CONFIG: linux_64_cuda_compiler_versionNone
         UPLOAD_PACKAGES: 'True'
--- a/.ci_support/linux_64_cuda_compiler_version11.2.yaml
+++ b/.ci_support/linux_64_cuda_compiler_version11.2.yaml           # CUDA
@@ -7,15 +7,15 @@ bzip2:
 cuda_compiler:
-- nvcc                                                             # CUDA
+- None                                                             # !! no CUDA !!
 cuda_compiler_version:
 - '11.2'                                                           # CUDA
 cuda_compiler_version_min:
@@ -23,23 +23,23 @@ cuda_compiler_version_min:
 docker_image:
-- quay.io/condaforge/linux-anvil-cuda:11.2
+- quay.io/condaforge/linux-anvil-cos7-x86_64
 gflags:
--- a/.ci_support/linux_64_cuda_compiler_versionNone.yaml
+++ b/.ci_support/linux_64_cuda_compiler_versionNone.yaml           # no CUDA
@@ -9,13 +9,13 @@ c_compiler:
 cuda_compiler:
-- None                                                             # no CUDA
+- nvcc                                                             # !! CUDA !!
 cuda_compiler_version:
 - None                                                             # no CUDA
 cuda_compiler_version_min:

In fact, for 4/4 builds on the arrow feedstock, the rerender randomly mismatches at least two jobs, seemingly by messing up the zip_keys. Not sure what's happening here, or whether #1815 introduced a bug or uncovered another problem.
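
For context, a minimal sketch of how zip_keys tie these variant values together (illustrative values only, not the actual conda-forge pinning): keys in the same zip_keys group must have equal-length lists, and entries at the same index are selected together to form one job, so cuda_compiler, cuda_compiler_version and docker_image are never meant to be mixed across indices.

# conda_build_config.yaml (sketch)
cuda_compiler:                                  # index 0 = no CUDA, index 1 = CUDA 11.2
- None
- nvcc
cuda_compiler_version:
- None
- '11.2'
docker_image:
- quay.io/condaforge/linux-anvil-cos7-x86_64
- quay.io/condaforge/linux-anvil-cuda:11.2
zip_keys:
-                                               # these three keys vary in lockstep, not as a cross-product
  - cuda_compiler
  - cuda_compiler_version
  - docker_image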

@mbargull @beckermr CC @conda-forge/core

beckermr commented 6 months ago

I'm away from my keyboard. Should we mark it broken? If so, can you submit a PR?

h-vetinari commented 6 months ago

It depends a bit on what's happening elsewhere and how widespread the breakage is. I'm leaning towards yes; however, I'm not 100% sure that this wouldn't completely break rerendering, because smithy does not rely on the resolver to determine the newest available conda-smithy version, but queries the data directly somehow.

I remember hitting this a while ago (#1491), where rerenders wouldn't work anymore, because smithy would see the new build (even if broken) and fail the version check unless --no-check-uptodate is specified. I'm not sure if the conda-forge-admin bot uses that argument, but if it doesn't, marking 3.31 as broken might stop rerenders from happening everywhere?

h-vetinari commented 6 months ago

Just tried jaxlib, where a local rerender also messed up python-vs-numpy versions. However, after deleting lines with cuda_compiler_version in the meta.yaml and rerendering, the result looked alright (aside from removing all the CUDA builds, obviously).

So at least for CUDA feedstocks, 3.31.0 seems to be pretty badly broken.

jakirkham commented 6 months ago

> I remember hitting this a while ago (#1491), where rerenders wouldn't work anymore, because smithy would see the new build (even if broken) and fail the version check unless --no-check-uptodate is specified. I'm not sure if the conda-forge-admin bot uses that argument, but if it doesn't, marking 3.31 as broken might stop rerenders from happening everywhere?

The re-rendering bot disables the up-to-date check

There are probably other places that need to disable update checks

That said, we may want to rethink how that check works for this reason. Opened issue: https://github.com/conda-forge/conda-smithy/issues/1848

jakirkham commented 6 months ago

In any event, we should feel comfortable marking it broken if it is not behaving and worry about fixing fallout afterwards

h-vetinari commented 6 months ago

Thank you! I opened https://github.com/conda-forge/admin-requests/pull/947

h-vetinari commented 6 months ago

As another corner case that's broken in 3.31 (but probably too marginal for a separate issue?): feedstocks that explicitly set conda_build_tool: mambabuild in conda-forge.yml (for whatever reason; example: ray) cannot even pass the CI setup, because updating the conda installation runs into https://github.com/conda-forge/boa-feedstock/issues/79.
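
For reference, a minimal excerpt of the kind of conda-forge.yml setting in question (hypothetical feedstock; only the relevant key is shown):

# conda-forge.yml at the feedstock root -- opts into building with boa/mambabuild
# instead of the default conda-build invocation
conda_build_tool: mambabuild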

mbargull commented 6 months ago

The rendering times for arrow-cpp-feedstock with its multiple outputs are... something. Rendering https://github.com/conda-forge/arrow-cpp-feedstock/pull/1323/commits/fd8da32e98de34ba3225a2d7e3a064ababe64f9e took more than 10 minutes for me locally.

I've distilled this down to

package:
  name: test_conda_smithy_v3.31.0
  version: 1.0

build:
  skip: true  # [not (linux and x86_64) or cuda_compiler_version not in ("None", cuda_compiler_version_min)]

requirements:
  build:
    - {{ compiler("c") }}
    - {{ compiler("cuda") }}  # [cuda_compiler_version != "None"]

outputs:
  - name: test_conda_smithy_v3.31.0-output
    requirements:
      build:
        - {{ compiler("c") }}
        - {{ compiler("cuda") }}  # [cuda_compiler_version != "None"]

about: {}

which (non-deterministically) fails after https://github.com/conda-forge/conda-smithy/pull/1815/commits/85482fdd1ccf0c1c8a087f926d7c37ee4d1cae63.

I'll take a look later today -- if I can't figure this out right away, then we can just revert that commit for now. But we should add the above test case in any case, of course.


Thanks for reporting this quickly and for handling the mark-as-broken stuff!

h-vetinari commented 6 months ago

Awesome that you reduced this, thanks for tackling this so quickly!

> The rendering times for arrow-cpp-feedstock with its multiple outputs are... something. Rendering https://github.com/conda-forge/arrow-cpp-feedstock/commit/fd8da32e98de34ba3225a2d7e3a064ababe64f9e took more than 10 minutes for me locally.

Yeah... Local rerenders take up to 25min for me now - not fun. 😑

beckermr commented 6 months ago

PYTHONHASHSEED=4 will cause the error for those trying this out.
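
For anyone wanting to reproduce this locally, the pieces mentioned in this thread combine into something along these lines, run from a checkout of an affected CUDA-enabled feedstock (--no-check-uptodate just skips the version check discussed above):

PYTHONHASHSEED=4 conda smithy rerender --no-check-uptodate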

beckermr commented 6 months ago

PR #1851 appears to fix things, but it still needs specific tests and possibly some cleanup.

mbargull commented 6 months ago

> PYTHONHASHSEED=4 will cause the error for those trying this out.

Must be something magical with that seed; IIRC I also used that one for gh-1815 :laughing:.

> PR #1851 appears to fix things, but it still needs specific tests and possibly some cleanup.

Thanks very much for tackling this!

mbargull commented 6 months ago

FWIW, I tried your changes from gh-1851 on https://github.com/conda-forge/python-feedstock/pull/656/commits/73873d8de6b643f04ab08356d2da703dbcea552e (which failed with 3.30.4 but works with 3.31.0; i.e., the reason for gh-1815) and it still works fine for that case, too :+1:.

mbargull commented 6 months ago

> The rendering times for arrow-cpp-feedstock with its multiple outputs are... something. Rendering conda-forge/arrow-cpp-feedstock@fd8da32 took more than 10 minutes for me locally.

> Yeah... Local rerenders take up to 25min for me now - not fun. 😑

I've profiled this coarsely and quickly worked on some changes at the end of last month, but only got around to putting in https://github.com/conda/conda-build/issues/5224 today. (The conda-build=24.3 release process is starting now, so those changes would only become available in >=24.5.) With those changes it is still very far from how quick it should be, but it should at least avoid >10-minute re-rendering times. Re-rendering arrow-cpp-feedstock went down to about 3 minutes and ctng-compilers-feedstock to about 4 minutes for me -- still very bad, IMO, but I had no idea how atrociously slow these things currently are. (Please do file bug reports for such things!)

beckermr commented 6 months ago

The compilers feedstock runs for ~30 minutes on GHA, so a factor of 4 would be a big improvement no matter what. Thank you!