conda / conda-build

Commands and tools for building conda packages
https://docs.conda.io/projects/conda-build/

Understanding build pins and de-duplication in Conda Build 3.0+ #3033

Closed: Lnaden closed this issue 6 years ago

Lnaden commented 6 years ago

Short Version: The full combinatorial set of builds is queued for un-pinned, non-python packages, despite the documentation saying that un-pinned, non-python packages will be de-duplicated.

I manage the build platform for several packages, and we are in the process of converting our build scripts and packages from Conda Build 2 to 3. I am trying to understand the build variant system from the docs on the main conda docs page and to use the new API (conda_build.api) in my build environment. As I understand the general pinning example (item 2), a recipe that does not pin non-python packages should not produce multiple variants of that package.

e.g., for the sample meta.yaml below, in a directory named "simple":

package:
  name: simple
  version: 1.0.0

requirements:
  build:
    - python
    - numpy

  run:
    - python
    - numpy

and using the following conda-build API:

import conda_build.api

# three Python versions crossed with three NumPy versions
variants = {'python': ['2.7', '3.5', '3.6'], 'numpy': ['1.10', '1.11', '1.12']}
# render() returns one entry per build variant that would be produced
metas = conda_build.api.render("simple", variants=variants)
len(metas)

Actual Behavior

The length of metas is 9, so 9 packages will be queued to build.

Expected Behavior

I should get 3 metas, which would give 3 builds, because the Python versions are implicitly pinned and the NumPy versions should not be, based on the docs linked above.

In practice, however, I get 9 total builds, one for each combinatorial Python/NumPy pair. The same is true if I change the build block to a host block instead. I also get 9 builds if I use the conda-build command line interface as

conda build -m variants.yaml simple

where variants.yaml is the YAML form of the variants dictionary above.
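For concreteness, that file is just the dictionary translated to YAML, i.e. something like:

python:
  - "2.7"
  - "3.5"
  - "3.6"
numpy:
  - "1.10"
  - "1.11"
  - "1.12"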

To quote the doc page from above:

This example demonstrates a particular feature: reduction of builds when pins are unnecessary. Since the example recipe above only requires the Python API to numpy, we will only build the package once and the version of numpy will not be pinned at runtime to match the compile-time version.

That does not seem to hold for this example. Can anyone provide insight into why the API and command line interface do not follow the docs, and more importantly, how to actually prevent duplicate builds?

Additional info:

Steps to Reproduce

  1. Create a folder called simple
  2. Copy YAML from above into meta.yaml
  3. Run Python call from above
Output of conda info
$ conda info

     active environment : None
       user config file : /Users/nadenl/.condarc
 populated config files : /Users/nadenl/.condarc
          conda version : 4.5.8
    conda-build version : 3.10.9
         python version : 3.6.4.final.0
       base environment : /Users/nadenl/miniconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/omnia/osx-64
                          https://conda.anaconda.org/omnia/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/osx-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/pro/osx-64
                          https://repo.anaconda.com/pkgs/pro/noarch
          package cache : /Users/nadenl/miniconda3/pkgs
                          /Users/nadenl/.conda/pkgs
       envs directories : /Users/nadenl/miniconda3/envs
                          /Users/nadenl/.conda/envs
               platform : osx-64
             user-agent : conda/4.5.8 requests/2.18.4 CPython/3.6.4 Darwin/15.6.0 OSX/10.11.6
                UID:GID : 981070069:9696963
             netrc file : None
           offline mode : False
msarahan commented 6 years ago

I think the docs are out of date here. I tried really hard to get that behavior to work, but in the end, special-casing numpy was not tenable. The docs need updating.

A better pattern is to have only one value for numpy in conda_build_config.yaml - the oldest you can manage. We use 1.9 or 1.11, for example. NumPy is forward-compatible, so by building against an old version, a package stays compatible with that version and everything up through the current one. See https://github.com/AnacondaRecipes/scikit-learn-feedstock/blob/master/recipe/meta.yaml#L36
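Concretely, that is a single-valued entry, something like:

# conda_build_config.yaml: one numpy value, so no loop over numpy variants
numpy:
  - 1.11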

The forward compatibility is captured in this expression: https://github.com/AnacondaRecipes/scikit-learn-feedstock/blob/master/recipe/meta.yaml#L46
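That line uses conda-build's pin_compatible Jinja2 helper; the pattern looks roughly like this (the exact range depends on the version you build against and on the helper's defaults):

requirements:
  host:
    - numpy
  run:
    # built against numpy 1.11, this expands to something like "numpy >=1.11,<2.0a0"
    - {{ pin_compatible('numpy') }}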

Conda-build currently computes hashes based on whether it thinks a variable is "used" in a recipe. That's either from explicit usage in templates or build scripts, or implicit usage by matching a variant key name with a dependency in the host or build section. The latter is why you are getting loops over numpy.
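For example, in a conda_build_config.yaml (unused_flag is a made-up key for illustration):

numpy:          # matches a host/build dependency, so it is "used" and loops
  - 1.10
  - 1.11
unused_flag:    # referenced nowhere in the recipe, so it does not enter the hash,
  - a           # and builds differing only in its value are de-duplicated
  - b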

You could also hard-code a version constraint for numpy in the host deps, like we do at https://github.com/AnacondaRecipes/scikit-image-feedstock/blob/master/recipe/meta.yaml#L29

The main reason for hard-coding is when a recipe requires a newer numpy than the other recipes in your collection: scikit-image required 1.11, but our conda_build_config.yaml set numpy to 1.9, so the hard-coded constraint overrode the variant and you would not get a loop.
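In the host section, that looks roughly like:

requirements:
  host:
    # the explicit constraint overrides the numpy variant value, so no loop
    - numpy >=1.11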

Lnaden commented 6 years ago

Thanks for the explanation!

It was not really about numpy specifically, but about the de-duplication in general; I just needed python plus some other package, and numpy happened to be the one I picked.

A better pattern is to have only one value for numpy in conda_build_config.yaml - the oldest you can manage.

Trying to understand the pinning system a bit better: I want to ensure that the packages I help deploy work on at least numpy 1.9 (for example), but I don't want to restrict them to, say, >=1.9,<1.10, because they won't be updated that often and I don't want them locked to that version of NumPy. Would it then make sense to make a global variant of

{'numpy': '1.9', 'pin_run_as_build': {'numpy': {'min_pin': 'x.x'}}}
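(in conda_build_config.yaml form, something like:)

numpy:
  - 1.9
pin_run_as_build:
  numpy:
    min_pin: x.x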

A follow-up to that: I am also trying to convert all the recipes from the old build: and run: blocks to build:, host:, and run: in the requirements section, as sketched below. I'm pretty sure most of the packages could simply move their build section to the host section, but then I don't think pin_run_as_build will map correctly. Is there a pin_run_as_host, or am I misinterpreting the behavior?
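A rough sketch of the conversion I mean (the compiler('c') entry is just an illustration):

requirements:
  build:
    - {{ compiler('c') }}    # build-time tools only
  host:
    - python                 # things compiled/linked against move here
    - numpy
  run:
    - python
    - numpy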

Thanks in advance!

msarahan commented 6 years ago

take a look again at https://github.com/AnacondaRecipes/scikit-learn-feedstock/blob/master/recipe/meta.yaml#L46

docs at https://conda.io/docs/user-guide/tasks/build-packages/variants.html#pinning-at-the-recipe-level

You could use pin_run_as_build for this purpose, too, but that's a little different: https://conda.io/docs/user-guide/tasks/build-packages/variants.html#pinning-at-the-variant-level

Lnaden commented 6 years ago

Okay, I will look at this some more and try it locally.

Thanks for the help.
