conda-forge / unidist-packages-feedstock

A conda-smithy repository for unidist-packages.
BSD 3-Clause "New" or "Revised" License

Use conda_build_config.yaml to fix builds #11

Closed · YarShev closed this 9 months ago

YarShev commented 9 months ago

Checklist

conda-forge-webservices[bot] commented 9 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

YarShev commented 9 months ago

@conda-forge-admin, please rerender

dalcinl commented 9 months ago

There is an issue that I don't quite know how to address. We want two variants (mpich and openmpi) of the unidist-mpi package, but only one variant of all the others, as they are unrelated to MPI, and you want to get all that out of a single recipe with various outputs. I'm not quite sure how to use conda_build_config.yaml in this case. If we set mpi = [mpich, openmpi], then I believe conda build will try to duplicate the build for all the other non-mpi packages.
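For reference, a minimal sketch of the variant list being discussed; the key name and values simply mirror the mpi = [mpich, openmpi] setting mentioned above:

# conda_build_config.yaml (sketch)
# defining an mpi variant key here makes conda-build render the recipe once
# per entry, which is what would duplicate the builds of the non-MPI outputs
mpi:
  - mpich
  - openmpi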

dalcinl commented 9 months ago

Maybe you have to do it as per the following sketch?

{% set build = 2 %}

outputs:

  - name: unidist-mpi
    build:
      noarch: python
      number: {{ build * 100 }}  # so that this variant has higher priority
    requirements:
      run:
        - mpich  # [not win]
        - msmpi  # [win]

  - name: unidist-mpi  # note repeated name, but different requirements
    build:
      noarch: python
      number: {{ build }}
    requirements:
      run:
        - openmpi

Definitely not nice, as there is a lot of duplicated code. I'm not even sure if that will work. If not, you should ask in the conda-forge gitter chat about how to make a single output build with two or more different variants. PS:

YarShev commented 9 months ago

@dalcinl, yep, this looks a little bit clumsy. Even if there is a way of making a single output build with two or more different variants, I don't think it would be a graceful solution. I looked at some other MPI-dependent packages and they all depend on mpi4py only. I want to keep unidist's recipe clean and transparent, so I think it makes sense to have unidist-mpi depend on mpi4py only and rely on the default MPI implementation that comes with mpi4py. If the user wants to install a specific MPI implementation, they will have to specify the exact implementation, e.g., conda install -c conda-forge unidist-mpi openmpi.

The main reason I decided to change the build configuration for unidist-mpi is that when we installed unidist-mpi in our CI a few days ago, mpich came as the default MPI implementation for python 3.9 and openmpi came for python 3.10 and python 3.11 (Why? Doesn't mpi4py have a concrete default implementation?). That broke the CI because we relied on the parameters of mpich's mpiexec command at that time. So I tend to revert my changes and make unidist-mpi depend on mpi4py only. Anyway, thanks a lot for helping me on this matter.

I want to introduce something like the following to the conda-forge documentation (please let me know what you think of it).

**Building packages depending on MPI libraries**

When building a conda-forge package that depends on an MPI library, you should be careful with its packaging.
As a rule, already existing packages only rely on the mpi4py package, which comes with a default MPI implementation.
However, if you want your package to depend on a specific MPI library,
you should build the package properly in order not to jeopardize the option of using a system-provided MPI implementation.
It may happen that an MPI library coming from conda-forge overrides the system-provided MPI implementation.
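A minimal sketch of what this recommendation could look like in a recipe's meta.yaml; the section layout is standard conda-build, but the dependency list is purely illustrative:

# meta.yaml (sketch) -- depend on mpi4py only and let the user/solver pick
# the concrete MPI implementation at install time
requirements:
  run:
    - python
    - mpi4py
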
dalcinl commented 9 months ago

mpich came as the default MPI implementation for python 3.9 and openmpi came for python 3.10 and python 3.11 (Why? Doesn't mpi4py have a concrete default implementation?).

No, mpi4py does not have a concrete default implementation, simply because I don't know of any way to set such a thing up within the conda model. Years ago, when I crafted the initial recipes to provide mpich/openmpi/mpi4py for conda, mpich was usually the implementation picked first, probably because its name sorts first. But the behavior you mention is probably related to some change in the dependency solver.

@ocefpaf @isuruf @minrk Sorry for pinging you guys yet again, but I really need some input. Do you guys have any idea how MPI depsolving works? Is there any way I could write down in the mpi4py recipe that I prefer to use one MPI implementation as default if the user does not explicitly request the other? It is a bit weird that an environment would get different MPI implementations installed depending on the Python version you request at the environment creation step. What's going on here?

dalcinl commented 9 months ago

@YarShev about these statements:

As a rule, already existing packages only rely on the mpi4py package, which comes with a default MPI implementation.

Such a statement is not accurate. There is no default MPI implementation that comes with mpi4py. The mpi4py package is built with two variants: mpich and openmpi. If you execute a plain conda install mpi4py, which concrete MPI is picked is probably up to the dependency solver. Of course you can be explicit (add either mpich or openmpi to your package list) and the ambiguity is gone. Nonetheless, the fact that you get different MPIs depending on the version of an unrelated package like Python is quite disturbing. Let's see if we can somehow fix this.

At this point I'm not sure what the documentation should say to not make things even more confusing! But in your statements, you should a) not say that mpi4py comes with a default MPI, and b) say instead that users should explicitly ask for mpich or openmpi.

YarShev commented 9 months ago

At this point I'm not sure what the documentation should say to not make things even more confusing! But in your statements, you should a) not say that mpi4py comes with a default MPI, and b) say instead that users should explicitly ask for mpich or openmpi.

Yes, let's see if we can somehow fix this.

dalcinl commented 9 months ago

I'm a bit confused. You will somehow revert this PR, right? I understood you will not depend explicitly on mpich, and rather let users decide.

YarShev commented 9 months ago

Yes, I will revert the changes and allow users to decide on a concrete MPI implementation. It would be great if you could add a change to the mpi4py feedstock to have a default MPI implementation. Otherwise, I will have to add some changes to our CI to use the specific parameters of mpiexec depending on the concrete MPI implementation.

dalcinl commented 9 months ago

Yes, I will revert the changes and allow users to decide on a concrete MPI implementation.

OK.

It would be great if you could add a change to the mpi4py feedstock to have a default MPI implementation.

Remember that I don't know if that is even possible.

Otherwise, I will have to add some changes to our CI to use the specific parameters of mpiexec depending on the concrete MPI implementation.

If you want to fix the MPI implementation to MPICH, just add mpich to the list of packages, either in an environment.yml file if you are using that, or in the conda/mamba/micromamba create -n <envname> mpi4py mpich ... command line.
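For example, a minimal environment.yml along those lines (the environment name and the exact package list are illustrative):

# environment.yml (sketch) -- pinning the MPI provider to MPICH explicitly
name: unidist-ci        # hypothetical environment name
channels:
  - conda-forge
dependencies:
  - python
  - mpi4py
  - mpich               # pins the MPI implementation; swap in openmpi if preferred
  - unidist-mpi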

YarShev commented 9 months ago

Remember that I don't know if that is even possible.

I understand, let's see.

If you want to fix the MPI implementation to MPICH, just add mpich to the list of packages, either in an environment.yml file if you are using that, or in the conda/mamba/micromamba create -n <envname> mpi4py mpich ... command line.

Yes, this sounds reasonable. However, we test unidist-mpi on linux and windows so we have to have two env files: one for linux and the other for windows. If we changed mpiexec's parameters depending on the MPI implementation, I think that would allow us to avoid duplicating the env files.

minrk commented 9 months ago

There are a few ways to express a preference for an implementation from within a package. The main one used is the build number offset, where the solver picks the higher build if it's satisfiable, e.g.

{% if mpi == "mpich" %}
{%   set build = build + 100 %}
{% endif %}

while the lower build number will be picked if that was excluded somehow, e.g. by explicitly requesting openmpi.
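For concreteness, a rough sketch of where that offset would be consumed in a recipe, assuming an mpi variant key coming from conda_build_config.yaml and an illustrative base build number:

{% set build = 0 %}                 {# hypothetical base build number #}

{% if mpi == "mpich" %}
{%   set build = build + 100 %}
{% endif %}

build:
  number: {{ build }}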

There's another implementation involving 'features' that is quite hairy, and doesn't seem to do what folks want a lot of the time.

Note that expressing a preference in one package doesn't guarantee that the preference will be satisfied, because another preference could override it (e.g. if picking openmpi allows installing a later version of another dependency because of some version pinning some levels down). When there are two candidates, the only way to reliably get one candidate is to ask for it explicitly. Any other mechanism only makes it more likely, but cannot guarantee it.

From a package perspective, I'd recommend avoiding expressing a preference in this package, if possible, unless there's a strong reason to from the perspective of the package itself, and instead selecting the implementation at install time, since that's where the choice is really relevant. But the build number preference is the typical way to go if you really want to do that.

I don't think mpi4py should have a 'default' mpi, but conda-forge could, by expressing a build number bias in the mpi metapackage.

we test unidist-mpi on linux and windows so we have to have two env files

I think having two env files is quite reasonable for two platforms, but the absence of environment markers is frustrating. You can apply platform filters if you use conda-lock, which takes a single for-humans environment specification and produces lockfiles for one or more platforms, a la npm/pipenv/poetry/etc.
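As a sketch of that workflow (package names are illustrative, and the platform selectors assume conda-lock's support for conda-build-style selectors in the source environment file):

# environment.yml (sketch) -- a single human-edited spec
channels:
  - conda-forge
dependencies:
  - python
  - mpi4py
  - mpich  # [unix]
  - msmpi  # [win]

# then generate per-platform lockfiles, e.g.:
#   conda-lock -f environment.yml -p linux-64 -p win-64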

YarShev commented 9 months ago

@minrk, thank you for the input. Looks like the way to go for unidist-mpi and mpi4py is not to have a default preference for a concrete MPI implementation and to ask users to specify a concrete implementation at install time if they do not want to rely on whatever the solver picks.

I think having two env files is quite reasonable for two platforms, but the absence of environment markers is frustrating. You can apply platform filters if you use conda-lock, which takes a single for-humans environment specification and produces lockfiles for one or more platforms, a la npm/pipenv/poetry/etc.

It would be great to have the ability to use selectors in env.yaml files, but this does not seem to be supported yet - https://github.com/conda/conda/issues/8089. So we will probably proceed with two env files, one for each platform.

dalcinl commented 9 months ago

I don't think mpi4py should have a 'default' mpi, but conda-forge could, by expressing a build number bias in the mpi metapackage.

OK, understood. This is a decision conda-forge should take collectively, beyond any individual preference. I do have a few reasons in favor of picking MPICH as the default, but I understand others could object to that and list other valid reasons in favor of Open MPI.

For the time being, we keep doing things as usual, that is, requesting the MPI provider explicitly.

$ python -m this | grep ambiguity
In the face of ambiguity, refuse the temptation to guess.

isuruf commented 9 months ago

Is there any way I could write down in the mpi4py recipe that I prefer to use one MPI implementation as default if the user does not explicitly request the other?

There is. You can add

build:
  track_features:
     - mpi_no_default  # [mpi != 'mpich']

to make implementations other than mpich lower priority (i.e., make mpich higher priority).

dalcinl commented 9 months ago

@isuruf Sorry, I missed your comment. What's your gut feeling about going this way and making mpi4py prefer MPICH (or Open MPI, if the community prefers that) over other implementations?

isuruf commented 9 months ago

If there were a clear winner (performance, open source vs. proprietary, etc.), I'd say yes. However, both are good options and IMO there's no clear winner.