conda-forge / cfep

conda-forge's Enhancement Proposal
BSD 3-Clause "New" or "Revised" License
18 stars 24 forks source link

Policies for packages that use OpenCL #17

Open peastman opened 5 years ago

peastman commented 5 years ago

This issue is a followup to https://github.com/conda-forge/openmm-feedstock/pull/7#. We are trying to move OpenMM to be distributed through conda-forge, and some policy questions came up that need to be resolved. In particular, we have been told conda-forge has an established policy that any package that uses OpenCL is required to list ocl-icd as a dependency. We do not want to list it as a dependecy: given our particular needs, there doesn't exist any situation where it would benefit our users, and there do exist situations where it would cause problems.

Let me give some background information, since it will help to clarify the issues we're dealing with. OpenMM is a widely used package for molecular dynamics simulation. It can use OpenCL, but it doesn't require it. It comes with multiple computational backends: CUDA, OpenCL, and CPU. It automatically selects which one to use at runtime based on the available hardware and software (whether or CUDA and/or OpenCL is installed). The CUDA and OpenCL backends exist mainly for users with high end GPUs, which can be much faster than CPUs. For users who don't have one of those, it's generally better to use the CPU backend.

Accurate results are critical. This is scientific software, and errors could lead to users having to retract papers and losing months or years of work. We only support a small number of OpenCL implementations. On Linux, we support the ones from AMD and NVIDIA. Neither of those is conda installable. They are included with the video drivers. On macOS, we support Apple's implementation, which is built into the OS. These are the only ones we test. Using any other implementation would have a high risk of producing incorrect results. Our installation instructions tell users how to install appropriate OpenCL implementations. We are very concerned about anything that might lead to a user inadvertently using a different implementation they didn't even know was there.

OpenMM has APIs for C++, C, Fortran, and Python. So although we use conda as a distribution mechanism, it is part of a much larger ecosystem than just conda-forge, conda, or even Python. It needs to work correctly with all other libraries on the OS, regardless of how they are installed or what language they are used from.

So that's the background. Now let me describe our particular concerns.

  1. Given our needs, there simply is no benefit to installing ocl-icd. None of the OpenCL implementations we support is conda installable.

  2. Installing it creates new opportunities for the user to get incorrect results by silently using a different OpenCL implementation than the supported one we told them to install.

  3. ocl-icd loads ICD files from a nonstandard location. Whereas all standard ICD loaders on Linux look for them in /etc/OpenCL/vendors, ocl-icd looks for them in a location inside the conda installation. If users want to use any OpenCL that was installed system-wide (such as the ones from the GPU vendors), they have to add a symlink pointing to the standard location. So including ocl-icd makes life harder for users. (A workaround is to also install ocl-icd-system, which creates the symlink for them, but it appears few packages currently do this.)

  4. On macOS there is an additional, very serious problem: ocl-icd is written in a way which is incompatible with Apple's OpenCL framework. If two packages are loaded at the same time, one that links to ocl-icd and the other that links directly to Apple's framework (as nearly all OpenCL-based codes do on macOS), this will result in a crash. Linking to ocl-icd would therefore make OpenMM incompatible with nearly all other OpenCL based software that comes from any source other than conda-forge.

Given these issues, I suggest the existing policy should be reconsidered. Packages should not be required to depend on ocl-icd unless they have a specific reason to depend on it. Even better, I would suggest it should be a dependency for implementations of OpenCL, not for packages that use OpenCL. This is how the ICD mechanism was designed to work in the first place. ICD loaders are normally included with the implementation. They are never included with individual applications that just happen to use OpenCL. Tying the loader to the application rather than the implementation is contrary to the design of the mechanism.

This would make life easier for users of all OpenCL based packages. If they haven't specifically conda installed an OpenCL implementation, ocl-icd would not block the standard implementation that came with their GPU drivers from being found.

I also suggest that any use at all of ocl-icd on macOS should be carefully reconsidered. Until it can be made to coexist properly with the standard framework built into the OS, it just isn't appropriate for any package that is intended to be widely distributed.

cc: @jchodera @jaimergp

isuruf commented 4 years ago

@jaimergp, see the email in https://github.com/loriab

loriab commented 4 years ago

@jaimergp, I already have email addresses for the OpenMM folks.

jaimergp commented 4 years ago

Fine, thanks both of you!

isuruf commented 4 years ago

@jaimergp, for disabling using conda opencl implementations, here's what needs to be done,

  1. Create a variant of ocl-icd for linux that only looks at the system ones. 1a. Add a recipe/conda_build_config.yaml with,

     variant:
        - conda
        - system-only

    1b. Change the build string of the conda package to include the variant name. 1c. Add the following which makes system-only variant not default.

     build:
        track_features: true  # [variant == "system-only"]

    1d. Change https://github.com/conda-forge/ocl-icd-feedstock/blob/master/recipe/build.sh#L3 according to env variable variant 1e. Add an activation script to variant=system-only, so that if an OpenCL implementation in conda env is installed, a warning is printed.

  2. Do a similar thing for khronos-opencl-icd-loader on OSX

  3. Make openmm depend on system-only versions.

jaimergp commented 4 years ago

Thank you Isuru! This really helps!

peastman commented 4 years ago

Dear conda-forge team,

After our chat last January, we have been trying to come up with a proposal that will hopefully work for all of us without (a) forcing a compromise on conda-forge policies and (b) adding maintenance burden on the openmm developers. The proposal consists of the following key points:

  1. We will build all OpenMM packages on our own infrastructure.
  2. We will make sure we use compilers and ABIs consistent with conda-forge by using one or more feedstocks as rendered by conda-smithy.
  3. A subset of the packages will be directly copied over to conda-forge, while others will be distributed only through the openmm channel.

Some packages will include dynamic libraries that link to software from outside conda-forge. That could include libraries built into the OS, software from other packaging systems like apt, and proprietary software that must be obtained directly from the vendor. In all cases, these will be optional libraries that get loaded dynamically at runtime. If the needed software isn't present on a user's computer, that will just mean the corresponding plugin doesn't get loaded, and everything else in OpenMM continues to work normally.

To address the incompatibilities related on OpenCL on MacOS, the subset of packages that will be copied over to conda-forge will be built with two variants:

A) One that links to khronos-opencl-icd-loader. B) One that directly links to the implementation built into the OS.

Build strings will be labeled in a way that variant B has higher precedence, so if a user installs OpenMM and nothing else, it will get variant B. However, variant B will be marked as incompatible with the ICD loaders so if another package in the environment is linking to OpenCL through a ICD, the loader will be requested and the variant A will be pulled in.

jaimergp commented 4 years ago

This message was sent here by mistake! You can ignore this. Apologies!

peastman commented 4 years ago

Did you mean to post that on a different issue?

jaimergp commented 4 years ago

Wow, yes! Sorry!

jaimergp commented 4 years ago

Hi @conda-forge/core, do you have any comments regarding our last proposal a couple of messages above? Thank you!

beckermr commented 4 years ago

Allowing outside feedstocks to push packages to the conda-forge channels is going to get trickier due to CFEP-13.

isuruf commented 4 years ago

I still don't get why you can't move the build scripts to conda-forge.

scopatz commented 4 years ago

FWIW - the variants above should probably be implemented as mutexes, like with blas, rather than features as mentioned above

isuruf commented 4 years ago

What? blas uses the same mechanism.

scopatz commented 4 years ago

Yeah, you are right, sorry

scopatz commented 4 years ago

I thought we were moving away from features, though, and moving to mutexes more like what is done with MPI. Are features really needed here?

isuruf commented 4 years ago

You are talking about features and not track_features which are two related, but different things.

Yes, track_features is needed here to prioritize which is used by default.

scopatz commented 4 years ago

Ahh Ok thanks for the clarification @isuruf