lohedges opened this issue 3 years ago
Can you link to the PR at `staged-recipes`? I can't find it. If it's not there, there's a chance there's a missing pin that might appear when you run through the CF CI pipeline!
We have two packages that link to `openmm`, in case you need an example:
Also, `py37h18a0e3e_6` is compiled for CUDA 9.2, while `py37h01de88b_6` is compiled for CUDA 11. CUDA 11 requires CentOS 7, but CUDA <= 10.2 is still built with CentOS 6, so that's possibly your ABI issue!
Btw, if you want, you can tag me in the staged-recipes PR and I will add some comments so it works as expected!
Thanks for the quick reply. I haven't created a pull request yet. I was just testing the build locally in order to make sure everything worked before doing so. (Our build is quite large and slow.)
Yes, it does look like an ABI issue, although I'm confused that I have a system `glibc` newer than the one specified in the `run` requirements section of the `meta.yaml` for the package that fails, i.e. `>=2.17`.
I'll create a PR for my staged-recipes fork. Thanks for the offer to tag you in. It would be great to get your help ironing out any build issues.
Bringing in `cudatoolkit==11` will bring in `sysroot==2.17` too, hence the different glibc. We'll see more clearly on the PR.
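As a sanity check, the glibc actually provided by the build machine can be read off directly and compared against the package's floor (these commands assume a glibc-based Linux):

```shell
# Print the system glibc version, to compare against the sysroot/glibc
# floor of the conda packages (e.g. 2.12 vs 2.17).
getconf GNU_LIBC_VERSION
ldd --version | head -n 1
```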
I've now submitted the PR (linked above). Thanks for your help.
Does this come up during testing or the build?
@jakirkham: This error is thrown during a build of Sire against the conda-forge package of OpenMM in a clean Python 3.7 environment on my up-to-date Linux box running glibc 2.33-4. The build is fine when using the Omnia channel package (which has worked for us for years) and also when building against earlier conda-forge builds of OpenMM.
Maybe temporarily constrain `cudatoolkit` to `10.2` in `host` (assuming this project doesn't also have CUDA bits that need to be built)? This can be relaxed after merging in the feedstock. Basically it sounds like `staged-recipes` doesn't have a good way of selecting different OS versions, which is causing the issue here and shouldn't be a blocker for a recipe submission. The suggestion included here should work around that. Though if someone comes along with a better solution, I would be interested to see that as well.
> Basically it sounds like staged-recipes doesn't have a good way of selecting different OS versions
This is not true.
Ok how does one configure this then?
If you look at the staged-recipes PR above, you'll see that it works just fine.
Thanks for your comments. I think you have misunderstood my question. As you say, in the staged-recipes PR the build works just fine, which is what I would expect because the build platform and version of glibc are the same. However, if we try to use the conda-forge OpenMM package on a different Linux system then, unlike the Omnia package, we can no longer build against it because of the memcpy symbol issue mentioned in the original post.
We have built against OpenMM in this way for a long time (we do this for development purposes and for creating our own self-contained binary install of Sire) and haven't had issues using libOpenMM from the Omnia channel, i.e. it works on all systems with a version of glibc equal to or newer than what it was built against. It's quite possible that we can solve this issue by adjusting our build configuration, but I just wanted to point it out in case the change in the memcpy symbol required by libOpenMM was undesired. The OpenMM docs do tell you to use the conda-forge package, so I would expect people to be building against its libOpenMM locally.
Cheers.
There are several options:

1. `export CONDA_OVERRIDE_GLIBC=2.12`
2. Add `cudatoolkit 10.2` to `host`
3. …
4. Add `sysroot_linux-64 2.17` as a `build` requirement

Options 1, 2, and 3 will get you a package that is compatible with `glibc>=2.12`; option 4 will get you a package that is compatible with `glibc>=2.17`.
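As a sketch, the recipe-side options would land in a `meta.yaml` roughly like this (hypothetical fragment; the package names and pins are illustrative, not a prescription):

```yaml
requirements:
  build:
    - {{ compiler('cxx') }}
    - sysroot_linux-64 2.17   # targets glibc >= 2.17 (CentOS 7 era)
  host:
    - openmm
    - cudatoolkit 10.2        # stays on the CentOS 6 / glibc 2.12 CUDA builds
```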
Many thanks. 2, 3, and 4 work, and are easy enough to implement in our build. I also tried running the docker image using the instructions here but it seems that it doesn't work when recipes are on the master branch, which they are in my fork:
```
+ echo 'Finding recipes merged in master and removing them from the build.'
Finding recipes merged in master and removing them from the build.
+ pushd /home/conda/staged-recipes/recipes
+ '[' False == True ']'
+ git ls-tree --name-only master -- .
+ xargs -I '{}' sh -c 'rm -rf ~/staged-recipes-copy/recipes/{} && echo Removing recipe: {}'
Removing recipe: example
Removing recipe: sire
...
Found no recipes to build
```
The part that I still don't understand (apologies if this is obvious) is why, if approach 4 gives a package that is compatible with glibc>=2.17, does the libOpenMM.so generated in this manner not work with our build using glibc=2.33-4 on my system?
I'm running into a similar issue. I'm planning to move our MELD package, which contains an OpenMM plugin, over to conda-forge.
As a first step, I'm in the process of migrating our CI from Travis to GitHub Actions. I'm using OpenMM's CI as a template. As part of this process, I install CUDA using `apt`, and install the corresponding `cudatoolkit` and `openmm` from conda-forge.
Everything works fine when I use the system compilers, but when I try to use the conda-forge compilers I get the following errors.
For `cuda=10.0`:

```
...
[100%] Linking CXX executable TestCudaMeldForce
/usr/share/miniconda/envs/build/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libcuda.so.1, needed by /usr/share/miniconda/envs/build/lib/libOpenMMCUDA.so, not found (try using -rpath or -rpath-link)
/usr/share/miniconda/envs/build/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /usr/share/miniconda/envs/build/lib/libOpenMMCUDA.so: undefined reference to `cuCtxSetLimit'
...
```
For `cuda=11.2`:

```
...
[ 71%] Linking CXX executable TestSerializeMeldForce
/usr/share/miniconda/envs/build/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /usr/share/miniconda/envs/build/lib/libOpenMM.so: undefined reference to `memcpy@GLIBC_2.14'
collect2: error: ld returned 1 exit status
...
```
This is the same error reported above.
I realize that this problem might not exist when building using the conda-forge infrastructure, but I wanted to have our CI test against the same compiler versions. Any ideas on how to move forward? Is this going to cause issues if our users are building from source?
@jlmaccal: I don't believe you should install CUDA with `apt`: everything you need in the CUDA toolkit should come in the `cudatoolkit` package, and I worry there may be inconsistencies between the apt-installed libs and the conda-forge libs. How does the process fail if you omit the `apt` installation of CUDA?
@jlmaccal: You might consider using `openmm-torch` as a better template for getting GitHub Actions up and running, since it's closer to how MELD will build its OpenMM plugins:
https://github.com/openmm/openmm-torch
It won't build the CUDA platform part of MELD without installing from apt. Both the openmm-torch and openmm-plumed CI scripts also install from source.
Yep, the build-time dependencies of the CUDA bits are not EULA-distributable and hence not contained in `cudatoolkit` (that's only the *runtime*). You need something to provide `nvcc` and friends, be it the APT packages or a Docker image. The CI uses this script to install those with APT.
@jlmaccal I think you managed to get it working, right?
@lohedges, did you fix your issues?
> The part that I still don't understand (apologies if this is obvious) is why, if approach 4 gives a package that is compatible with glibc>=2.17, does the libOpenMM.so generated in this manner not work with our build using glibc=2.33-4 on my system?
Your conda env might contain an old package linked against 2.12 which is causing the issue.
No, I still experience the same issue with a local build. I'm building in a completely clean environment which is fully up to date. It's simply a fresh Miniconda with `conda-build` installed on top of the base packages. At present I'm just adding `sysroot_linux-64 2.17` as an additional build requirement.
Yes, that's right. It works fine in the pull request (other than timing out) and works locally with the addition of `sysroot_linux-64 2.17`.
@jaimergp No, I just temporarily moved on.
I was trying to replicate what openmm itself does, which is to run CI builds against the same compiler toolchain as conda-forge uses, but run into the errors above.
@jaimergp I just realised that I had since updated the Sire 2021.1.0 release, so the checksum had changed. I've pushed an update in case you had tried building this yourself.
@lohedges: To build locally with the conda-forge setup, you need to make some changes to your fork structure:
```shell
git checkout -b add-sire
git checkout master
rm -rf recipes/sire recipes/libcpuid
git commit -am "remove recipes from master"
git checkout add-sire
```
Now that you are on a branch instead of master, the scripts will run successfully:
```shell
mkdir -p build_artifacts/linux-64/
export CONFIG=linux64
export IMAGE_NAME=quay.io/condaforge/linux-anvil-comp7
.scripts/run_docker_build.sh
```
This will run Docker (the same image as in Azure) and build the packages. The artifacts (either the final tarball, or the working directory if the build failed) will appear under `build_artifacts`. Hopefully this enables you to iterate faster!
I am running it too to see if I can find something obvious, but I am running a bit short on time lately and won't be able to spend too much time here, sorry!
Thanks for the info, I'll try this when I next get a chance.
Thanks for the effort in moving OpenMM to conda-forge. This enables us to move Sire to the conda-forge ecosystem, which will hopefully improve the stability of our build process and make maintenance easier.
I have created a recipe for Sire that lists `openmm` as a host requirement and links against `libOpenMM.so` when building the `SireMove` library. After updating our `CMakeLists.txt` to use the new `libstdc++` ABI (the Omnia package used the old ABI) I see the following error while linking:

Examining the `libOpenMM.so` from the latest conda package (using Python 3.7 as a reference) I find:

*openmm-7.5.0-py37h01de88b_6*:
Looking at my Linux system, which uses `glibc` 2.33-4, I find:

Looking at Python 3.7 conda-forge builds of OpenMM equal to or earlier than the following, I find:

*openmm-7.5.0-py37h18a0e3e_6*:

(This is the same as is used for the Omnia package.)
Pinning to this version of the package in our recipe allows the build to finish successfully.
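For reference, such a build-string pin in a recipe's `meta.yaml` would look roughly like this (hypothetical fragment; the exact spec syntax is illustrative):

```yaml
requirements:
  host:
    # pin to the CentOS 6 / glibc 2.12 build of OpenMM
    # (the one that still references the old memcpy symbol)
    - openmm 7.5.0 py37h18a0e3e_6
```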
I assume this issue is possibly related to the move to CentOS 7 following the recently announced CentOS 6 EOL, and the corresponding change in the `glibc` ABI. Is this the expected behaviour? If so, how should I update our build to handle the change? (I assume that I wouldn't experience this issue when running a real build on the conda-forge servers, since the build environment will be the same.)

Cheers.