conda-forge / openmm-feedstock

A conda-smithy repository for openmm.
BSD 3-Clause "New" or "Revised" License

libOpenMM.so: undefined reference to `memcpy@GLIBC_2.14' #48

Open lohedges opened 3 years ago

lohedges commented 3 years ago

Thanks for the effort in moving OpenMM to conda-forge. This enables us to move Sire to the conda-forge ecosystem, which will hopefully improve the stability of our build process and make maintenance easier.

I have created a recipe for Sire that lists openmm as a host requirement and links against libOpenMM.so when building the SireMove library. After updating our CMakeLists.txt to use the new libstdc++ ABI (the Omnia package used the old ABI), I see the following error while linking:

libOpenMM.so: undefined reference to `memcpy@GLIBC_2.14'

Examining the libOpenMM.so from the latest conda package (using Python 3.7 as a reference) I find:

openmm-7.5.0-py37h01de88b_6:

readelf -a lib/libOpenMM.so | grep memcpy
00000037c228  004500000006 R_X86_64_GLOB_DAT 0000000000000000 memcpy@GLIBC_2.14 + 0
  4400: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memcpy@@GLIBC_2.14

Looking at my Linux system, which uses glibc 2.33-4, I find:

nm  -gD /usr/lib/libc.so.6 | grep memcpy
0000000000090800 i memcpy@@GLIBC_2.14
00000000000a8ce0 T memcpy@GLIBC_2.2.5
000000000010c620 i __memcpy_chk@@GLIBC_2.3.4
00000000000a9aa0 W wmemcpy@@GLIBC_2.2.5
000000000010d9c0 T __wmemcpy_chk@@GLIBC_2.4
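
Incidentally, here is a quick way to enumerate every glibc symbol version a binary requires (the highest one sets the minimum glibc it can be linked or run against), assuming GNU binutils is available:

# Extract all GLIBC_x.y version tags from the dynamic symbol table
# and version-sort them; the last entry is the minimum glibc needed:
objdump -T lib/libOpenMM.so | grep -oE 'GLIBC_[0-9]+\.[0-9.]+' | sort -Vu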

Looking at conda-forge Python 3.7 builds of OpenMM up to and including the following, I find:

openmm-7.5.0-py37h18a0e3e_6:

readelf -a lib/libOpenMM.so | grep memcpy
00000037f1f0  009600000006 R_X86_64_GLOB_DAT 0000000000000000 memcpy@GLIBC_2.2.5 + 0
  7600: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND memcpy@@GLIBC_2.2.5

(This is the same as is used for the Omnia package.)

Pinning to this version of the package in our recipe allows the build to finish successfully.

I assume this issue is related to the move to CentOS 7 following the recently announced CentOS 6 EOL, and the corresponding change in the glibc ABI. Is this the expected behaviour? If so, how should I update our build to handle the change? (I assume that I wouldn't experience this issue when running a real build on the conda-forge servers, since the build environment will be the same.)

Cheers.

jaimergp commented 3 years ago

Can you link to the PR at staged-recipes? I can't find it. If it's not there, a missing pin might show up when you run through the CF CI pipeline!

jaimergp commented 3 years ago

We have two packages that link to openmm in case you need an example:

jaimergp commented 3 years ago

Also, py37h18a0e3e_6 is compiled for CUDA 9.2, while py37h01de88b_6 is compiled for CUDA 11. CUDA 11 requires CentOS 7, but CUDA <= 10.2 is still built with CentOS 6, so that's possibly your ABI issue!
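
If you need to reproduce the older ABI locally, a sketch (assuming the CUDA variant selection described above) would be to pin the CUDA version when installing:

# Pull the CUDA <=10.2 variant of openmm, i.e. the CentOS 6 / glibc 2.12 build:
conda install -c conda-forge openmm "cudatoolkit<=10.2"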

jaimergp commented 3 years ago

Btw, if you want, you can tag me in the staged-recipes PR and I will add some comments so it works as expected!

lohedges commented 3 years ago

Thanks for the quick reply. I haven't created a pull request yet. I was just testing the build locally in order to make sure everything worked before doing so. (Our build is quite large and slow.)

Yes, it does look like an ABI issue, although I'm confused, since my system glibc is newer than the one specified in the run requirements section of the meta.yaml for the failing package, i.e. >=2.17.

I'll create a PR from my staged-recipes fork. Thanks for the offer to tag you in; it would be great to have your help ironing out any build issues.

jaimergp commented 3 years ago

Bringing in cudatoolkit==11 will bring sysroot==2.17 too, hence the different glibc. We'll see on the PR more clearly.
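
In the meantime, conda-build's render subcommand can show which pins a recipe actually resolves to (a sketch, assuming conda-build is installed and the recipe sits at recipes/sire in the staged-recipes checkout):

# Print the fully rendered meta.yaml, including the resolved sysroot/glibc pins:
conda render recipes/sire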

lohedges commented 3 years ago

I've now submitted the PR (linked above). Thanks for your help.

jakirkham commented 3 years ago

Does this come up during testing or the build?

lohedges commented 3 years ago

@jakirkham: This error is thrown during a build of Sire against the conda-forge package of OpenMM in a clean Python 3.7 environment on my up-to-date Linux box running glibc 2.33-4. The build is fine when using the Omnia channel package (which has worked for us for years) and also when building against earlier conda-forge builds of OpenMM.

jakirkham commented 3 years ago

Maybe temporarily constrain cudatoolkit to 10.2 in host (assuming this project doesn't also have CUDA bits that need to be built)? This can be relaxed after merging in the feedstock. Basically, it sounds like staged-recipes doesn't have a good way of selecting different OS versions, which is causing the issue here and shouldn't be a blocker for a recipe submission; the suggestion here should work around that. Though if someone comes along with a better solution, I would be interested to see that as well.
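
An illustrative meta.yaml fragment for that temporary pin (hypothetical; adapt to the actual recipe):

requirements:
  host:
    - openmm
    - cudatoolkit <=10.2  # temporary: selects the CentOS 6 / glibc 2.12 builds; relax once in the feedstock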

isuruf commented 3 years ago

Basically it sounds like staged-recipes doesn't have a good way of selecting different OS versions

This is not true.

jakirkham commented 3 years ago

Ok how does one configure this then?

isuruf commented 3 years ago

If you look at the staged-recipes PR above, you'll see that it works just fine.

lohedges commented 3 years ago

Thanks for your comments. I think you have misunderstood my question. As you say, in the staged-recipes PR the build works just fine, which is what I would expect because the build platform and version of glibc are the same. However, if we try to use the conda-forge OpenMM package on a different Linux system then, unlike the Omnia package, we can no longer build against it because of the memcpy symbol issue mentioned in the original post.

We have built against OpenMM in this way for a long time (we do this for development purposes and for creating our own self-contained binary install of Sire) and haven't had issues using libOpenMM from the Omnia channel, i.e. it works on all systems with a version of glibc equal to or newer than what it was built against. It's quite possible that we can solve this issue by adjusting our build configuration, but I just wanted to point it out in case the change in the memcpy symbol required by libOpenMM was undesired. The OpenMM docs do tell you to use the conda-forge package, so I would expect people to be building against its libOpenMM locally.

Cheers.

isuruf commented 3 years ago

There are several options:

  1. Use our docker image
  2. export CONDA_OVERRIDE_GLIBC=2.12
  3. Add cudatoolkit 10.2 to host
  4. Add sysroot_linux-64 2.17 as a build requirement

Options 1, 2, and 3 will get you a package that is compatible with glibc>=2.12; option 4 will get you a package that is compatible with glibc>=2.17.
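
For example, option 2 for a local build (a sketch, assuming conda-build and the staged-recipes layout):

# Override the glibc version conda detects on the host, so the solver
# picks packages compatible with glibc>=2.12:
export CONDA_OVERRIDE_GLIBC=2.12
conda build recipes/sire

and option 4 as a meta.yaml fragment (illustrative):

requirements:
  build:
    - sysroot_linux-64 2.17  # compilers link against glibc 2.17 stubs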

lohedges commented 3 years ago

Many thanks. Options 2, 3, and 4 work, and are easy enough to implement in our build. I also tried running the Docker image using the instructions here, but it seems that it doesn't work when recipes are on the master branch, which they are in my fork:

+ echo 'Finding recipes merged in master and removing them from the build.'
Finding recipes merged in master and removing them from the build.
+ pushd /home/conda/staged-recipes/recipes
+ '[' False == True ']'
+ git ls-tree --name-only master -- .
+ xargs -I '{}' sh -c 'rm -rf ~/staged-recipes-copy/recipes/{} && echo Removing recipe: {}'
Removing recipe: example
Removing recipe: sire
...
Found no recipes to build

The part that I still don't understand (apologies if this is obvious) is why, if approach 4 gives a package that is compatible with glibc>=2.17, the libOpenMM.so generated in this manner doesn't work with our build on my system, which runs glibc 2.33-4.

jlmaccal commented 3 years ago

I'm running into a similar issue. I'm planning to move our MELD package, which contains an OpenMM plugin, over to conda-forge.

As a first step, I'm in the process of migrating our CI from Travis to GitHub Actions, using OpenMM's CI as a template. As part of this process, I install CUDA using apt, and install the corresponding cudatoolkit and openmm from conda-forge.

Everything works fine when I use the system compilers, but when I try to use the conda-forge compilers I get the following errors.

For cuda=10.0:

...

[100%] Linking CXX executable TestCudaMeldForce
/usr/share/miniconda/envs/build/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: warning: libcuda.so.1, needed by /usr/share/miniconda/envs/build/lib/libOpenMMCUDA.so, not found (try using -rpath or -rpath-link)
/usr/share/miniconda/envs/build/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /usr/share/miniconda/envs/build/lib/libOpenMMCUDA.so: undefined reference to `cuCtxSetLimit'

...

For cuda=11.2:

...

[ 71%] Linking CXX executable TestSerializeMeldForce
/usr/share/miniconda/envs/build/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: /usr/share/miniconda/envs/build/lib/libOpenMM.so: undefined reference to `memcpy@GLIBC_2.14'
collect2: error: ld returned 1 exit status

...

This is the same error reported above.

I realize that this problem might not exist when building using the conda-forge infrastructure, but I wanted to have our CI test against the same compiler versions. Any ideas on how to move forward? Is this going to cause issues if our users are building from source?

jchodera commented 3 years ago

@jlmaccal: I don't believe you should install CUDA with apt; everything you need in the CUDA toolkit should come in the cudatoolkit package, and I worry there may be inconsistencies between the apt-installed libs and the conda-forge libs.

How does the process fail if you omit the apt installation of CUDA?

jchodera commented 3 years ago

@jlmaccal: You might consider using openmm-torch as a better template for getting GitHub Actions up and running, since it's closer to how MELD will build its OpenMM plugins: https://github.com/openmm/openmm-torch

jlmaccal commented 3 years ago

It won't build the CUDA platform part of MELD without installing from apt. Both the openmm-torch and openmm-plumed CI scripts also install from source.

jaimergp commented 3 years ago

Yep, the build-time dependencies of the CUDA bits are not redistributable under the EULA and hence not contained in cudatoolkit (that's only the runtime). You need something to provide nvcc and friends, be it the APT packages or a Docker image. The CI uses this script to install those with APT.
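
For reference, a rough sketch of the APT route on Ubuntu (illustrative only: package names vary by CUDA version and distro, and NVIDIA's apt repository must already be configured):

sudo apt-get update
sudo apt-get install -y cuda-toolkit-11-2   # provides nvcc and the CUDA headers
/usr/local/cuda-11.2/bin/nvcc --version     # sanity check the compiler install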

jaimergp commented 3 years ago

@jlmaccal I think you managed to get it working, right?

@lohedges, did you fix your issues?

The part that I still don't understand (apologies if this is obvious) is why, if approach 4 gives a package that is compatible with glibc>=2.17, the libOpenMM.so generated in this manner doesn't work with our build on my system, which runs glibc 2.33-4.

Your conda env might contain an old package linked against 2.12 which is causing the issue.
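
In particular, the conda compilers resolve glibc symbols at link time against the sysroot's libc stubs rather than your system glibc, so a 2.12 sysroot in the build environment would produce exactly this error. A quick check (a sketch, assuming the conda-forge sysroot layout):

# The version in the stub's filename is the glibc the linker actually sees:
find $CONDA_PREFIX/x86_64-conda-linux-gnu/sysroot -name 'libc-*.so'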

lohedges commented 3 years ago

No, I still experience the same issue with a local build. I'm building in a completely clean environment which is fully up-to-date. It's simply a fresh Miniconda with conda-build installed on top of the base packages. At present I'm just adding sysroot_linux-64 2.17 as an additional build requirement.

jaimergp commented 3 years ago

Is this the most recent recipe you are using?

lohedges commented 3 years ago

Yes, that's right. It works fine in the pull request (other than timing out) and works locally with the addition of sysroot_linux-64 2.17.

jlmaccal commented 3 years ago

@jaimergp No, I just temporarily moved on.

I was trying to replicate what openmm itself does, which is to run CI builds against the same compiler toolchain as conda-forge uses, but ran into the errors above.

lohedges commented 3 years ago

@jaimergp I just realised that I had since updated the Sire 2021.1.0 release, so the checksum had changed. I've pushed an update in case you had tried building this yourself.

jaimergp commented 3 years ago

@lohedges

To build locally with the conda-forge setup, you need to make some changes to your fork's structure.

  1. Clone https://github.com/michellab/staged-recipes
  2. git checkout -b add-sire
  3. git checkout master
  4. rm -rf recipes/sire recipes/libcpuid
  5. git commit -am "remove recipes from master"
  6. git checkout add-sire

Now that you are on a branch instead of master, the scripts will run successfully:

mkdir -p build_artifacts/linux-64/

export CONFIG=linux64
export IMAGE_NAME=quay.io/condaforge/linux-anvil-comp7
.scripts/run_docker_build.sh

This will run Docker (same image as in Azure) and build the packages. The artifacts (either the final tarball or the working directory if failed) will appear under build_artifacts. Hopefully this enables you to iterate faster!

I am running it too to see if I can find something obvious, but I am running a bit short on time lately and won't be able to spend too much time here, sorry!

lohedges commented 3 years ago

Thanks for the info, I'll try this when I next get a chance.