conda-forge / openmm-feedstock

A conda-smithy repository for openmm.
BSD 3-Clause "New" or "Revised" License

PowerPC builds #8

Open jaimergp opened 5 years ago

jaimergp commented 5 years ago

ppc64le builds are technically possible, but in practice there are some barriers. I will collect relevant issues and PRs here.

All

CUDA

OpenCL

We could get an OpenCL + CPU build with relatively low effort if we fix doxygen and ocl-icd. Would this be enough?

jakirkham commented 5 years ago

cc @jayfurmanek

jayfurmanek commented 5 years ago

I saw the doxygen build going. Looks like it timed out (10mins no output). A couple things we could try there:

Also: There was no GPU support for ppc64le on CENTOS6. In fact, CENTOS6 predates ppc64le as an arch. The anvil images and conda toolchain use CENTOS7 (cos7) on ppc64le and aarch64.

Anaconda doesn't provide newer cudatoolkit versions for ppc64le, unfortunately, although IBM does.

I don't know if anyone has tried ocl-icd on ppc64le. I know NVIDIA doesn't provide OpenCL for ppc64le so it may not be worth doing much with ocl-icd unfortunately.

jaimergp commented 5 years ago

Thanks for the valuable feedback @jayfurmanek!

> I saw the doxygen build going. Looks like it timed out (10mins no output).

We changed the provider to Azure for ppc64le and, although it takes a couple of hours, it worked! Doxygen is not frequently updated, so I'd say it's ok to leave as is.

> There was no GPU support for ppc64le on CENTOS6. In fact, CENTOS6 predates ppc64le as an arch. The anvil images and conda toolchain use CENTOS7 (cos7) on ppc64le and aarch64.

Didn't know that, nice! One less thing to worry about.

> Anaconda doesn't provide newer cudatoolkit versions for ppc64le, unfortunately, although IBM does.

Is there any official way to use the IBM channels with conda-forge?

> I know NVIDIA doesn't provide OpenCL for ppc64le so it may not be worth doing much with ocl-icd unfortunately.

If that's the case (I didn't know that either), then you are right: there is probably no point in trying until we have official CUDA builds for ppc64le.

Thanks again!

jayfurmanek commented 5 years ago

The IBM channel is here: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

There is a license that needs to be accepted at package install time by setting an environment variable: IBM_POWERAI_LICENSE_ACCEPT=yes

It currently has various levels of CUDA 10.1 for ppc64le and x86-64.
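
A minimal sketch of how the license variable could be passed along to an install from this channel. Assumptions not taken from the thread: `conda` is on the PATH, the package spec `cudatoolkit=10.1` is only an illustration (check the channel for the real package names), and the helper name `ibm_install_plan` is hypothetical. The function only builds the command and environment; nothing is executed.

```python
import os

# Channel URL as given in the comment above.
IBM_CHANNEL = "https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/"


def ibm_install_plan(packages, base_env=None):
    """Return (command, environment) for a conda install from the IBM channel.

    The environment is a copy of base_env (or of os.environ) with the
    license-acceptance variable set, so the caller's environment is untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["IBM_POWERAI_LICENSE_ACCEPT"] = "yes"
    cmd = ["conda", "install", "--yes", "--channel", IBM_CHANNEL, *packages]
    return cmd, env


if __name__ == "__main__":
    # "cudatoolkit=10.1" is an assumed package spec, used for illustration only.
    cmd, env = ibm_install_plan(["cudatoolkit=10.1"])
    print(" ".join(cmd))
    # To actually run it, something like:
    #   subprocess.run(cmd, env=env, check=True)
```

Setting the variable only in the copied environment keeps the license acceptance scoped to that one install command rather than to the whole shell session.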

giadefa commented 3 years ago

Hi, what is the status of the openmm release for ppc64le? According to https://github.com/conda-forge/openmm-feedstock/pull/36#issuecomment-754489067 there still seem to be shortcomings.

jchodera commented 3 years ago

In particular, @giadefa pointed out that there are now new Power9 supercomputers with powerful GPUs: https://www.hpc.cineca.it/hardware/marconi100

jaimergp commented 3 years ago

I recall master is ready for PPC, but we need to cut a new release for that. See https://github.com/openmm/openmm/issues/2993

mrshirts commented 3 years ago

We would love to start running on ORNL GPUs soon, so this would be great to get finalized!

jayfurmanek commented 3 years ago

Also, conda-forge now has up-to-date cudatoolkit and ocl-icd packages for ppc64le too, so I don't see any other blockers.

jaimergp commented 3 years ago

Once https://github.com/openmm/openmm/issues/2993 is accepted for release, I'll work on the CF machinery to put the PPC builds out there!

jchodera commented 3 years ago

@peastman: Can we prioritize a 7.5.1 bugfix release to enable the ppc64le openmm toolchain to start building?

peastman commented 3 years ago

The thing blocking 7.5.1 is finding someone with an ARM Mac who can test that. If we either drop the ARM Mac support, or clearly mark it as untested, we can move ahead with releasing 7.5.1.

jaimergp commented 3 years ago

We can leave the existing warnings for 7.5.1 on arm64 and remove them when we have tested it thoroughly (either in a new build or in a new version).

jchodera commented 3 years ago

+1 for just keeping the warnings. We've had the minimal tests run, and you didn't want us to send you an ARM machine, while I'm still months away from being allowed to use one by MSK. Let's get it out there so people can give us feedback.

peastman commented 3 years ago

Ok!

raimis commented 3 years ago

OpenMM 7.5.1rc1 is out (https://anaconda.org/conda-forge/openmm/files?version=7.5.1rc2), but I don't see the packages for PowerPC. Are we still on track to support PowerPC in OpenMM 7.5.1?
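
One way to check this from the command line is to ask conda for the ppc64le subdir explicitly, e.g. `conda search -c conda-forge --subdir linux-ppc64le openmm --json`, and look for the version in the result. The sketch below only post-processes that JSON text. It assumes the output is a mapping from package name to a list of records carrying `version` and `subdir` fields; treat that shape, and the helper name `has_build`, as assumptions to verify against your conda version.

```python
import json


def has_build(search_json, package, version, subdir="linux-ppc64le"):
    """Return True if the JSON text lists package==version for the given subdir.

    Assumed input shape: {"<package>": [{"version": ..., "subdir": ...}, ...]}.
    Anything else (for example an error report) yields False.
    """
    data = json.loads(search_json)
    records = data.get(package, []) if isinstance(data, dict) else []
    return any(
        isinstance(rec, dict)
        and rec.get("version") == version
        and rec.get("subdir") == subdir
        for rec in records
    )


if __name__ == "__main__":
    # Hypothetical records for a made-up package, only to show the call.
    sample = json.dumps({"examplepkg": [{"version": "1.0", "subdir": "linux-64"}]})
    print(has_build(sample, "examplepkg", "1.0"))
```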

jchodera commented 3 years ago

@peastman @jaimergp: Wasn't 7.5.1 supposed to have everything we need for ppc64le support?

peastman commented 3 years ago

Yes, I thought it was building for it. @jaimergp do you know why it didn't?

jaimergp commented 3 years ago

Because we (I) haven't rolled out support for CUDA on PPC yet. I was half hoping somebody else would do it while we fixed its support in OpenMM, but that didn't happen, so I'll get to it.

It shouldn't delay the release of the other builds though; I can work on it in the meantime.

raimis commented 3 years ago

@jaimergp thanks for the update. Do you have an estimate of when the PowerPC packages will be available?

jaimergp commented 3 years ago

We need three (cascading) pieces of infrastructure:

So I can't give an estimate, but at least you can see the progress here.

peastman commented 3 years ago

Thanks! No need to hold up anything else while we wait for it.

raimis commented 3 years ago

@jaimergp

I see that https://github.com/conda-forge/docker-images/pull/178 and https://github.com/conda-forge/nvcc-feedstock/pull/66 have been merged. What is the situation with the last step?

jaimergp commented 3 years ago

I am working on it. I'll submit a PR later!

jaimergp commented 3 years ago

@raimis see #55

tonigi commented 2 years ago

PPC builds used to be made on CI and uploaded to conda-forge until 7.6.0 (and they worked great btw). This does not seem to be the case for 7.7.0 any more. Any chance to resume them?

peastman commented 2 years ago

PPC builds no longer work when built with the compilers used by conda-forge. A lot of the test cases fail or segfault. They work fine when built using the standard system compilers. I've tried to track down the problem but without success. I believe it's caused by a compiler bug. Unfortunately, this means distributing PPC builds through conda-forge is now impossible.

tonigi commented 2 years ago

Oh no. Is there a "single place" for the local build instructions? (I used to have an attempt at https://github.com/giorginolab/miniomm/wiki/%5BOBSOLETE%5D-Compiling-OpenMM-on-M100 , but not sure how much they can be trusted).

peastman commented 2 years ago

Instructions on building from source are at http://docs.openmm.org/latest/userguide/library/02_compiling.html. We haven't done a survey of compilers to figure out which specific ones work and which fail. My general impression has been that gcc is buggier than clang, but that's based on only a few incidents. Once you build, be sure to do a make test. Using the conda-forge compilers with PPC, we get a bunch of test failures like these:

  1/9 Test #45: TestCpuCheckpoints ...............***Failed    0.24 sec
  exception: Particle coordinate is NaN.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan

      Start 48: TestCpuCustomManyParticleForce
  2/9 Test #47: TestCpuCustomGBForce .............***Exception: SegFault  2.25 sec

      Start 49: TestCpuCustomNonbondedForce
  3/9 Test #49: TestCpuCustomNonbondedForce ......***Failed    0.20 sec
  exception: Assertion failure at TestCustomNonbondedForce.h:103.   Expected [4500, 0, 0], found [0, 0, 0]
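
When comparing compilers, it can help to reduce a log like the one above to a list of failing tests. The following is a rough sketch, not part of OpenMM or CTest: the function name `failed_tests` is made up here, and the regular expression only covers the `Passed`, `***Failed` and `***Exception: ...` statuses seen in this excerpt, so other CTest statuses would need to be added.

```python
import re

# Matches result lines such as
#   "2/9 Test #47: TestCpuCustomGBForce .............***Exception: SegFault  2.25 sec"
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<num>\d+):\s+(?P<name>\S+)\s+\.+\s*"
    r"(?P<status>Passed|\*\*\*Failed|\*\*\*Exception: \S+)"
)


def failed_tests(ctest_output):
    """Return (test number, test name, status) for every non-passing result line."""
    failures = []
    for line in ctest_output.splitlines():
        match = RESULT_LINE.match(line)
        if match and match.group("status") != "Passed":
            # Drop the leading "***" marker from the status text.
            status = match.group("status").lstrip("*")
            failures.append((int(match.group("num")), match.group("name"), status))
    return failures


if __name__ == "__main__":
    import sys

    # Usage (assumed workflow): make test 2>&1 | python this_script.py
    for num, name, status in failed_tests(sys.stdin.read()):
        print(f"#{num} {name}: {status}")
```

Running the same summary on a build made with the system compilers and on one made with the conda-forge compilers gives two lists that are easy to diff.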

tonigi commented 2 years ago

By chance, is this a problem that only appears in CI? From what I understand conda-forge runs PPC64LE through emulation by default, which in my impression is buggy especially for numerics. A native (local) conda-build with conda-forge gcc 12.1.0-16 seems to work. (But there are other quirks, like CMake not finding CUDA)

peastman commented 2 years ago

I don't know. I don't have access to an actual PPC Linux system, so the only way I'm able to test it is through emulation. I can say, though, that it has all the hallmarks of a compiler bug. For example, I store some values into memory, load that memory into a SIMD register, and the register ends up with the wrong values. But if I print out the memory locations I just stored to before loading them into the register, then it ends up with the right values. That's the sort of behavior you tend to see if there's a bug in the compiler's optimization stage. This also isn't the first time I've run into a bug in gcc on PPC.