jaimergp opened 5 years ago
cc @jayfurmanek
I saw the doxygen build going. Looks like it timed out (10mins no output). A couple of things we could try there:
Also: There was no GPU support for ppc64le on CENTOS6. In fact, CENTOS6 predates ppc64le as an arch. The anvil images and conda toolchain use CENTOS7 (cos7) on ppc64le and aarch64.
Anaconda doesn't provide newer cudatoolkit versions for ppc64le, unfortunately, although IBM does.
I don't know if anyone has tried ocl-icd on ppc64le. I know NVIDIA doesn't provide OpenCL for ppc64le so it may not be worth doing much with ocl-icd unfortunately.
Thanks for the valuable feedback @jayfurmanek!
> I saw the doxygen build going. Looks like it timed out (10mins no output).

We changed the provider to Azure for ppc64le and, although it takes a couple of hours, it worked! Doxygen is not frequently updated, so I'd say it's ok to leave as is.
> There was no GPU support for ppc64le on CENTOS6. In fact, CENTOS6 predates ppc64le as an arch. The anvil images and conda toolchain use CENTOS7 (cos7) on ppc64le and aarch64.

Didn't know that, nice! One less thing to worry about.
> Anaconda doesn't provide newer cudatoolkit versions for ppc64le, unfortunately, although IBM does.

Is there any official way to use the IBM channels with conda-forge?
> I know NVIDIA doesn't provide OpenCL for ppc64le so it may not be worth doing much with ocl-icd unfortunately.

If that's the case (I didn't know that either), then you are right: there is probably no point in trying until we have official CUDA builds on ppc64le.
Thanks again!
The IBM channel is here: https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
There is a license that needs to be accepted at package install time with an environment variable: `IBM_POWERAI_LICENSE_ACCEPT=yes`.
It currently has various levels of CUDA 10.1 for ppc64le and x86-64.
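For reference, a hedged sketch of pulling a package from that channel, using the license variable and URL from the comments above (the package name `cudatoolkit` is illustrative):

```shell
# Accept the PowerAI license at install time via the environment
# variable mentioned above; the -c flag points at IBM's public channel.
export IBM_POWERAI_LICENSE_ACCEPT=yes
conda install \
  -c https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/ \
  cudatoolkit   # illustrative package name
```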
Hi, what is the state for the release of openmm for ppc64le? Here https://github.com/conda-forge/openmm-feedstock/pull/36#issuecomment-754489067 there seem to be still shortcomings.
In particular, @giadefa pointed out that there are now new Power9 supercomputers with powerful GPUs: https://www.hpc.cineca.it/hardware/marconi100
I recall `master` is ready for PPC, but we need to cut a new release for that. See https://github.com/openmm/openmm/issues/2993
We would love to start running on ORNL GPUs soon, so this would be great to get finalized!
Also, conda-forge does have up-to-date `cudatoolkit` and `ocl-icd` packages for ppc64le now too, so I don't see any other blockers.
Once https://github.com/openmm/openmm/issues/2993 is accepted for release, I'll work on the CF machinery to put the PPC builds out there!
@peastman: Can we prioritize a 7.5.1 bugfix release to enable the ppc64le openmm toolchain to start building?
The thing blocking 7.5.1 is finding someone with an ARM Mac who can test that. If we either drop the ARM Mac support, or clearly mark it as untested, we can move ahead with releasing 7.5.1.
We can leave the existing warnings for 7.5.1 on arm64 and remove them when we have tested it thoroughly (either in a new build or in a new version).
+1 for just keeping the warnings. We've had the minimal tests run, and you didn't want us to send you an ARM machine, while I'm still months away from being allowed to use one by MSK. Let's get it out there so people can give us feedback.
Ok!
OpenMM 7.5.1rc1 is out (https://anaconda.org/conda-forge/openmm/files?version=7.5.1rc2), but I don't see the packages for PowerPC. Are we still on track to support PowerPC in OpenMM 7.5.1?
@peastman @jaimergp: Wasn't 7.5.1 supposed to have everything we need for ppc64le support?
Yes, I thought it was building for it. @jaimergp do you know why it didn't?
Because we (I) haven't rolled out support for CUDA on PPC yet. I was half hoping somebody else would do it while we fixed its support in OpenMM, but that didn't happen, so I'll get to it.
It shouldn't delay the release of the other builds though; I can work on it in the meantime.
@jaimergp thanks for the update. Do you have an estimate when the PowerPC packages will be available?
We need three (cascading) pieces of infrastructure:
So I can't give an estimate, but at least you can see the progress here.
Thanks! No need to hold up anything else while we wait for it.
@jaimergp I see that https://github.com/conda-forge/docker-images/pull/178 and https://github.com/conda-forge/nvcc-feedstock/pull/66 have been merged. What is the situation with the last step?
I am working on it. I'll submit a PR later!
@raimis see #55
PPC builds used to be made on CI and uploaded to conda-forge until 7.6.0 (and they worked great btw). This does not seem to be the case for 7.7.0 any more. Any chance to resume them?
PPC builds no longer work when built with the compilers used by conda-forge. A lot of the test cases fail or segfault. They work fine when built using the standard system compilers. I've tried to track down the problem but without success. I believe it's caused by a compiler bug. Unfortunately, this means distributing PPC builds through conda-forge is now impossible.
Oh no. Is there a "single place" for the local build instructions? (I used to have an attempt at https://github.com/giorginolab/miniomm/wiki/%5BOBSOLETE%5D-Compiling-OpenMM-on-M100 , but not sure how much they can be trusted).
Instructions on building from source are at http://docs.openmm.org/latest/userguide/library/02_compiling.html. We haven't done a survey of compilers to figure out which specific ones work and which fail. My general impression has been that `gcc` is buggier than `clang`, but that's based on only a few incidents. Once you build, be sure to do a `make test`. Using the conda-forge compilers with PPC, we get a bunch of test failures like these:

```
1/9 Test #45: TestCpuCheckpoints ...............***Failed    0.24 sec
exception: Particle coordinate is NaN. For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan
      Start 48: TestCpuCustomManyParticleForce
2/9 Test #47: TestCpuCustomGBForce .............***Exception: SegFault    2.25 sec
      Start 49: TestCpuCustomNonbondedForce
3/9 Test #49: TestCpuCustomNonbondedForce ......***Failed    0.20 sec
exception: Assertion failure at TestCustomNonbondedForce.h:103.  Expected [4500, 0, 0], found [0, 0, 0]
```
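For anyone trying the build-from-source route from the user guide linked above, a minimal sketch of the standard CMake flow (flags and paths are illustrative, not a definitive recipe):

```shell
# Fetch and configure OpenMM with CMake, build, then run the test suite
# via the `make test` target mentioned above.
git clone https://github.com/openmm/openmm.git
cd openmm
mkdir build && cd build
cmake ..                # add e.g. -DCMAKE_INSTALL_PREFIX=... as needed
make -j"$(nproc)"
make test               # runs the CTest suite; check for failures like those above
```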
By chance, is this a problem that only appears in CI? From what I understand conda-forge runs PPC64LE through emulation by default, which in my impression is buggy especially for numerics. A native (local) conda-build with conda-forge gcc 12.1.0-16 seems to work. (But there are other quirks, like CMake not finding CUDA)
I don't know. I don't have access to an actual PPC Linux system, so the only way I'm able to test it is through emulation. I can say, though, that it has all the hallmarks of a compiler bug. For example, I store some values into memory, load that memory into a SIMD register, and the register ends up with the wrong values. But if I print out the memory locations I just stored to before loading them into the register, then it ends up with the right values. That's the sort of behavior you tend to see if there's a bug in the compiler's optimization stage. This also isn't the first time I've run into a bug in gcc on PPC.
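To reproduce an emulated ppc64le environment locally, one option is Docker's qemu-based multi-arch support; a sketch (image names are common choices, not necessarily what conda-forge CI uses):

```shell
# Register qemu binfmt handlers so the host can run ppc64le binaries.
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
# Start a ppc64le container under emulation; `uname -m` should print ppc64le.
docker run --rm --platform linux/ppc64le ubuntu uname -m
```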
Builds for `ppc64le` are technically possible, but in reality there are some barriers. I will collect relevant issues and PRs here.

**All**

* `doxygen` has no `ppc64le` build yet. I am submitting a PR here. It is needed to build `openmm`.

**CUDA**

* `conda-forge` assumes a one-to-one relationship between CUDA versions and Docker images because it only considers `x64` architectures. This should be addressed in `conda-smithy` and `nvcc`. Tracking issue.
* CUDA Docker images for `ppc64le`. PR here.
* `defaults` only provides `cudatoolkit` v9.0 for `ppc64le`. There are no plans to change that in `defaults`, but `conda-forge` might get their own permissions.

**OpenCL**

* `ocl-icd` could be used to trigger the compilation of the OpenCL parts.

We could get an OpenCL + CPU build with relatively low effort if we fix `doxygen` and `ocl-icd`. Would this be enough?
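As a quick sanity check of the `ocl-icd` route, one could install the loader and list OpenCL platforms with the `clinfo` diagnostic tool (assuming both packages are available on conda-forge for the target arch; on a machine without vendor ICDs it should report zero platforms):

```shell
# Install the OpenCL ICD loader plus clinfo
# (availability on ppc64le is exactly the open question above).
conda install -c conda-forge ocl-icd clinfo
# List detected OpenCL platforms; empty output means no vendor ICDs found.
clinfo -l
```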