FoldingAtHome / openmm

OpenMM is a toolkit for molecular simulation using high performance GPU code.

Debugging further NaNs #32

Open jchodera opened 4 years ago

jchodera commented 4 years ago

@peastman: @ThWuensche mentioned on Slack:

With the new version that issue is fixed, but unfortunately with 13424 I get NaNs further on in the runs, though later than before. I captured a few WUs and will run them on local OpenMM (FoldingAtHome/openmm and peastman/openmm) this evening. I have tried 13424(2186,46,2) locally with the current FoldingAtHome/openmm and it works. However, through F@H it failed with particle coordinate nan at step 2007. Can you find something in the logs that would explain that difference? Actually, the behaviour on F@H looks similar to what we had before both patches, but it's quite unlikely that the patches are missing.

It seems likely that we must still be missing some bugfixes that appear in 7.5.0 but not in this patched 7.4.2. I'll try a quick core22 test build from openmm/openmm master right now to see if we can verify this is the case.

jchodera commented 4 years ago

Hm, no luck with compiling OpenMM from openmm/openmm master by just changing which version we check out:

[ 89%] Building CXX object platforms/opencl/sharedTarget/CMakeFiles/OpenMMOpenCL.dir/__/src/OpenCLArray.cpp.o
In file included from /home/conda/openmm/platforms/opencl/src/OpenCLArray.cpp:27:
In file included from /home/conda/openmm/platforms/opencl/./include/OpenCLArray.h:35:
/home/conda/openmm/platforms/opencl/src/cl.hpp:155:9: warning: This version of the OpenCL Host API C++ bindings is deprecated, please use cl2.hpp instead. [-W#pragma-messages]
#pragma message("This version of the OpenCL Host API C++ bindings is deprecated, please use cl2.hpp instead.")
       ^
In file included from /home/conda/openmm/platforms/opencl/src/OpenCLArray.cpp:28:
In file included from /home/conda/openmm/platforms/opencl/./include/OpenCLContext.h:55:
In file included from /home/conda/openmm/platforms/opencl/./include/OpenCLExpressionUtilities.h:30:
In file included from /home/conda/openmm/platforms/opencl/../common/include/openmm/common/ExpressionUtilities.h:30:
In file included from /home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeContext.h:38:
In file included from /home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeProgram.h:30:
/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:63:34: error: no member named 'is_trivially_copyable' in namespace 'std'
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
                            ~~~~~^
/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:63:56: error: 'T' does not refer to a value
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
                                                       ^
/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:62:21: note: declared here
    template <class T>
                    ^
/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:63:60: error: member 'value' declared as a template
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
                                                           ^
/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:63:67: error: expected member name or ';' after declaration specifiers
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
    ~~~~~~~~                                                      ^
/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:63:67: error: member (null) declared as a template
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
                                                                  ^
/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:63:66: error: expected ';' at end of declaration list
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
                                                                 ^
                                                                 ;
1 warning and 6 errors generated.
make[3]: *** [platforms/opencl/sharedTarget/CMakeFiles/OpenMMOpenCL.dir/__/src/OpenCLArray.cpp.o] Error 1

Any idea if more changes would be required to the CMAKE_FLAGS to try a test build with 7.5.0?

peastman commented 4 years ago

/home/conda/openmm/platforms/opencl/../common/include/openmm/common/ComputeKernel.h:63:34: error: no member named 'is_trivially_copyable' in namespace 'std'
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type addArg(const T& value) {
                            ~~~~~^

I recognize that error. It's what you get when using a very old version of GCC that supported most, but not quite all, of C++11. Can you update to a newer compiler?
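
For readers unfamiliar with the construct in the error, here is a minimal sketch of the same `std::enable_if` / `std::is_trivially_copyable` pattern. The `ArgBuffer` class is a hypothetical stand-in written for illustration, not OpenMM's `ComputeKernel`. It compiles only when `<type_traits>` provides `std::is_trivially_copyable`, so it can double as a quick probe of a build environment.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a kernel argument list (not OpenMM's ComputeKernel).
class ArgBuffer {
public:
    // Participates in overload resolution only when T is trivially copyable,
    // mirroring the enable_if signature shown in the compiler error.
    template <class T>
    typename std::enable_if<std::is_trivially_copyable<T>::value, void>::type
    addArg(const T& value) {
        // Trivially copyable objects may be copied byte by byte.
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        bytes.insert(bytes.end(), p, p + sizeof(T));
    }

    // Total number of argument bytes stored so far.
    std::size_t size() const { return bytes.size(); }

private:
    std::vector<unsigned char> bytes;
};
```

Calling `addArg` with a `float` or an `int` appends `sizeof(T)` bytes. Passing a type that is not trivially copyable, such as `std::string`, is rejected at compile time because no matching overload exists.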

jchodera commented 4 years ago

Hm, you're right:

(root) [conda@f3639bd53253 ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/rh/devtoolset-2/root/usr/libexec/gcc/x86_64-redhat-linux/4.8.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/opt/rh/devtoolset-2/root/usr --mandir=/opt/rh/devtoolset-2/root/usr/share/man --infodir=/opt/rh/devtoolset-2/root/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --enable-languages=c,c++,fortran,lto --enable-plugin --with-linker-hash-style=gnu --enable-initfini-array --disable-libgcj --with-isl=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/isl-install --with-cloog=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/cloog-install --with-mpc=/dev/shm/home/centos/rpm/BUILD/gcc-4.8.2-20140120/obj-x86_64-redhat-linux/mpc-install --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.2 20140120 (Red Hat 4.8.2-15) (GCC) 

I think part of the idea was to make sure the code will run on systems with older glibc.

I believe conda-forge and anaconda use gcc 4.8.5---is that recent enough?

peastman commented 4 years ago

I believe so. Based on the information I could find, it looks like 4.8.1 was the first version to have complete support for C++11.
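
One caveat worth checking: `std::is_trivially_copyable` is declared in `<type_traits>`, so whether it exists depends on the standard library headers the build picks up, not only on the compiler's language support. To my knowledge, libstdc++ gained this trait only in the GCC 5 series, so a 4.8.x libstdc++ may still lack it. The sketch below reports which compiler and which libstdc++ snapshot a translation unit sees. `describeToolchain` is a hypothetical helper name; the macros are the standard predefined ones from GCC, Clang, and libstdc++.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of OpenMM): summarizes the compiler and the
// libstdc++ snapshot visible to this translation unit.
std::string describeToolchain() {
    std::ostringstream out;
#if defined(__clang__)
    // Clang also defines __GNUC__, so it has to be tested first.
    out << "clang " << __clang_major__ << "." << __clang_minor__;
#elif defined(__GNUC__)
    out << "gcc " << __GNUC__ << "." << __GNUC_MINOR__;
#else
    out << "unknown compiler";
#endif
#if defined(__GLIBCXX__)
    // libstdc++ headers define __GLIBCXX__ as a release date (YYYYMMDD).
    out << ", libstdc++ dated " << __GLIBCXX__;
#else
    out << ", standard library is not libstdc++";
#endif
    return out.str();
}
```

Printing the result of `describeToolchain()` from inside the build container shows whether the compiler and the library headers come from the same toolchain, which is useful when a newer compiler is paired with the older devtoolset headers.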

ThWuensche commented 4 years ago

@jchodera:

It seems likely that we must still be missing some bugfixes that appear in 7.5.0 but not in this patched 7.4.2. I'll try a quick core22 test build from openmm/openmm master right now to see if we can verify this is the case.

John, I actually tried this on my local build of 7.4.2 from this repository and it worked, so I doubt it is a problem with missing bugfix backports. Since it worked on 7.4.2, I did not try 7.5 from the upstream repository, as I saw no point. Please see my replies to your messages on Slack.

jchodera commented 4 years ago

Thanks!