Open Dankomaister opened 1 year ago
Thanks for reporting this. Can you confirm the output of ctest -L deterministic? Does everything pass? This will give us a hint as to whether there is a general problem or perhaps something more restricted to your example input.
You are correct about the NaN trap - we have this as a sanity check inside QMCPACK.
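As a rough illustration of what such a sanity check amounts to, here is a minimal C++ sketch. The helper name `require_not_nan` is hypothetical and this is not QMCPACK's actual code; it only mirrors the wording of the error message reported in this issue.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical helper (not QMCPACK's actual code): return the value
// unchanged, or throw if a Hamiltonian component evaluated to NaN.
inline double require_not_nan(const std::string& component, double value)
{
  if (std::isnan(value))
    throw std::runtime_error("component " + component + " returns NaN");
  return value;
}
```

A check like this turns a silently propagating NaN into an immediate, named failure, which is why the run aborts at the kinetic term rather than producing meaningless energies.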
Okay, I ran ctest -L deterministic and it failed on one test, ntest_nexus_qdens_radial. This is the final output of ctest -L deterministic:
99% tests passed, 1 tests failed out of 1149
Label Time Summary:
QMCPACK = 579.21 sec
QMCPACK-checking-results = 42.83 sec
converter = 35.27 sec
coverage = 21.96 sec
deterministic = 910.71 sec
nexus = 143.86 sec
quality_unknown = 859.53 sec
unit = 101.23 sec
Total Test time (real) = 913.79 sec
The following tests FAILED:
2135 - ntest_nexus_qdens_radial (Failed)
From the name I guess this is a nexus test so perhaps not that relevant?
Correct. This is a test of the qdens tool for analyzing densities and is not relevant here.
The results from the other tests indicate that the code does not have any major issues and that it should be good for production science runs. The tests include VMC and DMC runs for several simple solids.
Okay, so these tests do not help to narrow down the bug in my calculation. So what is the next step?
Next steps
It is a holiday here today but we'll discuss among the developers in subsequent days.
FYI, we have reproduced this crash with the latest Intel oneAPI compiler. A single-determinant (no Jastrow) VMC run can trigger it. So, e.g., something may be wrong with the inputs and our processing of them, or with our construction of the spline orbitals, or perhaps the H5 file is somehow bad.
@Dankomaister could you provide the following info
cat /etc/os-release # OS info
ldd --version # glibc version
In addition, in your qmcpack build directory
nm src/QMCWaveFunctions/CMakeFiles/qmcwfs.dir/BsplineFactory/SplineC2R.cpp.o |grep sincos
Hi @ye-luo,
Here is the information you asked for
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
ldd --version
ldd (GNU libc) 2.17
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
nm src/QMCWaveFunctions/CMakeFiles/qmcwfs.dir/BsplineFactory/SplineC2R.cpp.o | grep sincos
U __svml_sincos2_l9
U __svml_sincos4_l9
U __svml_sincosf4_l9
U __svml_sincosf8_l9
As you can tell from the questions, we think this is an issue with vectorization of transcendental functions. This could be a library/compiler issue but we can't rule out an issue on our side yet. Are you able to keep doing production with the GNU compilation? It should not be much slower than an Intel build.
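One way to probe this kind of problem outside QMCPACK is a small standalone check: evaluate sine and cosine in a simple loop that the compiler is free to vectorize, then test the results for NaN and against the identity sin^2 + cos^2 = 1. The C++ sketch below is only illustrative. The function name and the choice of test are assumptions, and whether the loop is actually vectorized (and therefore calls SVML routines like those listed by nm above) depends on the compiler and flags.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical standalone check (not part of QMCPACK).
// Returns the largest deviation of sin^2 + cos^2 from 1 over the inputs,
// or NaN if any sine or cosine value is NaN.
inline double max_sincos_deviation(const std::vector<double>& phases)
{
  const std::size_t n = phases.size();
  std::vector<double> s(n), c(n);

  // Simple loop with independent iterations; an optimizing compiler may
  // turn the paired sin/cos calls into a vectorized sincos routine.
  for (std::size_t i = 0; i < n; ++i)
  {
    s[i] = std::sin(phases[i]);
    c[i] = std::cos(phases[i]);
  }

  double worst = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    if (std::isnan(s[i]) || std::isnan(c[i]))
      return std::numeric_limits<double>::quiet_NaN();
    worst = std::max(worst, std::abs(s[i] * s[i] + c[i] * c[i] - 1.0));
  }
  return worst;
}
```

Building this once with the Intel compiler and once with GCC, and comparing the results on the same inputs, could help separate a compiler or library issue from an application issue.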
Hi, sure, I can use the GNU version, but it would be nice if this could be fixed.
Hi! I have this problem running qmcpack-3.15.0 compiled with the Intel compilers 2021.4.0. The compiled version passes the tests
ctest -R qe
however, when I run my actual calculations (see input_files), I get the following error:
Fatal Error. Aborting at QMCHamiltonian::evaluate component Kinetic returns NaN
Compiling the same qmcpack version with the GNU compilers works fine, passing the tests
ctest -R qe
and finishing my DMC calculation without errors. The error message seems to be a custom one from qmcpack? I would like to be able to compile and run qmcpack using the Intel compilers, so any help solving this would be appreciated.
Steps to reproduce the behavior
Expected behavior
I expect the behavior to be the same for qmcpack compiled with the Intel compilers as with the GNU compilers, i.e. my calculation completes without errors.
System
Additional context
Output from the calculation is attached: slurm-223491.zip