Fast math breaks some tests

bindgens1 commented 5 years ago

I have two failing tests on our cluster with the current development branch:

1)

espresso/src/core/unit_tests/grid_test.cpp(79)

espresso/src/core/unit_tests/grid_test.cpp(82)

Fails with:
error: in "get_mi_coord_test": absolute value of std::abs(get_mi_coord(a, b, box_l, true) - (a - b) - box_l){4.4408920985006262e-16} exceeds 2.2204460492503131e-16

error: in "get_mi_coord_test": absolute value of std::abs(get_mi_coord(b, a, box_l, true) - (b - a) + box_l){4.4408920985006262e-16} exceeds 2.2204460492503131e-16

This seems to be the same issue twice.

2)

Test  #44: rotational_inertia ......................***Failed

FF
======================================================================
FAIL: test_energy_and_momentum_conservation (__main__.RotationalInertia)
----------------------------------------------------------------------
Traceback (most recent call last):
/espresso/build/testsuite/python/rotational_inertia.py", line 154, in test_energy_and_momentum_conservation
    self.assertAlmostEqual(self.energy(p), E0, places=3)
AssertionError: 14.85059589306411 != 14.85 within 3 places

======================================================================
FAIL: test_stability (__main__.RotationalInertia)
----------------------------------------------------------------------
Traceback (most recent call last):
/espresso/build/testsuite/python/rotational_inertia.py", line 84, in test_stability
    i, k, self.L_0_lab[k], self.L_lab[k]))
AssertionError: 0.004388219520887526 not less than or equal to 0.004 : Inertial motion around stable axis J1: Deviation in angular momentum is too large. Step 2, coordinate 2, expected -0.7955, got -0.7998882195208875

----------------------------------------------------------------------
Ran 2 tests in 0.096s

FAILED (failures=2)

I compiled Espresso with the Intel compiler. Are these known issues that have occurred before? In case the Intel compiler is no longer supported, I will try again with gcc in the next few days.
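For reference, the `places=3` failure in the log can be reproduced in isolation. `assertAlmostEqual` rounds the absolute difference to the given number of decimal places and compares the result to zero, so `places=3` effectively requires a difference below 0.0005. A minimal sketch, using only the two numbers printed in the log above (the variable names are made up for illustration):

```python
# Values copied from the failing test output above.
E0 = 14.85
energy = 14.85059589306411

# assertAlmostEqual(first, second, places=n) passes when
# round(abs(first - second), n) == 0.
diff = abs(energy - E0)                 # roughly 5.96e-4
passes_places_3 = round(diff, 3) == 0   # diff rounds to 0.001, so this fails
passes_places_2 = round(diff, 2) == 0   # diff rounds to 0.0, so this passes

print(diff, passes_places_3, passes_places_2)
```

So the energy is off by about 6e-4, which is only slightly outside the window that three decimal places allow.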

mkuron commented 5 years ago

The Intel compiler is supported and we last ran CI tests with version 2018 Update 4 about one week ago. I've just triggered a build on the current master to make sure that it still works: https://gitlab.icp.uni-stuttgart.de/espressomd/espresso/-/jobs/135095.

Which version are you using? Did you enable any non-standard optimizations? What CPU are you running on?

bindgens1 commented 5 years ago

I didn't specify any special compiler flags.

vsc32262@login1:espresso$ icpc -V
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.1.163 Build 20171018
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

vsc32262@login1:espresso$ icc -V
Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.1.163 Build 20171018
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

vsc32262@login1:build$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                36
On-line CPU(s) list:   0-35
Thread(s) per core:    1
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Stepping:              4
CPU MHz:               2300.000
BogoMIPS:              4600.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0-17
NUMA node1 CPU(s):     18-35

fweik commented 5 years ago

The Intel compiler turns on fast math by default, which relaxes the floating-point rounding rules. This is probably why the unit test fails.
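To illustrate why relaxed rules can change results: fast-math modes generally allow the compiler to reorder floating-point operations, and floating-point addition is not associative. A minimal Python sketch (Python floats are IEEE 754 doubles; the values are chosen for illustration and are not taken from Espresso):

```python
# Illustrative values only, not taken from the Espresso code base.
a, b, c = 1e16, -1e16, 1.0

# Strict left-to-right evaluation: a and b cancel exactly, then c is added.
left_to_right = (a + b) + c

# Reassociated evaluation: c is lost when added to b, because neighbouring
# doubles near 1e16 are 2.0 apart.
reassociated = a + (b + c)

print(left_to_right, reassociated)
```

Both expressions are mathematically identical, but they give different doubles, so a test that compares against a value computed in one order can fail when the compiler picks another.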

bindgens1 commented 5 years ago

The failures didn't occur two weeks ago. That's why I was a bit concerned.

Apparently your CI doesn't have a problem with the Intel Compiler. Feel free to ignore the issue if you don't consider it a general problem.

Thanks!

mkuron commented 5 years ago

Could you try disabling fast-math by adding -fp-model=strict to your CMAKE_CXX_FLAGS? I don't think we've ever tested Espresso with the Intel compiler on a Skylake processor, so it's entirely possible that you found a bug here that should be fixed.

bindgens1 commented 5 years ago

Test suite passes without issue.

Thanks!

mkuron commented 5 years ago

Could you please re-open this issue? We should still fix it on the Espresso side. Preferably not by adding that flag, but by actually fixing either the test or the arithmetic.

bindgens1 commented 5 years ago

Sure!

I just thought it might not be needed if it's only affecting me, as we found a way to circumvent the failing tests.

I might add: turning fast math on with the GNU compiler makes the DPD test fail. Nothing else.

mkuron commented 5 years ago

Shouldn't be too difficult to fix. Please change the issue title to "fast math breaks some tests" or something like that and I might have a look during the next coding day.

fweik commented 5 years ago

The errors in rotational_inertia look large for rounding issues. Maybe the test isn't ideal there, in the sense that errors are adding up. In general, it's not clear to me how you can say what the expected precision is if the rounding rules are relaxed. If we want to support that, someone should look up what fast math means exactly and how much the results depend on hardware.
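As a generic illustration of errors adding up (this is not the rotational_inertia integrator, just repeated addition of a value that is not exactly representable in binary):

```python
import math

# Add 0.1 ten times with ordinary floating-point addition.
total = 0.0
for _ in range(10):
    total += 0.1

# math.fsum tracks the lost low-order bits and returns the correctly
# rounded sum of the same ten values.
exact = math.fsum([0.1] * 10)

print(total == 1.0, exact == 1.0, abs(total - exact))
```

Each addition contributes a small rounding error, and after only ten steps the running total no longer equals 1.0. Over many integration steps the same effect can grow well beyond one machine epsilon, even without fast math.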

jngrad commented 5 years ago

Maybe have a look at "Consistency of Floating-Point Results using the Intel® Compiler", section "Bottom Line", which provides Linux/macOS/Windows compiler flags to get consistent results across different CPU architectures. That paragraph is copy-pasted from the PDF in the page footer, which contains more details on the approximations made in fast math mode.

jngrad commented 3 years ago

There are a few places in the core, unit tests and python tests where floating-point comparisons are carried out with extremely small tolerances. These comparisons break in fast-math mode (tested with GCC and Clang) and could potentially break on old 32-bit architectures, e.g. the i586 arch which is currently part of the python3-espressomd package CI pipeline on OpenSUSE.
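One possible direction, sketched here in Python rather than in the actual test code, is to scale the tolerance with the magnitude of the operands and allow a few machine epsilons instead of exactly one. The helper `is_close_relative` and the values below are hypothetical and only mirror the 4.44e-16 vs. 2.22e-16 failure reported at the top of this issue:

```python
import sys

def is_close_relative(x, y, n_eps=4):
    """Hypothetical helper: True if x and y agree to within n_eps machine
    epsilons, scaled by the larger magnitude of the two operands."""
    scale = max(abs(x), abs(y))
    return abs(x - y) <= n_eps * sys.float_info.epsilon * scale

eps = sys.float_info.epsilon     # machine epsilon for doubles, about 2.22e-16
expected = 1.0                   # hypothetical reference value
observed = expected + 2 * eps    # off by about 4.44e-16, as in the unit test

strict_ok = abs(observed - expected) <= eps           # one-epsilon absolute check
relaxed_ok = is_close_relative(observed, expected)    # scaled, four-epsilon check

print(strict_ok, relaxed_ok)
```

The factor of four is an arbitrary choice for this sketch; what tolerance is appropriate depends on how many operations feed into the compared value. A purely relative check also needs an absolute floor when the expected value is zero.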

espressomd / espresso

Fast math breaks some tests #2977