Colvars / colvars

Collective variables library for molecular simulation and analysis programs
http://colvars.github.io/
GNU Lesser General Public License v3.0

Intel compiler 2019 fails to calculate the eigenvectors correctly #695

Open HanatoK opened 1 month ago

HanatoK commented 1 month ago

The test code is available at https://github.com/HanatoK/Intel_Compiler_2019_bug_test and is intended to use the same algorithm as Colvars. Even when I switch to Eigen3 or math_eigen_impl.h, the results are consistently wrong with the Intel 2019 compiler.

This issue occurs on Frontera with the default Intel compiler, which reports the following version:

icpc (ICC) 19.1.1.217 20200306
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

The issue affects the orientation and Euler angle calculations the most; RMSD seems to be less affected.

giacomofiorin commented 1 month ago

Reporting here a Slack message from Dave Hardy:

We've had some past issues with over-optimization by Intel's compilers leading to errors in NAMD. Eric Bohm did some testing and determined that this particular issue is resolved by the Intel 2023 compilers.

giacomofiorin commented 3 weeks ago

Do they have MKL available alongside that compiler on Frontera?

HanatoK commented 3 weeks ago

> Do they have MKL available alongside that compiler on Frontera?

Yes, but I later found that the problem was not due to the eigendecomposition. Even simple code like the following, which computes the correlation matrix without calling any third-party library, can produce wrong results:

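#include <cstddef>  // size_t
#include <vector>   // std::vector

struct Coordinate {
  double x;
  double y;
  double z;
};

using AtomGroup = std::vector<Coordinate>;

// Accumulates the 3x3 correlation matrix between two atom groups:
// out[a][b] = sum over atoms i of ag[i].a * ag_ref[i].b, with a, b in {x, y, z}.
void build_correlation_matrix(
  const AtomGroup& ag, const AtomGroup& ag_ref, double out[3][3]) {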
  double mat_R[3][3];
  for (size_t i = 0; i < 3; ++i) {
    for (size_t j = 0; j < 3; ++j) {
      mat_R[i][j] = 0;
    }
  }
  for (size_t i = 0; i < ag.size(); ++i) {
    mat_R[0][0] += ag[i].x * ag_ref[i].x;
    mat_R[0][1] += ag[i].x * ag_ref[i].y;
    mat_R[0][2] += ag[i].x * ag_ref[i].z;
    mat_R[1][0] += ag[i].y * ag_ref[i].x;
    mat_R[1][1] += ag[i].y * ag_ref[i].y;
    mat_R[1][2] += ag[i].y * ag_ref[i].z;
    mat_R[2][0] += ag[i].z * ag_ref[i].x;
    mat_R[2][1] += ag[i].z * ag_ref[i].y;
    mat_R[2][2] += ag[i].z * ag_ref[i].z;
  }
  // print_matrix<3, 3>(mat_R);
  for (size_t i = 0; i < 3; ++i) {
    for (size_t j = 0; j < 3; ++j) {
      out[i][j] = mat_R[i][j];
    }
  }
}
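A minimal sketch of how the function above could be cross-checked against an independent reference. The names `reference_correlation_matrix` and `max_abs_deviation` are hypothetical and are not part of Colvars or of the linked test repository. The reference keeps one `volatile` accumulator per matrix element, which is assumed here to discourage the compiler from fusing or vectorizing the nine sums; it is not guaranteed to sidestep the bug.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Coordinate {
  double x;
  double y;
  double z;
};

using AtomGroup = std::vector<Coordinate>;

// Function under test: the same accumulation as in the snippet above, condensed.
void build_correlation_matrix(
  const AtomGroup& ag, const AtomGroup& ag_ref, double out[3][3]) {
  double mat_R[3][3] = {};  // zero-initialized
  for (std::size_t i = 0; i < ag.size(); ++i) {
    mat_R[0][0] += ag[i].x * ag_ref[i].x;
    mat_R[0][1] += ag[i].x * ag_ref[i].y;
    mat_R[0][2] += ag[i].x * ag_ref[i].z;
    mat_R[1][0] += ag[i].y * ag_ref[i].x;
    mat_R[1][1] += ag[i].y * ag_ref[i].y;
    mat_R[1][2] += ag[i].y * ag_ref[i].z;
    mat_R[2][0] += ag[i].z * ag_ref[i].x;
    mat_R[2][1] += ag[i].z * ag_ref[i].y;
    mat_R[2][2] += ag[i].z * ag_ref[i].z;
  }
  for (std::size_t r = 0; r < 3; ++r) {
    for (std::size_t c = 0; c < 3; ++c) {
      out[r][c] = mat_R[r][c];
    }
  }
}

// Hypothetical reference: one element at a time, with a volatile accumulator
// so that every partial sum is stored and reloaded (an assumption about how
// to discourage aggressive optimization, not a guarantee).
void reference_correlation_matrix(
  const AtomGroup& ag, const AtomGroup& ag_ref, double out[3][3]) {
  for (std::size_t r = 0; r < 3; ++r) {
    for (std::size_t c = 0; c < 3; ++c) {
      volatile double sum = 0.0;
      for (std::size_t i = 0; i < ag.size(); ++i) {
        const double a[3] = {ag[i].x, ag[i].y, ag[i].z};
        const double b[3] = {ag_ref[i].x, ag_ref[i].y, ag_ref[i].z};
        sum = sum + a[r] * b[c];
      }
      out[r][c] = sum;
    }
  }
}

// Largest absolute element-wise difference between the two implementations.
double max_abs_deviation(const AtomGroup& ag, const AtomGroup& ag_ref) {
  assert(ag.size() == ag_ref.size());
  double fast[3][3];
  double ref[3][3];
  build_correlation_matrix(ag, ag_ref, fast);
  reference_correlation_matrix(ag, ag_ref, ref);
  double worst = 0.0;
  for (std::size_t r = 0; r < 3; ++r) {
    for (std::size_t c = 0; c < 3; ++c) {
      const double diff = std::fabs(fast[r][c] - ref[r][c]);
      if (diff > worst) worst = diff;
    }
  }
  return worst;
}
```

With a correctly working compiler the deviation should be zero or at rounding level; a large value would point to a miscompiled accumulation loop. Whether this check reproduces the reported failure at a given optimization level has not been verified here.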

In my opinion the Intel/2019 (19.1.1) compiler is just too dangerous to use.

giacomofiorin commented 3 weeks ago

> Yes, but I later found that the problem was not due to the eigendecomposition. Even simple code like the following, which computes the correlation matrix without calling any third-party library, can produce wrong results:

Unbelievable. This code could not be any simpler.

> In my opinion the Intel/2019 (19.1.1) compiler is just too dangerous to use.

It definitely looks that way. Thank you for checking!
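One way a project could act on this conclusion is a compile-time check for the affected compiler series. The sketch below is only an illustration: the function names are hypothetical, it assumes that the classic Intel compiler defines `__INTEL_COMPILER` and that the value 1910 corresponds to the 19.1 series, and the flagged range is an assumption, since this thread only confirms that 19.1.1 is affected and that the 2023 compilers are not.

```cpp
// Assumption: icc/icpc (classic) define __INTEL_COMPILER, and 1910 is taken
// here to denote the 19.1 release series. Only 19.1.1 is confirmed as
// affected in this thread; flagging all of 19.1.x is a conservative guess.
constexpr bool built_with_suspect_intel_compiler() {
#if defined(__INTEL_COMPILER) && (__INTEL_COMPILER >= 1910) && (__INTEL_COMPILER < 1920)
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: a message that a build or test script could print.
constexpr const char* compiler_advice() {
  return built_with_suspect_intel_compiler()
    ? "Intel 19.1 compiler detected: check orientation results against another compiler."
    : "No known-problematic Intel compiler detected.";
}
```

A test suite could print `compiler_advice()` at startup, or a build could use `static_assert(!built_with_suspect_intel_compiler(), "...")` to refuse the compiler outright.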