OrderN / CONQUEST-release

Full public release of large scale and linear scaling DFT code CONQUEST
http://www.order-n.org/
MIT License

Error in the calculations with ghost atoms. #90

Open tsuyoshi38 opened 3 years ago

tsuyoshi38 commented 3 years ago

With the version in the "develop" branch, we have found that errors occur when 1) the system includes ghost atoms, 2) we use multiple processes, and 3) some of the processes have only ghost atoms.

davidbowler commented 2 years ago

What kind of errors? A failure, or a problem with the energy or force?

tsuyoshi38 commented 2 years ago

CONQUEST stops with an error, something like a failure to find neighbor atoms (or matrix elements). I thought it might be related to the fact that ghost atoms do not have any neighbors for some matrix elements, but I do not remember the details now. I will report more detailed information in a few days.

davidbowler commented 2 years ago

I think that it is this error (that I have also found):

 Completed set_blocks()
 Completed set_domains()
          Allocating memory for distribute_atom
 Completed distribute_atoms()
  rcut for BCS_parts =   34.920331988484620     
 Members in covering set:         2016
 Made covering set for matrix multiplications
  Error in process   16
  Error in process   16
  Atoms are too far apart! Minimum distance is     100.000000000000

This is related to the subroutine check_InterAtomicDistances in mult_module.f90. You can run successfully without the check by setting IO.CheckInitialAtomicDistances F in the input, but we need to understand why the error happens.

davidbowler commented 2 years ago

I have encountered this bug today. The problem comes from the distance detection routine in mult_module.f90:

https://github.com/OrderN/CONQUEST-release/blob/e0f11dc46ffc73c58b315f355625824b599873cf/src/mult_module.f90#L3537-L3547

We exclude ghost atoms from the distance check, so if a process has only ghost atoms, no distances are evaluated on it, r_min apparently keeps its starting value of 100.0 (the minimum distance reported in the log above), and the process fails this test. I think that a simple test around line 3566 to check if r_min is 100.0 would solve this; alternatively we could set a flag if a process has only ghost atoms (locally in this routine or globally).
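
To make the second option concrete, here is a minimal sketch of the logic in Python rather than the Fortran of mult_module.f90. It is not the CONQUEST routine: the function name check_distances, the tuple layout for atoms, and the threshold R_TOO_FAR are hypothetical, and the starting value R_MIN_INITIAL = 100.0 is an assumption inferred from the log message. It only illustrates how counting the real (non-ghost) atoms on a process lets that process skip the test when it has nothing to check.

```python
import math

R_MIN_INITIAL = 100.0  # assumption: starting value of r_min, inferred from the log
R_TOO_FAR = 50.0       # hypothetical threshold for the "too far apart" abort


def check_distances(local_atoms, neighbours, skip_if_only_ghosts=True):
    """Return the minimum real-atom distance seen by one process.

    Atoms are (x, y, z, is_ghost) tuples. Ghost atoms are excluded from the
    check, as described in the thread. Returns None when the check is skipped
    because the process owns no real atoms.
    """
    r_min = R_MIN_INITIAL
    n_real_local = 0  # the "flag": how many non-ghost atoms this process owns
    for xi, yi, zi, ghost_i in local_atoms:
        if ghost_i:
            continue
        n_real_local += 1
        for xj, yj, zj, ghost_j in neighbours:
            if ghost_j:
                continue
            r = math.dist((xi, yi, zi), (xj, yj, zj))
            if r > 0.0:  # ignore the atom paired with itself
                r_min = min(r_min, r)

    # Proposed guard: a process holding only ghost atoms has nothing to test.
    if skip_if_only_ghosts and n_real_local == 0:
        return None

    if r_min > R_TOO_FAR:
        raise RuntimeError(
            f"Atoms are too far apart! Minimum distance is {r_min}"
        )
    return r_min
```

With the guard enabled, check_distances([(0.0, 0.0, 0.0, True)], neighbours) returns None for any neighbour list, while skip_if_only_ghosts=False reproduces the abort with r_min still at 100.0. The sketch uses a count of real atoms rather than comparing r_min against 100.0, on the assumption that an explicit flag is less fragile than testing a floating-point sentinel; either approach matches the suggestions above.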

@tsuyoshi38 @ayakon what do you think?