McStasMcXtrace / iFit

A simple library to analyze data (with McCode and Phonons/DFT hooks). Warning: this project has been moved to https://gitlab.com/soleil-data-treatment/soleil-software-projects/remote-desktop
http://ifit.mccode.org

Models: Phonons: using GPAW 0.9 with ASE 3.14 crashes #107

Closed (farhi closed this 6 years ago)

farhi commented 7 years ago

Error is:

s_gpaw=sqw_phonons('POSCAR_MgO','gpaw')
calc = GPAW(usesymm=False, txt="/tmp/tpe7b13361_50f8_4d2b_b6aa_4fe846e203ec/gpaw.log", kpts=(4,4,4), mode="pw", xc="PBE", eigensolver="rmm-diis", convergence={"energy":1e-05})
/usr/local/lib/python2.7/dist-packages/ase/calculators/neighborlist.py:5: UserWarning: Moved to ase.neighborlist
  warnings.warn('Moved to ase.neighborlist')
Traceback (most recent call last):
  File "/tmp/tpe7b13361_50f8_4d2b_b6aa_4fe846e203ec/sqw_phonons_forces_iterate.py", line 19, in <module>
    from gpaw import GPAW, PW, FermiDirac
  File "/usr/lib/python2.7/dist-packages/gpaw/__init__.py", line 239, in <module>
    from gpaw.aseinterface import GPAW
  File "/usr/lib/python2.7/dist-packages/gpaw/aseinterface.py", line 12, in <module>
    from gpaw.paw import PAW
  File "/usr/lib/python2.7/dist-packages/gpaw/paw.py", line 38, in <module>
    from gpaw.output import PAWTextOutput
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/gpaw/output.py", line 10, in <module>
    from ase.version import version as ase_version
ImportError: No module named version
GPAW CLEANUP (node 1): <type 'exceptions.ImportError'> occurred.  Calling MPI_Abort!

So `from ase.version import version as ase_version` fails: this ASE release no longer has an ase.version module, and the version is now available as ase.__version__.

farhi commented 7 years ago

In gpaw/output.py, line 10 should be replaced with:

try:
    from ase.version import version as ase_version  # older ASE releases
except ImportError:
    import ase  # ASE 3.14 no longer ships ase.version
    ase_version = ase.__version__

But GPAW then still blocks on the pickle load. Perhaps the pickle contains the calculator, which cannot be loaded back?

farhi commented 7 years ago

Bug confirmed on Ubuntu 14.04/Debian Jessie and Ubuntu 16.04. The fix above does not solve it: the pickle EOFError is still there.

rank=2 L00: Traceback (most recent call last):
rank=2 L01:   File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/sqw_phonons_forces_iterate.py", line 24, in <module>
rank=2 L02:     ret = ifit.phonopy_run(ph, single=True)
rank=2 L03:   File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/ifit.py", line 885, in phonopy_run
rank=2 L04:     set_of_forces, flag = phonopy_run_calculate(phonon, phonpy, supercell, single)
rank=2 L05:   File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/ifit.py", line 959, in phonopy_run_calculate
rank=2 L06:     feq = phonons_run_eq(phonon, supercell)
rank=2 L07:   File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/ifit.py", line 306, in phonons_run_eq
rank=2 L08:     output = pickle.load(open(filename))
rank=2 L09:   File "/usr/lib/python2.7/pickle.py", line 1384, in load
rank=2 L10:     return Unpickler(file).load()
rank=2 L11:   File "/usr/lib/python2.7/pickle.py", line 864, in load
rank=2 L12:     dispatch[key](self)
rank=2 L13:   File "/usr/lib/python2.7/pickle.py", line 886, in load_eof
rank=2 L14:     raise EOFError
rank=2 L15: EOFError
GPAW CLEANUP (node 2): <type 'exceptions.EOFError'> occurred.  Calling MPI_Abort!

This could be related to running GPAW with MPI: all ranks then try to read and write a file (the equilibrium one) that is not yet complete. An additional:

could be a solution?
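The EOFError above is what `pickle.load` raises on an empty file, which is what a rank sees if it opens the equilibrium file while another rank has only just created it. The sketch below reproduces that and shows one complementary mitigation that is not what iFit did: write the pickle to a temporary file and rename it into place, so a reader sees either no file or a complete one. It assumes Python 3 (`os.replace` does not exist in Python 2.7), and the helper names are hypothetical. It does not replace synchronization: a rank can still look before the file exists at all.

```python
import os
import pickle
import tempfile


def save_atomically(obj, filename):
    """Pickle obj so that readers never observe a half-written file.

    Data goes to a temporary file in the same directory, which is then
    renamed over the target; os.replace is atomic on POSIX when both
    paths are on the same filesystem.
    """
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(obj, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, filename)
    except BaseException:
        # Do not leave the temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_pickle(filename):
    with open(filename, "rb") as handle:
        return pickle.load(handle)
```

Calling `load_pickle` on a file that has been created but not yet written raises `EOFError`, while a file produced by `save_atomically` is always complete.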

farhi commented 7 years ago

Confirmed this is an MPI issue:

rank=0 L07:   File "/usr/local/lib/python2.7/dist-packages/ase/atoms.py", line 731, in get_forces
rank=0 L08:     forces = self._calc.get_forces(self)
rank=0 L09:   File "/usr/lib/python2.7/dist-packages/gpaw/aseinterface.py", line 78, in get_forces
rank=0 L10:     force_call_to_set_positions=force_call_to_set_positions)
rank=0 L11:   File "/usr/lib/python2.7/dist-packages/gpaw/paw.py", line 250, in calculate
rank=0 L12:     self.initialize(atoms)
rank=0 L13:   File "/usr/lib/python2.7/dist-packages/gpaw/paw.py", line 374, in initialize
rank=0 L14:     self.synchronize_atoms()
rank=0 L15:   File "/usr/lib/python2.7/dist-packages/gpaw/paw.py", line 1034, in synchronize_atoms
rank=0 L16:     mpi.synchronize_atoms(self.atoms, self.wfs.world)
rank=0 L17:   File "/usr/lib/python2.7/dist-packages/gpaw/mpi/__init__.py", line 714, in synchronize_atoms
rank=0 L18:     err_ranks)
rank=0 L19: ValueError: ('Mismatch of Atoms objects.  In debug mode, atoms will be dumped to files.', array([1, 2, 3]))
GPAW CLEANUP (node 0): <type 'exceptions.ValueError'> occurred.  Calling MPI_Abort!

It works in serial ('mpi=1').

farhi commented 7 years ago

This seems to be solved by upgrading GPAW to version 1.2, which works fine with ASE 3.14. I created upgraded ASE and GPAW packages on packages.ccode.org. Considered fixed.

farhi commented 7 years ago

We probably still need to add an MPI barrier so that ranks other than 0 wait for rank 0.
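A minimal sketch of the "only rank 0 writes, everyone waits, then everyone reads" pattern, with threads standing in for MPI ranks so it runs without an MPI installation. In a real run the threads would be MPI processes and `barrier.wait()` would be a collective such as mpi4py's `comm.Barrier()` or ASE's `ase.parallel.barrier()`. `compute_equilibrium_forces`, `N_RANKS` and the file name are hypothetical placeholders, not iFit code. The comment at the barrier also points at one plausible cause of a hang like the one later reported on Trusty: a barrier that not every rank reaches.

```python
import os
import pickle
import tempfile
import threading

N_RANKS = 4  # hypothetical rank count; threads stand in for MPI processes


def compute_equilibrium_forces():
    """Hypothetical placeholder for the collective DFT force calculation."""
    return [0.0, 0.0, 0.0]


def rank_main(rank, barrier, filename, results):
    # With GPAW every rank takes part in the calculation itself...
    forces = compute_equilibrium_forces()
    if rank == 0:
        # ...but only the master rank writes the shared file.
        with open(filename, "wb") as handle:
            pickle.dump(forces, handle)
    # No rank may read before rank 0 has closed the file. If any rank
    # skips this call, all the other ranks block here forever.
    barrier.wait()
    with open(filename, "rb") as handle:
        results[rank] = pickle.load(handle)


def run(filename, n_ranks=N_RANKS):
    barrier = threading.Barrier(n_ranks)
    results = [None] * n_ranks
    threads = [
        threading.Thread(target=rank_main, args=(rank, barrier, filename, results))
        for rank in range(n_ranks)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "equilibrium.pkl")
    print(run(path))
```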

farhi commented 7 years ago

Now fixed by adding barriers and switching to the ASE/PHON method, avoiding PhonoPy altogether. Commit https://github.com/McStasMcXtrace/iFit/commit/82b0bf5229facbae4cb9a20da4daa79d42f0ab05

farhi commented 7 years ago

The barrier calls seem to hang the calculation, at least on Trusty.

farhi commented 6 years ago

The commit https://github.com/McStasMcXtrace/iFit/commit/22cf1f3d241cfc31232abbd7c22356e28544b75e seems to have solved this issue: the 'isfile' check is now MPI-aware, which avoids stalling processes.
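One common way to make a file-existence check MPI-aware is to let rank 0 test the file and broadcast the answer, so every rank takes the same branch and reaches the same barriers. The sketch below is an assumption about the general idea, not the code of the commit above. `comm` is assumed to follow the mpi4py interface (a `rank` attribute and `bcast(obj, root=0)`), as `MPI.COMM_WORLD` does. `SerialComm` is a hypothetical single-process stand-in used only so the example runs without MPI.

```python
import os


def mpi_isfile(path, comm):
    """Return os.path.isfile(path) as seen by rank 0, on every rank.

    Without the broadcast, ranks may disagree (e.g. a file that rank 0 is
    still writing), take different branches, and leave the others waiting
    at a barrier that is never reached.
    """
    exists = os.path.isfile(path) if comm.rank == 0 else None
    return comm.bcast(exists, root=0)


class SerialComm:
    """Hypothetical single-process stand-in for an MPI communicator."""

    rank = 0
    size = 1

    def bcast(self, obj, root=0):
        # With a single rank, broadcasting just returns the root's value.
        return obj
```

Under MPI the call would be `mpi_isfile(path, MPI.COMM_WORLD)` after `from mpi4py import MPI`, and it must be made by all ranks, since `bcast` is a collective operation.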