farhi closed this issue 6 years ago
In gpaw:output.py, line 10 should be replaced with:
    try:
        from ase.version import version as ase_version
    except ImportError:
        import ase
        ase_version = ase.__version__
But then GPAW still blocks on the pickle load, probably because the pickle contains the calculator, which cannot be imported back?
Confirmed bug on 14.04/jessie and 16.04. The fix above does not resolve it: the pickle EOFError is still there.
rank=2 L00: Traceback (most recent call last):
rank=2 L01: File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/sqw_phonons_forces_iterate.py", line 24, in <module>
rank=2 L02: ret = ifit.phonopy_run(ph, single=True)
rank=2 L03: File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/ifit.py", line 885, in phonopy_run
rank=2 L04: set_of_forces, flag = phonopy_run_calculate(phonon, phonpy, supercell, single)
rank=2 L05: File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/ifit.py", line 959, in phonopy_run_calculate
rank=2 L06: feq = phonons_run_eq(phonon, supercell)
rank=2 L07: File "/tmp/tp596fe8b1_da04_4c95_bc56_769b12dd825c/ifit.py", line 306, in phonons_run_eq
rank=2 L08: output = pickle.load(open(filename))
rank=2 L09: File "/usr/lib/python2.7/pickle.py", line 1384, in load
rank=2 L10: return Unpickler(file).load()
rank=2 L11: File "/usr/lib/python2.7/pickle.py", line 864, in load
rank=2 L12: dispatch[key](self)
rank=2 L13: File "/usr/lib/python2.7/pickle.py", line 886, in load_eof
rank=2 L14: raise EOFError
rank=2 L15: EOFError
GPAW CLEANUP (node 2): <type 'exceptions.EOFError'> occurred. Calling MPI_Abort!
This could be related to MPI with GPAW: all ranks then try to read/write an unfinished file (equilibrium). An additional:
could be a solution?
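Independently of any MPI synchronisation, one way to make sure no reader ever sees an unfinished pickle is to write it under a temporary name and rename it once complete. The following is a minimal sketch, not the code used in iFit: the helper names `dump_atomic` and `load_or_none` are hypothetical, and it assumes Python 3 (the tracebacks above are from Python 2.7, where `os.replace` is not available) and a filesystem where renaming within one directory is atomic.

```python
import os
import pickle
import tempfile


def dump_atomic(obj, filename):
    """Write obj to filename so readers never see a half-written pickle."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Write to a temporary file in the same directory first.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The final name only appears once the data is complete.
        os.replace(tmp_name, filename)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_or_none(filename):
    """Return the unpickled object, or None if the file is missing or truncated."""
    try:
        with open(filename, "rb") as f:
            return pickle.load(f)
    except (IOError, EOFError, pickle.UnpicklingError):
        return None
```

A caller can then treat `None` as "not ready yet" instead of aborting with EOFError. On a shared network filesystem, other nodes may still see the new file with some delay, so this complements rather than replaces a synchronisation step between ranks.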
Confirmed this is an MPI issue:
rank=0 L07: File "/usr/local/lib/python2.7/dist-packages/ase/atoms.py", line 731, in get_forces
rank=0 L08: forces = self._calc.get_forces(self)
rank=0 L09: File "/usr/lib/python2.7/dist-packages/gpaw/aseinterface.py", line 78, in get_forces
rank=0 L10: force_call_to_set_positions=force_call_to_set_positions)
rank=0 L11: File "/usr/lib/python2.7/dist-packages/gpaw/paw.py", line 250, in calculate
rank=0 L12: self.initialize(atoms)
rank=0 L13: File "/usr/lib/python2.7/dist-packages/gpaw/paw.py", line 374, in initialize
rank=0 L14: self.synchronize_atoms()
rank=0 L15: File "/usr/lib/python2.7/dist-packages/gpaw/paw.py", line 1034, in synchronize_atoms
rank=0 L16: mpi.synchronize_atoms(self.atoms, self.wfs.world)
rank=0 L17: File "/usr/lib/python2.7/dist-packages/gpaw/mpi/__init__.py", line 714, in synchronize_atoms
rank=0 L18: err_ranks)
rank=0 L19: ValueError: ('Mismatch of Atoms objects. In debug mode, atoms will be dumped to files.', array([1, 2, 3]))
GPAW CLEANUP (node 0): <type 'exceptions.ValueError'> occurred. Calling MPI_Abort!
Works in serial ('mpi=1').
Seems to be solved by upgrading GPAW to version 1.2, which works OK with ASE 3.14. Created upgraded packages of ASE and GPAW in packages.ccode.org. Considered fixed.
Should probably add some MPI barrier sync when rank is not 0.
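A barrier of this kind could look roughly like the sketch below: rank 0 writes the pickle, every rank waits at the barrier, and only then does anyone read the file. This is an illustration under assumptions, not the code from iFit. `load_shared` is a hypothetical helper, and `comm` is assumed to offer `Get_rank()` and `barrier()` as an mpi4py communicator does. `SerialComm` is a stand-in so the example runs without MPI.

```python
import pickle


class SerialComm:
    """Stand-in for an MPI communicator (assumption: mpi4py-like methods)."""

    def Get_rank(self):
        return 0

    def barrier(self):
        pass


def load_shared(comm, filename, compute):
    """Compute and write the result on rank 0, then let every rank read it."""
    if comm.Get_rank() == 0:
        with open(filename, "wb") as f:
            pickle.dump(compute(), f)
    # Every rank calls the barrier, including rank 0, so nobody reads
    # before rank 0 has finished writing and closed the file.
    comm.barrier()
    with open(filename, "rb") as f:
        return pickle.load(f)
```

The barrier sits outside the `if` on purpose: it is a collective call, so if some ranks skip it the others wait forever.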
Now fixed by putting barriers and switching to ASE/PHON method, avoiding PhonoPy stuff. Commit https://github.com/McStasMcXtrace/iFit/commit/82b0bf5229facbae4cb9a20da4daa79d42f0ab05
The barrier calls seem to hang the calculation, at least on Trusty.
The commit https://github.com/McStasMcXtrace/iFit/commit/22cf1f3d241cfc31232abbd7c22356e28544b75e seems to have solved this issue, i.e. the 'isfile' check is now MPI-aware, which avoids stalling processes.
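For illustration, an MPI-aware file check can be sketched as follows. This is a generic pattern and not a copy of the commit's code: `isfile_mpi` is a hypothetical name, and `comm` is assumed to provide `Get_rank()` and `bcast()` as an mpi4py communicator does (`SerialComm` is a stand-in so the example runs without MPI). Only rank 0 looks at the filesystem, and all ranks receive the same answer, so they all take the same branch afterwards.

```python
import os


class SerialComm:
    """Stand-in for an MPI communicator (assumption: mpi4py-like methods)."""

    def Get_rank(self):
        return 0

    def bcast(self, obj, root=0):
        return obj


def isfile_mpi(comm, path):
    """Check for path on rank 0 only and share the answer with all ranks."""
    found = None
    if comm.Get_rank() == 0:
        found = os.path.isfile(path)
    # bcast is collective: every rank calls it and gets rank 0's answer.
    return comm.bcast(found, root=0)
```

If each rank called `os.path.isfile` on its own, some could see the file while others did not, and the ranks that went on to a barrier would wait for ranks that never reach it.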
Error is:
So 'from ase.version import version as ase_version' fails, as this is now ase.__version__
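A version lookup that tolerates both layouts can be sketched as below. `package_version` is a hypothetical helper, not part of ASE or GPAW. It assumes only what this thread describes: the string lives either in a `version` submodule or in the package's `__version__` attribute.

```python
import importlib


def package_version(name):
    """Return the version string of package `name` from either location."""
    try:
        # Older layout: <name>/version.py defines a 'version' string.
        return importlib.import_module(name + ".version").version
    except (ImportError, AttributeError):
        # Newer layout: the package itself carries __version__.
        return importlib.import_module(name).__version__
```

With ASE installed, `package_version("ase")` would then return the version string whichever layout that release uses.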