SPARC-X / SPARC-X-API

GNU General Public License v3.0

Segmentation fault when running socket in run mode b with OFDFT. #50

Open ltimmerman3 opened 1 week ago

ltimmerman3 commented 1 week ago

Describe the bug: A ConnectionResetError is raised because SPARC exits with exit code 139. The log files show a segmentation fault during the socket function calls. So far, this has only occurred with OFDFT.
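For readers decoding the status: by the usual POSIX shell convention, an exit status above 128 means the process was killed by signal (status − 128). Here that is 139 − 128 = 11, SIGSEGV, which matches the segfault in sparc.log. The helper below is a minimal, hypothetical sketch (not part of ASE or SPARC-X-API), assuming the status is reported either as a negative `Popen.returncode` or as 128 + N by a wrapping shell or MPI launcher.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Map a child's exit status to a readable description (hypothetical helper)."""
    # subprocess.Popen.returncode is -N when the child itself was killed by signal N.
    if status < 0:
        return signal.Signals(-status).name
    # A shell or launcher wrapping the child typically reports 128 + N instead.
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            pass  # not a known signal number; fall through to a plain exit code
    return f"exit code {status}"


print(describe_exit_status(139))  # SIGSEGV
print(describe_exit_status(-11))  # SIGSEGV
```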

To Reproduce: Run the Python file attached below (SPARC socket calculator with OFDFT_FLAG = 1, driven by a BFGS geometry optimization).

Expected behavior: Geometry optimization of a Si nanocluster runs to completion.

Actual output or error trace

sparc.log:

```
[atl1-1-02-018-18-2:649235:0:649235] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 649230) ====
 0  0x000000000005f10c ucs_callbackq_cleanup()  ???:0
 1  0x000000000005f2ca ucs_callbackq_cleanup()  ???:0
 2  0x000000000003e6f0 GI_sigaction()  :0
 3  0x0000000000409f4e Calculate_local_kpoints()  ???:0
 4  0x00000000005cd934 reinit_mesh()  ???:0
 5  0x00000000005ce61e read_atoms_position_fom_socket()  ???:0
 6  0x00000000005cf9b2 main_Socket()  ???:0
 7  0x000000000040550d main()  ???:0
 8  0x0000000000029590 libc_start_call_main()  ???:0
 9  0x0000000000029640 libc_start_main_alias_2()  :0
10  0x0000000000405535 _start()  ???:0
```

socket.log:

```
Accepting clients on UNIX-socket /tmp/ipi_sparc_ce1e0a
Close socket server pted connection from
Driver: calculate
Driver: status
Driver: sendmsg 'STATUS'
Driver: recvmsg 'READY'
Driver: sendposdata
Driver: sendmsg 'POSDATA'
Driver: send 72 bytes of <class 'numpy.float64'>
Driver: send 72 bytes of <class 'numpy.float64'>
Driver: send 4 bytes of <class 'numpy.int32'>
Driver: send 120 bytes of <class 'numpy.float64'>
Driver: status
Driver: sendmsg 'STATUS'
Close socket server
```
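The byte counts in socket.log are consistent with an i-PI POSDATA message: after the 'POSDATA' header the driver sends the 3×3 cell (9 float64 = 72 bytes), its inverse (72 bytes), the atom count as an int32 (4 bytes), and the positions (3 float64 = 24 bytes per atom, so 120 bytes means 5 atoms). The driver then sends STATUS and never gets a reply, which fits SPARC crashing inside read_atoms_position_fom_socket() → reinit_mesh(). Below is a minimal numpy sketch that reproduces those sizes. The function names are hypothetical, the transposes mirror the i-PI habit of sending matrices column-major (treat that detail as an assumption), and the conversion to atomic units (Bohr) that the i-PI protocol expects is omitted.

```python
import numpy as np


def posdata_chunks(cell, positions):
    """Split a POSDATA payload into the chunks a driver sends (hypothetical sketch, units omitted)."""
    cell = np.asarray(cell, dtype=np.float64)
    positions = np.asarray(positions, dtype=np.float64)
    chunks = [
        cell.T.copy(),                             # 3x3 cell: 9 float64 = 72 bytes
        np.linalg.inv(cell).T.copy(),              # inverse cell: another 72 bytes
        np.array(len(positions), dtype=np.int32),  # atom count: 4 bytes
        positions.copy(),                          # N x 3 coordinates: 24 * N bytes
    ]
    return [c.tobytes() for c in chunks]


def natoms_from_position_bytes(nbytes: int) -> int:
    # Each atom contributes three float64 coordinates, i.e. 3 * 8 = 24 bytes.
    return nbytes // 24


sizes = [len(c) for c in posdata_chunks(10.0 * np.eye(3), np.zeros((5, 3)))]
print(sizes)                            # [72, 72, 4, 120]
print(natoms_from_position_bytes(120))  # 5
```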

Traceback:

```
/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py:364: UserWarning: Subprocess exited with status 139
  warnings.warn('Subprocess exited with status {}'
Traceback (most recent call last):
  File "/storage/coda1/p-amedford6/0/ltimmerman3/socketApplications/sparc_runs/OFDFT_run_mode_b/test_run_b.py", line 39, in <module>
    dyn.run(fmax=0.05)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/optimize/optimize.py", line 269, in run
    return Dynamics.run(self)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/optimize/optimize.py", line 156, in run
    for converged in Dynamics.irun(self):
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/optimize/optimize.py", line 122, in irun
    self.atoms.get_forces()
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/atoms.py", line 788, in get_forces
    forces = self._calc.get_forces(self)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/abc.py", line 23, in get_forces
    return self.get_property('forces', atoms)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/calculator.py", line 737, in get_property
    self.calculate(atoms, [name], system_changes)
  File "/storage/home/hcoda1/9/ltimmerman3/p-amedford6-0/socketApplications/SPARC-X-API/sparc/calculator.py", line 532, in calculate
    self._calculate_with_socket(
  File "/storage/home/hcoda1/9/ltimmerman3/p-amedford6-0/socketApplications/SPARC-X-API/sparc/calculator.py", line 632, in _calculate_with_socket
    ret = self.in_socket.calculate_origin_protocol(atoms[self.sort])
  File "/storage/home/hcoda1/9/ltimmerman3/p-amedford6-0/socketApplications/SPARC-X-API/sparc/socketio.py", line 226, in calculate_origin_protocol
    return self.protocol.calculate(atoms.positions, atoms.cell)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 189, in calculate
    msg = self.status()
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 152, in status
    msg = self.recvmsg()
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 62, in recvmsg
    msg = self._recvall(12)
  File "/storage/coda1/p-amedford6/0/ltimmerman3/venvs/sockApp/lib64/python3.9/site-packages/ase/calculators/socketio.py", line 51, in _recvall
    chunk = self.socket.recv(remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
```
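The ConnectionResetError at the bottom of the traceback is a symptom rather than the cause: the driver is blocked in recvmsg() reading the 12-byte header of the reply to STATUS when the SPARC process dies, so recv() fails with errno 104. A minimal sketch of a header read that turns both a reset and a clean EOF into an explicit "client died" error; `recv_header` and `HEADER_LEN` are hypothetical names, not part of ASE or SPARC-X-API.

```python
import socket

HEADER_LEN = 12  # i-PI message headers are fixed-width, space-padded ASCII


def recv_header(sock: socket.socket) -> str:
    """Read one 12-byte message header, reporting a dead peer clearly (hypothetical sketch)."""
    buf = b""
    while len(buf) < HEADER_LEN:
        try:
            chunk = sock.recv(HEADER_LEN - len(buf))
        except ConnectionResetError as err:
            # The peer (e.g. a crashed SPARC process) vanished while we were waiting.
            raise RuntimeError("socket client died before replying") from err
        if not chunk:
            # Orderly EOF: the peer closed its end without sending a header.
            raise RuntimeError("socket client closed the connection before replying")
        buf += chunk
    return buf.decode("ascii").strip()


a, b = socket.socketpair()
a.sendall(b"READY".ljust(HEADER_LEN))
print(recv_header(b))  # READY
a.close()  # simulate the client process going away
try:
    recv_header(b)
except RuntimeError as err:
    print("peer gone:", err)
b.close()
```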

Python run file attached

```python
# Using SPARC
from sparc.calculator import SPARC
from ase import Atoms
from ase.io import read
from ase.build import molecule
from ase.optimize import BFGS
import numpy as np

Si = read('struct.in.traj')
Si.pbc = [True, True, True]

calc_params = {
    "EXCHANGE_CORRELATION": "LDA_PZ", "KPOINT_GRID": [1, 1, 1],
    "MESH_SPACING": 0.35, "MAXIT_SCF": 150,
    "ELEC_TEMP_TYPE": "fermi-dirac", "ELEC_TEMP": 100, "ION_TEMP": 100,
    "PRINT_RESTART_FQ": 10, "PRINT_ATOMS": 1, "PRINT_FORCES": 1,
    "SPIN_TYP": 0,
    "OFDFT_FLAG": 1, "OFDFT_LAMBDA": 0.2, "TOL_OFDFT": 1e-3,
}

with SPARC(use_socket=True, **calc_params) as calc:
    # Execute single-point calculations
    Si.calc = calc
    # water.get_potential_energy()
    dyn = BFGS(Si)
    dyn.run(fmax=0.05)
```
alchem0x2A commented 1 week ago

Thanks for the info. The error seems to be related to the function in the socket C code where the mesh is re-initialized (reinit_mesh() in the backtrace); that function possibly needs to be updated to match the parameters used in OFDFT. Will look into this.