Closed · yuxzhou closed this 3 weeks ago
@yuxzhou Thanks for reporting the issue, could you please tell me the version of calphy, lammps, and pylammpsmpi that you are using?
Thanks for the quick reply @srmnitc. I followed the installation instructions from the website: (1) git clone https://github.com/ICAMS/calphy.git; (2) cd calphy and conda env create -f environment.yml; (3) conda activate calphy; and (4) python setup.py install.
I also double-checked the versions of LAMMPS (21 Nov 2023 release), pylammpsmpi (0.2.3), and calphy (1.2.16).
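For reference, the installed versions can be checked from Python in one go; a minimal sketch (the package names are assumed to match the distribution names on PyPI/conda-forge):

```python
# Print the installed versions of the packages involved in this setup.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("calphy", "pylammpsmpi", "mpi4py"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```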
I have now spotted an issue with the version of pylammpsmpi; could you please update it with conda install -c conda-forge pylammpsmpi=0.2.13 and try again? I will update the env file; a new conda release is already being worked on.
Thanks for the help! The update of pylammpsmpi to 0.2.13 indeed helps wake up LAMMPS. However, there is still no output from LAMMPS, and I received another error in the *err file (while the job was still running):
No OpenFabrics connection schemes reported that they were able to be
used on a specific port. As such, the openib BTL (OpenFabrics
support) will be disabled for this port.
Local host: nid001169
Local device: mlx5_0
Local port: 1
CPCs attempted: rdmacm, udcm
--------------------------------------------------------------------------
[nid001169:239896] 1023 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[nid001169:239896] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I tried everything on the other cluster and got an MPI error again:
[c03r4n35:31497] OPAL ERROR: Not initialized in file ext3x_client.c at line 112
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:
version 16.05 or later: you can use SLURM's PMIx support. This
requires that you configure and build SLURM --with-pmix.
Versions earlier than 16.05: you must use either SLURM's PMI-1 or
PMI-2 support. SLURM builds PMI-1 by default, or you can manually
install PMI-2. You must then build Open MPI using --with-pmi pointing
to the SLURM PMI library location.
Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[c03r4n35:31497] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
[c03r4n35:31500] OPAL ERROR: Not initialized in file ext3x_client.c at line 112
--------------------------------------------------------------------------
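For what it's worth, the two launch modes that the second error message distinguishes can be sketched as follows (the program name, rank count, and exact --mpi plugin are assumptions that depend on how Open MPI and SLURM were built on the cluster):

```shell
# List the PMI plugins this cluster's srun offers (pmi2, pmix, ...):
#   srun --mpi=list
# Option 1: launch through Open MPI's own starter instead of srun:
#   mpirun -np 128 python -u run_calphy.py   # run_calphy.py is hypothetical
# Option 2: launch with srun, explicitly selecting a PMI flavor that
# Open MPI was built against (pmix requires SLURM built --with-pmix):
#   srun --mpi=pmix -n 128 python -u run_calphy.py
echo "inspect available PMI plugins with: srun --mpi=list"
```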
Any clue how to solve it?
Thank you!
Thanks again! What do you get in the calphy.log file?
Sorry for the late response. This is what calphy.log looks like in one of my calphy calculations.
2024-02-26 20:45:24,878 calphy.helpers INFO ---------------input file----------------
2024-02-26 20:45:24,879 calphy.helpers INFO commented out as causes crash when we're expanding the T range after a fail run
2024-02-26 20:45:24,879 calphy.helpers INFO ------------end of input file------------
2024-02-26 20:45:24,879 calphy.helpers INFO Temperature start: 900.000000 K, temperature stop: 500.000000 K, pressure: 0.000000 bar
2024-02-26 20:45:24,879 calphy.helpers INFO Pressure adjusted in iso
2024-02-26 20:45:24,879 calphy.helpers INFO Reference phase is liquid
2024-02-26 20:45:24,879 calphy.helpers INFO Melting cycle is turned off
2024-02-26 20:45:24,879 calphy.helpers INFO Equilibration stage is done using nose-hoover barostat/thermostat
2024-02-26 20:45:24,879 calphy.helpers INFO Nose-Hoover thermostat damping is 0.100000
2024-02-26 20:45:24,879 calphy.helpers INFO Nose-Hoover barostat damping is 0.100000
2024-02-26 20:45:24,879 calphy.helpers INFO These values can be tuned by adding in the input file:
2024-02-26 20:45:24,879 calphy.helpers INFO nose_hoover:
2024-02-26 20:45:24,879 calphy.helpers INFO thermostat_damping: <float>
2024-02-26 20:45:24,879 calphy.helpers INFO barostat_damping: <float>
2024-02-26 20:45:24,879 calphy.helpers INFO Integration stage is done using Nose-Hoover thermostat and barostat when needed
2024-02-26 20:45:24,879 calphy.helpers INFO Thermostat damping is 0.100000
2024-02-26 20:45:24,880 calphy.helpers INFO Barostat damping is 0.100000
2024-02-26 20:45:24,880 calphy.helpers INFO 4536 atoms in 1 cells on 128 cores
2024-02-26 20:45:24,880 calphy.helpers INFO pair_style: pace
2024-02-26 20:45:24,880 calphy.helpers INFO pair_coeff: * * /work/e846/e846/yx_zhou/ML_potentials/Te-ACE/te-upfit-iter0.yace Te
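As the log itself suggests, the damping values can be tuned via the input file; a sketch of the corresponding YAML fragment (keys copied from the log output above, values illustrative):

```yaml
# hypothetical calphy input fragment; keys as printed in the log
nose_hoover:
  thermostat_damping: 0.1
  barostat_damping: 0.1
```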
It seems that LAMMPS has been launched, but not correctly (i.e., it died for some reason)?
That seems to be the case. Could you please check the version of mpi4py in the environment? Thanks again!
Sure! The version of mpi4py is 3.1.4 in my conda environment.
I can't seem to reproduce this on the LAMMPS side. Could you please run a LAMMPS calculation directly through the library interface and see if that works?
Closing due to inactivity, please feel free to reopen if needed.
Dear developers and users,
I'm trying to run a ts calculation in which I expect MD to be run after the initialization. However, the job seems to get stuck when trying to wake up the LAMMPS driver. The job doesn't die, but no output is generated.
This is the calphy.log file:
and this is my input:
Any idea what is going wrong?