Closed: bapatist closed this issue 7 months ago
Thank you for flagging that! The only thing I can think of now would be a sudden increase in density, that would create more edges and therefore more memory footprint. Could you look at the average number of neighbors near the end to see if there is a huge spike?
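One way to check this per frame is a quick minimum-image neighbour count. This is just a sketch (the helper name `avg_neighbors` is made up here), assuming an orthorhombic periodic cell and positions as an (N, 3) array; brute force O(N²) is fine for a few thousand atoms:

```python
import numpy as np

def avg_neighbors(positions, cell_lengths, r_max):
    """Average number of neighbours within r_max, assuming an
    orthorhombic periodic cell (minimum-image convention)."""
    pos = np.asarray(positions, dtype=float)
    box = np.asarray(cell_lengths, dtype=float)
    diff = pos[:, None, :] - pos[None, :, :]   # (N, N, 3) displacement
    diff -= box * np.round(diff / box)         # minimum-image wrap
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)             # exclude self-pairs
    return (dist < r_max).sum(axis=1).mean()
```

Evaluating this on frames near the end of the trajectory and plotting it against time would show whether the edge count (and hence memory per call) is actually growing.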
I see, but if it also appears in NVT, the average neighbour density should stay the same, right? (No vacuum region appears in the box.)
I did use a bigg-ish `r_max` value (=6) for this. Maybe going down to something like 4.5 can help reduce the number of edges.
Edit: Just plotted RDFs and I don't see a spike towards the end of the simulation.
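For reference, a single-frame RDF can be computed without any MD tooling; this is a minimal sketch (the function name `rdf` is mine, not from any library), again assuming an orthorhombic periodic cell:

```python
import numpy as np

def rdf(positions, cell_lengths, r_max, n_bins=100):
    """Radial distribution function g(r) for one frame,
    orthorhombic periodic cell, minimum-image convention."""
    pos = np.asarray(positions, dtype=float)
    box = np.asarray(cell_lengths, dtype=float)
    n = len(pos)
    diff = pos[:, None, :] - pos[None, :, :]
    diff -= box * np.round(diff / box)
    dist = np.linalg.norm(diff, axis=-1)
    d = dist[np.triu_indices(n, k=1)]          # unique i<j pairs
    hist, edges = np.histogram(d[d < r_max], bins=n_bins, range=(0.0, r_max))
    r = 0.5 * (edges[1:] + edges[:-1])
    shell_vol = 4.0 * np.pi * r**2 * (edges[1] - edges[0])
    density = n / box.prod()
    # normalise pair counts by the ideal-gas expectation, so g(r) -> 1
    g = hist / (shell_vol * density * n / 2.0)
    return r, g
```

A flat g(r) at the end of the trajectory, as reported above, is consistent with no densification spike.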
Can you tell me which branch you are using?
The `develop` branch.
Can you try to run the same MD with the calculator on the main branch? No need to retrain, just pull the main branch and change `model_paths` to `model_path`.
Okay, so I observe interestingly different behaviour on the main branch. The simulation didn't crash, but it was 40% slower than the run using the develop branch. I couldn't test whether there is still a memory leak, since I hit the wall time. I will rerun with the maximum wall time and report back if I ever reach an "out-of-memory" crash. Is the speed difference expected?
The speed difference comes from the neighbour list, I think.
Yes, it is the effect of the matscipy neighbour list.
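The performance gap between neighbour-list implementations is easy to see in a toy model. Below is a minimal linked-cell (cell-list) neighbour search in pure NumPy, a sketch of the general technique rather than what matscipy actually does internally; the helper name `cell_list_pairs` and the orthorhombic-box assumption are mine:

```python
import itertools
import numpy as np

def cell_list_pairs(positions, box, r_max):
    """All i<j pairs closer than r_max via a linked-cell search:
    bin atoms into cells of side >= r_max, then only compare atoms
    in neighbouring cells (roughly O(N) instead of O(N^2))."""
    box = np.asarray(box, dtype=float)
    pos = np.mod(np.asarray(positions, dtype=float), box)
    n_cells = np.maximum((box // r_max).astype(int), 1)  # cells per axis
    size = box / n_cells
    idx = np.minimum((pos // size).astype(int), n_cells - 1)
    cells = {}
    for i, c in enumerate(map(tuple, idx)):
        cells.setdefault(c, []).append(i)
    pairs = set()
    for c, members in cells.items():
        for off in itertools.product((-1, 0, 1), repeat=3):
            nb = tuple((np.array(c) + off) % n_cells)   # periodic wrap
            for i in members:
                for j in cells.get(nb, ()):
                    if j <= i:
                        continue
                    d = pos[i] - pos[j]
                    d -= box * np.round(d / box)        # minimum image
                    if np.dot(d, d) < r_max * r_max:
                        pairs.add((i, j))
    return pairs
```

The constant factors (binning, Python-level loops here, pair deduplication) are exactly where different implementations diverge in speed even when they return identical edge sets.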
Do you have the same memory problem on main, @bapatist?
I haven't encountered it on the main branch yet, but I did not test it on the big, long simulation. We have moved on to using LAMMPS with the develop branch; I will report again if I see any memory problems.
@bapatist Do you have any update on this issue? Did you experience any memory leak again?
No new updates, since I switched to LAMMPS for all MD tasks. @jungsdao experienced this most recently in our group; I'll check with him as well.
Hello all, my ASE-MD in the NPT ensemble for a 2124-atom system crashes after running for 17.5 picoseconds. The system contains 3 atomic species at a solid-water interface. It took ~2 h 45 m before crashing with a "CUDA out of memory" error. The system does not explode and the velocities remain stable until the very end. I used a single GPU on a shared node for this task.
I tried running an NVT simulation on the final structure, this time on a full node with 4 GPUs and 500 GB of memory, which also results in the same out-of-memory crash after a stable run of 135 picoseconds (taking 23 hours).
For more details, attached are the relevant files for the single GPU run case. ase_npt.py.txt err.txt slurm_out.txt run.txt
Any help/discussion will be much appreciated. Thank you!