mkkarlsen opened this issue 1 year ago
Hi Martin,
Yes, read speed is indeed one of the most limiting factors, and the self-consistent-field process is in general tricky to parallelize, with many bottlenecks, whereas e.g. rangular or rci parallelize trivially since the matrix is set up only once (and then diagonalized, in the case of rci). You can try the new rmcdhf_mem_mpi if you have enough RAM to store the MCP files.
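If you want to check first whether the MCP files would fit in memory, a minimal sketch (the mcp* file pattern and the use of du/free here are my suggestion, not a GRASP tool):

```sh
# Total size of the angular (MCP) files across the per-process
# subdirectories 000, 001, ... under MPI_TMP
du -ch "$MPI_TMP"/*/mcp* | tail -n 1

# Memory available on the node
free -g

# If the MCP data fits comfortably in RAM, try the in-memory variant
mpirun -np 4 rmcdhf_mem_mpi
```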
Cheers, Jon
Hi, I would like to add the tip of trying your calculations on a machine that uses an SSD for storage: HDDs suffer a major drop in read speed for larger calculations. I was able to carry out GRASP calculations that required up to 20 TB of storage on an SSD and still maintain 100% CPU usage thanks to the fast read speed. You could also converge the input wave function as well as possible before starting your calculation, to reduce the number of iterations needed.
Edit: You could also try out the ZF method to reduce the workload (see manual p. 299).
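To check whether a directory actually sits on an SSD, and how fast it reads, something like this works on Linux (device names and the test file are illustrative):

```sh
# ROTA = 1 means a rotational disk (HDD), 0 means SSD/NVMe
lsblk -d -o NAME,ROTA

# Which filesystem/device backs the GRASP scratch directory
df -h "$MPI_TMP"

# Rough sequential read-speed test on an existing large file,
# e.g. a wave-function file; cached data will inflate the number
dd if=rwfn.out of=/dev/null bs=1M status=progress
```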
Thanks @AnjaApp, great advice!
Hi.
I am experiencing poor performance when running rmcdhf_mpi.
I run on a single node with multiple tasks per node, and performance seems to get worse the more tasks I reserve (not the case for rangular and rci). For example, I have recorded 61 min/iteration when using 10 tasks, but under 18 min/iteration when using 4 tasks in an otherwise identical run.
I have set
export MPI_TMP="/cluster/home/username/Grasp/workdir/tmp_mpi"
and the directory tmp_mpi is created within the working directory, with subdirectories 000, 001, etc.
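For reference, the job is launched roughly like this (a minimal sketch of my setup, not my exact script; the interactive rmcdhf input is omitted):

```sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=10   # 10 vs. 4 is the only change between the runs above

export MPI_TMP="/cluster/home/username/Grasp/workdir/tmp_mpi"

# rmcdhf_mpi uses per-process scratch under $MPI_TMP/000, 001, ...
mpirun -np "$SLURM_NTASKS" rmcdhf_mpi
```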
The poorer performance with more processing power leads me to believe that the read speed is the limiting factor.
Any ideas what I can do to fix this?
Thanks, Martin