compas / grasp

General Relativistic Atomic Structure Package
https://compas.github.io/grasp/
MIT License

rmcdhf_mpi performance issue #98

Open mkkarlsen opened 1 year ago

mkkarlsen commented 1 year ago

Hi.

I am experiencing some poor performance when running rmcdhf_mpi.

I run on a single node with multiple tasks per node, and performance seems to get worse the more tasks I reserve (this is not the case for rangular and rci). For example, I have recorded 61 min/iteration when using 10 tasks, but <18 min/iteration when using 4 tasks in an otherwise identical run.

I have set

export MPI_TMP="/cluster/home/username/Grasp/workdir/tmp_mpi"

and the directory "tmp_mpi" is created within the working directory, with the subdirectories "000", "001", etc.

The fact that performance gets worse with more processing power leads me to believe that the disk read speed is the limiting factor.
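In case it helps to pin this down, one way to watch the read rate on the device that holds tmp_mpi while rmcdhf_mpi is iterating is something like the following (just a sketch; iostat comes from the sysstat package, and the 5-second interval is an arbitrary choice):

# extended per-device I/O statistics in MB/s, refreshed every 5 seconds
# watch the rMB/s and %util columns for the disk behind tmp_mpi
iostat -xm 5

If %util sits near 100% while the CPUs are mostly idle, the run is I/O-bound.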

Any ideas what I can do to fix this?

Thanks, Martin

jongrumer commented 1 year ago

Hi Martin,

Yes, the read speed is indeed one of the most limiting factors, and the self-consistent procedure is in general tricky to parallelize, with many bottlenecks, whereas e.g. rangular and rci are trivial to parallelize since the matrix is only set up once (and then diagonalized, in the case of CI). You can try the new rmcdhf_mem_mpi if you have enough RAM to store the MCP files.
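For reference, a minimal sketch of what the switch could look like, assuming rmcdhf_mem_mpi is driven the same way as rmcdhf_mpi (the prompt answers collected in a file, here called rmcdhf.inp, and 4 MPI tasks, both of which are just examples):

# same scratch setup as before
export MPI_TMP="/cluster/home/username/Grasp/workdir/tmp_mpi"
# run the memory-resident variant; it keeps the MCP data in RAM instead of re-reading it from disk
mpirun -np 4 rmcdhf_mem_mpi < rmcdhf.inp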

Cheers, Jon

AnjaApp commented 1 year ago

Hi, I would like to add the tip to try out your calculations on a machine that uses an SSD for scratch storage. HDDs suffer a major drop in read speed for larger calculations. I was able to carry out GRASP calculations that required up to 20 TB of storage on an SSD and still maintained 100% CPU usage thanks to the fast read speed. You could also try to converge the input wave functions as well as possible before you start your calculation, to reduce the number of iterations needed.

Edit: You could also try out the ZF method to reduce the workload (see manual p. 299).
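A quick way to check what kind of storage actually backs the MPI_TMP directory is something like this (a sketch only; the path is the one from this thread, and lsblk's ROTA column is 1 for a rotating disk and 0 for an SSD/NVMe device):

# which filesystem/device holds the scratch directory?
df -h /cluster/home/username/Grasp/workdir/tmp_mpi
# list local block devices; ROTA=1 means HDD, ROTA=0 means SSD/NVMe
lsblk -d -o NAME,ROTA,TYPE,SIZE

Note that if df reports a network mount (NFS, Lustre, ...), it is the remote storage and the network that set the read speed, not a local disk.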

jongrumer commented 1 year ago

Thanks @AnjaApp, great advice!