AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

VisMF::Read seems to be rather slow #4159

Closed: pkufourier closed this issue 2 months ago

pkufourier commented 2 months ago

Please see the attached source code (test_readwrite.zip) for a test of VisMF::Read and VisMF::Write.

On my PC, the test result is:

Initializing AMReX (24.01)...
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
OMP initialized with 16 OMP threads
AMReX (24.01) initialized
Hello world from AMReX version 24.01
Before write, cell0=1.999256e+00
Write cost 1.70 seconds
Read cost 21.08 seconds
After read, cell0=1.999256e+00
AMReX (24.01) finalized

Note that the chk directory is only 502 MB (double precision), yet the read takes 21 seconds. I don't think this is a hardware problem: my PC works fine for daily use (e.g., if I copy and paste the generated chk folder in the system, it finishes within 1 second). I first noticed this problem on a computing server, when writing and reading large checkpoint files (about 32 GB). There, too, writing is very fast, but reading seems to hang.
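For scale, the timings in the log above correspond to the following effective bandwidths, assuming the 502 MB is the total amount of data written and then read back (metadata overhead ignored):

```math
\text{write: } \frac{502\ \text{MB}}{1.70\ \text{s}} \approx 295\ \text{MB/s}, \qquad
\text{read: } \frac{502\ \text{MB}}{21.08\ \text{s}} \approx 23.8\ \text{MB/s}, \qquad
\frac{21.08\ \text{s}}{1.70\ \text{s}} \approx 12.4
```

So on that setup, reading was roughly 12 times slower than writing the same data.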

WeiqunZhang commented 2 months ago
$ ./main3d.gnu.ex 
Initializing AMReX (24.09-35-g23a7f34fd7a4)...
AMReX (24.09-35-g23a7f34fd7a4) initialized
Hello world from AMReX version 24.09-35-g23a7f34fd7a4
Before write, cell0=1.999256e+00
Write cost 0.37 seconds
Read cost 0.71 seconds
After read, cell0=1.999256e+00
AMReX (24.09-35-g23a7f34fd7a4) finalized

is what I get.

WeiqunZhang commented 2 months ago

I noticed that you are using OMP. So I tried it too.

$ OMP_NUM_THREADS=16 ./main3d.gnu.OMP.ex 
Initializing AMReX (24.09-35-g23a7f34fd7a4)...
OMP initialized with 16 OMP threads
AMReX Warning: You might be oversubscribing CPU cores with OMP threads.
               There are 8 cores per node.
               But OMP is initialized with 16 threads per process.
               You should consider setting OMP_NUM_THREADS=8 or less in the environment.
AMReX (24.09-35-g23a7f34fd7a4) initialized
Hello world from AMReX version 24.09-35-g23a7f34fd7a4
Before write, cell0=1.999256e+00
Write cost 0.37 seconds
Read cost 0.73 seconds
After read, cell0=1.999256e+00
AMReX (24.09-35-g23a7f34fd7a4) finalized

I also tried amrex 24.01. The results are similar.

pkufourier commented 2 months ago

You're right. I found the problem on my PC: I am using WSL, and the work folder was previously on a Windows NTFS partition mounted in WSL. After I copied the work folder to WSL's native file system, the read finishes in about 1 second. I'm now testing the I/O speed on the server.
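One way to separate file-system throughput from AMReX itself, as in the WSL case above, is to time a plain sequential write and read of a binary file in each location. The sketch below is a hypothetical standalone helper (`time_roundtrip`, `mb_per_s`, and `IoTiming` are illustrative names, not AMReX functions) that uses only the C++ standard library. It issues no fsync and does not drop the OS page cache, so absolute numbers may overstate disk speed; the useful signal is the relative difference between, say, a path under `/mnt/c/` (the usual WSL mount point for the Windows C: drive) and a path on WSL's native file system.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Timing of one sequential write + read round trip (hypothetical helper, not AMReX).
struct IoTiming {
    double write_seconds;
    double read_seconds;
    bool data_ok;  // true if the data read back matches what was written
};

// Writes n_doubles doubles to `path`, reads them back, times each phase,
// then deletes the scratch file. No fsync is issued, so the OS page cache
// may absorb part of both phases.
IoTiming time_roundtrip(const std::string& path, std::size_t n_doubles) {
    using clock = std::chrono::steady_clock;
    const std::streamsize nbytes =
        static_cast<std::streamsize>(n_doubles * sizeof(double));

    std::vector<double> out(n_doubles);
    for (std::size_t i = 0; i < n_doubles; ++i) {
        out[i] = 0.5 * static_cast<double>(i);
    }

    const auto t0 = clock::now();
    {
        std::ofstream ofs(path, std::ios::binary | std::ios::trunc);
        ofs.write(reinterpret_cast<const char*>(out.data()), nbytes);
    }  // ofstream destructor flushes and closes the file
    const auto t1 = clock::now();

    std::vector<double> in(n_doubles);
    bool read_ok = false;
    {
        std::ifstream ifs(path, std::ios::binary);
        ifs.read(reinterpret_cast<char*>(in.data()), nbytes);
        read_ok = ifs.gcount() == nbytes;  // false if the file was missing or short
    }
    const auto t2 = clock::now();

    std::remove(path.c_str());  // clean up the scratch file

    const std::chrono::duration<double> w = t1 - t0;
    const std::chrono::duration<double> r = t2 - t1;
    return {w.count(), r.count(), read_ok && in == out};
}

// Effective bandwidth in MB/s (1 MB = 1e6 bytes); returns 0 for non-positive time.
double mb_per_s(std::size_t bytes, double seconds) {
    return seconds > 0.0 ? static_cast<double>(bytes) / 1e6 / seconds : 0.0;
}
```

For example, `time_roundtrip(path, 64u << 20)` moves 512 MiB (64 Mi doubles of 8 bytes each) in each direction and holds about 1 GiB in memory while it runs; passing `(64u << 20) * sizeof(double)` and the two timings to `mb_per_s` gives write and read bandwidths that can be compared across the two paths.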

pkufourier commented 2 months ago

After a series of tests, I finally found the cause on the server: Read() hangs ONLY when running with MPI on multiple processes and writing relatively large data (e.g., >8 GB). The issue is attributable to the MVAPICH environment (used because the server has an InfiniBand network). After I switched the MPI environment to MPICH, the problem was solved.

A strange thing is that computation and writing data to disk are normal with the MVAPICH environment; only VisMF::Read() is affected by the MPI library. Also, when the data size is small, the read speed is not affected either.