pkufourier closed 2 months ago
$ ./main3d.gnu.ex
Initializing AMReX (24.09-35-g23a7f34fd7a4)...
AMReX (24.09-35-g23a7f34fd7a4) initialized
Hello world from AMReX version 24.09-35-g23a7f34fd7a4
Before write, cell0=1.999256e+00
Write cost 0.37 seconds
Read cost 0.71 seconds
After read, cell0=1.999256e+00
AMReX (24.09-35-g23a7f34fd7a4) finalized
is what I get.
I noticed that you are using OMP. So I tried it too.
$ OMP_NUM_THREADS=16 ./main3d.gnu.OMP.ex
Initializing AMReX (24.09-35-g23a7f34fd7a4)...
OMP initialized with 16 OMP threads
AMReX Warning: You might be oversubscribing CPU cores with OMP threads.
There are 8 cores per node.
But OMP is initialized with 16 threads per process.
You should consider setting OMP_NUM_THREADS=8 or less in the environment.
AMReX (24.09-35-g23a7f34fd7a4) initialized
Hello world from AMReX version 24.09-35-g23a7f34fd7a4
Before write, cell0=1.999256e+00
Write cost 0.37 seconds
Read cost 0.73 seconds
After read, cell0=1.999256e+00
AMReX (24.09-35-g23a7f34fd7a4) finalized
I also tried AMReX 24.01. The results are similar.
Correct. I found the problem on my PC: I am using WSL, and the work folder was previously on the Windows NTFS partition mounted by WSL. After I copied the work folder to the native WSL file system, the read finishes in about 1 second. I'm now testing the I/O speed on the server.
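For anyone who wants to check this before running a benchmark, here is a minimal sketch that looks up which filesystem type a work folder sits on by parsing text in `/proc/mounts` format. It is not part of AMReX. The function names are made up for illustration, and the assumption that Windows drives show up under WSL with type `9p` (WSL 2) or `drvfs` (WSL 1) should be verified on your own system.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read a whole text file (e.g. "/proc/mounts") into a string.
std::string read_text_file(const std::string& file_name)
{
    std::ifstream in(file_name);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: given text in /proc/mounts format
// ("device mountpoint fstype options dump pass", one mount per line),
// return the fstype of the longest mount point that contains `path`.
// Returns an empty string if no mount point matches.
std::string fstype_for_path(const std::string& mounts_text, const std::string& path)
{
    std::istringstream lines(mounts_text);
    std::string line, best_type;
    std::size_t best_len = 0;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string device, mount_point, fstype;
        if (!(fields >> device >> mount_point >> fstype)) continue;

        // The mount point must be a prefix of the path AND end on a
        // directory boundary, so "/mnt/c" does not match "/mnt/cache".
        const bool is_prefix = path.compare(0, mount_point.size(), mount_point) == 0;
        const bool contains = is_prefix &&
            (mount_point == "/" ||
             path.size() == mount_point.size() ||
             path[mount_point.size()] == '/');

        if (contains && mount_point.size() >= best_len) {
            best_len = mount_point.size();
            best_type = fstype;
        }
    }
    return best_type;
}

// Assumption: these are the types WSL reports for mounted Windows drives.
bool looks_like_windows_mount(const std::string& fstype)
{
    return fstype == "9p" || fstype == "drvfs";
}
```

A typical call would be `looks_like_windows_mount(fstype_for_path(read_text_file("/proc/mounts"), work_dir))`, where `work_dir` is the absolute path of the folder holding the chk files. If it returns true, moving the folder to the native WSL file system is worth trying first.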
After a series of tests, I finally found the problem on the server: Read() stalls ONLY when running with MPI on multiple processes and with relatively large data (e.g., >8 GB). I attribute the issue to the MVAPICH environment (the server uses an InfiniBand network). After I switched the MPI environment to MPICH, the problem was solved.
One strange thing: computation and writing data to disk work normally in the MVAPICH environment. Only VisMF::Read() is affected by the MPI implementation. Also, when the data size is small, the read speed is not affected either.
Please see the attached source code for the test of VisMF::Read and VisMF::Write. test_readwrite.zip
On my PC, the test result is:
Initializing AMReX (24.01)...
MPI initialized with 1 MPI processes
MPI initialized with thread support level 0
OMP initialized with 16 OMP threads
AMReX (24.01) initialized
Hello world from AMReX version 24.01
Before write, cell0=1.999256e+00
Write cost 1.70 seconds
Read cost 21.08 seconds
After read, cell0=1.999256e+00
AMReX (24.01) finalized
Note that the chk file is only 502 MB (double precision), yet the read cost 21 seconds. I don't think it is a hardware issue, since my PC works fine in daily use (e.g., copying and pasting the generated chk folder in the system finishes within 1 second). I first found this problem on a computing server when writing and reading large checkpoint files (about 32 GB). The same holds there: writing is very fast, but reading appears to hang.
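To separate the file system from AMReX when reading numbers like these, a plain-file baseline can help. The sketch below is not the attached test and does not call VisMF. It only times a raw binary write and read of a `double` array with the C++ standard library, and the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical result type for the baseline measurement.
struct IoTiming {
    double write_seconds;
    double read_seconds;
    bool data_matches;   // true if the data read back equals the data written
};

// Write n_doubles values to `path`, read them back, and time both phases.
// The temporary file is removed before returning.
IoTiming time_write_then_read(const std::string& path, std::size_t n_doubles)
{
    using clock = std::chrono::steady_clock;
    const std::streamsize n_bytes =
        static_cast<std::streamsize>(n_doubles * sizeof(double));

    std::vector<double> out(n_doubles);
    for (std::size_t i = 0; i < n_doubles; ++i) {
        out[i] = 1.0 + static_cast<double>(i % 1000) * 1e-3;
    }

    const auto t0 = clock::now();
    {
        std::ofstream f(path, std::ios::binary | std::ios::trunc);
        f.write(reinterpret_cast<const char*>(out.data()), n_bytes);
    }   // the stream is flushed and closed here, inside the timed region
    const auto t1 = clock::now();

    std::vector<double> in(n_doubles);
    bool read_ok = false;
    {
        std::ifstream f(path, std::ios::binary);
        f.read(reinterpret_cast<char*>(in.data()), n_bytes);
        read_ok = (f.gcount() == n_bytes);
    }
    const auto t2 = clock::now();

    std::remove(path.c_str());

    return IoTiming{
        std::chrono::duration<double>(t1 - t0).count(),
        std::chrono::duration<double>(t2 - t1).count(),
        read_ok && in == out
    };
}
```

Calling `time_write_then_read("baseline.bin", 64u * 1024u * 1024u)` moves about 512 MiB, close to the 502 MB chk file above. Keep in mind that a read issued right after a write is often served from the OS page cache, so this is only a rough lower bound on read time. If the baseline read is fast in the same folder where the VisMF read takes 21 seconds, raw sequential throughput is less likely to be the bottleneck.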