Closed sabrygad closed 6 years ago
... here are the steps I used to get TRIQS to "work" (they are a little different from the documentation, as the steps there did not work for me):
```shell
module load python/anaconda-5.0.1 cmake boost mkl intel-mpi gcc/7.2.0
conda create -n dft python=2 numpy scipy matplotlib tornado mako jinja2 pyzmq h5py mpi4py
source activate dft
mkdir TRIQS
cd TRIQS
git clone https://github.com/TRIQS/triqs
mkdir triqsbuild
cd triqs
git checkout 1.4.1
sed -i '43s/${MKL_PATH_WITH_PREFIX}/"${MKL_PATH_WITH_PREFIX}"/' cmake/FindLapack.cmake
cd /projects/academic/kofke/software/TRIQS/triqsbuild
CC=gcc CXX=g++ CFLAGS=-pthread cmake -DPYTHON_LIBRARY=/user/sabrygad/.conda/envs/dft/lib/libpython2.7.so ../triqs
make -j12
make test
make install
```
Does this have anything to do with the slow behavior I am seeing?
Thanks; Sabry
Hi,
```
[Node 0] Simulation lasted: 889 seconds
[Node 0] Number of measures: 350000
```
but the actual timing for each DMFT step is ~5,000 seconds, so about 6x slower.
What is exactly 6 times slower? A run of the impurity solver (CTHYB) or solution of the self-consistency equations? If the slowdown is seen in calls to components of dft_tools, you should open an issue on the appropriate issue tracker (I have almost no experience with dft_tools and, therefore, cannot help much in that case).
Also, I would recommend passing `-DCMAKE_BUILD_TYPE=Release` as part of the CMake command line when building TRIQS and its applications.
Check the MKL. We had a similar problem recently: MKL was threading by default, which is not a good idea when running MPI on the nodes. Try something like `export OMP_NUM_THREADS=1` in the environment variables.
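A minimal sketch of what that could look like at the top of a job script, assuming Intel MKL is the BLAS/LAPACK backend (`MKL_NUM_THREADS` and `MKL_DYNAMIC` are MKL's own controls and take precedence over `OMP_NUM_THREADS` for MKL calls):

```shell
#!/bin/bash
# Pin every threading layer to one thread per MPI rank.
export OMP_NUM_THREADS=1    # generic OpenMP control
export MKL_NUM_THREADS=1    # MKL-specific; overrides OMP_NUM_THREADS for MKL routines
export MKL_DYNAMIC=FALSE    # stop MKL from resizing its thread pool at runtime
echo "threads: OMP=$OMP_NUM_THREADS MKL=$MKL_NUM_THREADS"
```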
Thanks for your reply.
It looks like neither CTHYB nor the other components are to blame. We just learned something interesting about this problem: on our SLURM system, the code runs slowly as soon as I use more than one core per node (`--ntasks-per-node`), regardless of the number of nodes (`--nodes`). Once I set the number of cores per node to 1, the code runs much faster (~6x), as it should (based on a benchmark we have). Setting OMP_NUM_THREADS=1 did not change the CPU time in either the slow or the fast case.
I am afraid it has to do with the mpirun invocation in the WIEN2k run_lapw script that calls CTHYB. As you know, srun is recommended with SLURM; however, it did not work with more than one core in total. For example, with this command in the submit script:
```shell
srun -n 2 pytriqs case.py
```
I get these errors:
h5repack error:
So it seems to be a matter of how pytriqs talks to mpirun/srun; does TRIQS only work with mpirun?
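Whether srun works depends on how the MPI library was built, not on TRIQS itself. A sketch of a submit script trying both launchers (the node/task counts and the PMI version are assumptions about the site's setup, not values from this thread):

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
# With Intel MPI, mpirun usually picks up the SLURM allocation directly:
mpirun -np "$SLURM_NTASKS" pytriqs case.py
# With srun, the MPI library must have been built with PMI support; if so,
# something like the following may work instead:
# srun --mpi=pmi2 -n "$SLURM_NTASKS" pytriqs case.py
```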
Thanks; Sabry
This indeed looks very much like a threading problem. One comment: are you sure that OMP_NUM_THREADS=1 is really set in your SLURM environment, and not just in the shell where you submit the job? These can differ; I had a similar issue on a supercomputer some time ago. There, environment variables had to be set explicitly in the submit script, since the job did not inherit them from the submit shell. If nothing works, moving to different libraries (no MKL, no Intel MPI) could be a solution.
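One way to remove the guesswork is to set the variable inside the submit script itself and print it there, so the job log shows what the solver actually saw (a sketch; `--export=NONE` is an sbatch option some sites enable, which makes the job inherit nothing from the submit shell):

```shell
#!/bin/bash
#SBATCH --export=NONE   # if a site sets this, variables from the login shell are lost
# Set the variable explicitly in the script so it cannot be dropped,
# and echo it so the job log records the value the run actually used.
export OMP_NUM_THREADS=1
echo "OMP_NUM_THREADS in job env: ${OMP_NUM_THREADS:-<unset>}"
```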
And do set `repacking=False` in the SumK initialisation in your case.py script. It is really not needed in your case and could at least resolve your h5repack problem.
I am closing this due to inactivity. Please reopen if needed. Best, Hugo
Hi;
I am doing LDA+DMFT. I built all TRIQS applications using Intel MPI; however, the code is 5-10x slower than a benchmark case.
I also noticed that the time printed in my stdout job.out file is much smaller than what the job actually takes. For example, here it says 889 seconds:
```
[Node 0] Simulation lasted: 889 seconds
[Node 0] Number of measures: 350000
```
but the actual timing for each DMFT step is ~5,000 seconds, so about 6x slower.
So maybe the data exchange between cores is slow.
Do I have to use Open-MPI instead?
Thanks a lot; Sabry