TRIQS / triqs_0.x

DEPRECATED -- This is the repository of the older versions of TRIQS

1) LDA+DMFT in parallel 2) "run_lapw -qdmft" or "runsp_lapw -qdmft?" #136

Closed saeidjalali closed 11 years ago

saeidjalali commented 11 years ago

Dear WIEN2TRIQS group, two inquiries:

1) I would like to run WIEN2TRIQS self-consistently in parallel mode. To this end, I use the following command together with a suitable .machines file:

run_lapw -qdmft -p

The WIEN2k part of the calculation, i.e., the LDA part, runs in parallel without any problem, but the DMFT part runs in serial mode. Although my case converges perfectly, the DMFT part is not very fast, so to speed up the TRIQS calculation I need to run it in parallel as well as WIEN2k.

I noticed in http://ipht.cea.fr/triqs/doc/user_manual/wien2k/Ce-HI.html that one should run: run_para -qdmft. But there is no "run_para" script in $WIENROOT! There are only run_lapw, runsp_lapw, and some other scripts in $WIENROOT.

In "run_lapw" script, I see two following lines: pytriqs $file.py mpirun -c $NSLOTS -x PYTHONPATH ./pytriqs $file.py --same_dir

The first line is for running on a single node, and the second line seems to be for running in parallel. But when I run run_lapw with the -qdmft -p flags, the first line is called.

2) I am also not sure whether I should use runsp_lapw -qdmft or run_lapw -qdmft. As you know, runsp_lapw is used for magnetic systems and run_lapw for nonmagnetic systems. But I see in the above link, where the Ce example is discussed, that run_lapw -qdmft is applied to this magnetic system, and, more surprisingly, both the spin-up and spin-down DOS of the Ce case are generated. So I think that pytriqs treats the system magnetically even if we use run_lapw. What is the difference between "run_lapw -qdmft" and "runsp_lapw -qdmft" if pytriqs handles the system magnetically even with run_lapw -qdmft?

Best regards, Saeid

leopo commented 11 years ago
  1. Unfortunately, at present one cannot run self-consistent LDA+DMFT with the parallel WIEN2k; only the DMFT part implemented in TRIQS can be run in parallel. The scripts for running LDA+DMFT have been renamed to run_triqs and runsp_triqs; you can find them in share/triqs/Wien2k_SRC_files/SRC_templates in your TRIQS install directory. You need to manually insert the MPI launcher call appropriate for your system, as described in the docs: http://ipht.cea.fr/triqs/doc/user_manual/install/wien2k_inst.html (a minimal sketch of the setup follows this list).
  2. The choice between runsp_triqs and run_triqs depends on whether you want to run the spin-polarized (LSDA) or non-spin-polarized (LDA) WIEN2k part. In the DMFT part both spins are included in all cases, and spin polarization may appear due to the corresponding symmetry breaking in the DMFT self-energy even when the one-particle part of the Hamiltonian was computed with non-magnetic LDA WIEN2k.
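
For illustration, the setup might look roughly like this (a minimal sketch, assuming the install prefix is in $TRIQS_INSTALL and that your launcher is mpirun; use whatever launcher your cluster documentation prescribes):

cp $TRIQS_INSTALL/share/triqs/Wien2k_SRC_files/SRC_templates/run*_triqs $WIENROOT/
# then edit run_triqs / runsp_triqs so that the pytriqs call goes through the launcher, e.g.
# mpirun -np $NSLOTS $TRIQS_INSTALL/bin/pytriqs $file.py
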
saeidjalali commented 11 years ago

Dear Leopo,

Thank you for your valuable comments. In my triqs_build/INSTALL_DIR/share/triqs/Wien2k_SRC_files/SRC_templates/ there are only the following files:

case.cf_f_mm2 case.cf_p_cubic case.indmftpr

I think this is because I am using an older version of dmftproj, i.e., TRIQS-TRIQS-6f4c392, and not the latest version. So I just changed python_with_DMFT in the WIEN2k run_lapw to pytriqs. It works fine, but it is slow.
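
In practice, the substitution I made amounts to something like this one-liner (assuming run_lapw is writable and pytriqs is on the PATH; the .bak suffix keeps a backup of the original script):

sed -i.bak 's/python_with_DMFT/pytriqs/g' $WIENROOT/run_lapw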

Unfortunately, at present one cannot run self-consistent LDA+DMFT with the parallel WIEN2k; only the DMFT part implemented in TRIQS can be run in parallel.

I think (but am not sure) that the above problem originates from the fact that WIEN2k uses mpich-1.2.7p1, while triqs uses openmpi. Both of them are installed on my system, and mpich2 is installed as well. So would you please let me know what changes I need to make in run_lapw so that I can run only the DMFT part of my calculations in parallel? Since WIEN2k seems to be much faster than triqs, I prefer to run only the DMFT part in parallel.

Best wishes, Saeid.

leopo commented 11 years ago

Yes, it seems you have an old version of the package. You need to call pytriqs from run_para (right after dmftproj is called) with an appropriate MPI wrapper, which depends on your system (it may be called mpirun, mpprun, aprun, ...). I guess the documentation for your cluster should tell you which MPI wrapper to use for parallel runs. Some examples I use on different systems:

aprun -n $NSLOTS /cfs/klemming/nobackup/l/leopo/TRIQS/triqs_install/bin/pytriqs $file.py

mpirun -x PYTHONPATH /home/leonid/TRIQS/triqs_install/bin/pytriqs $file.py

mpprun --force-mpi=openmpi/1.3.2-i110074 /home/x_leopo/TRIQS/triqs_install/bin/pytriqs $file.py

(--force-mpi forces the wrapper to use a particular version of openmpi.)
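
If the machines are not managed by a batch system, a plain invocation with an explicit host file may also work. This is only a sketch; the process count, host-file name, and install path below are placeholders, not taken from a real setup:

mpirun -np 12 --hostfile hosts /path/to/triqs_install/bin/pytriqs $file.py

(here "hosts" is a file listing the available nodes)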

saeidjalali commented 11 years ago

Dear Leonid, I changed the following lines in the run_lapw script:

if ($NSLOTS == 1) then
    python_with_DMFT $file.py
else
    mpirun -c $NSLOTS -x PYTHONPATH ./python_with_DMFT $file.py --same_dir
endif

to:

if ($NSLOTS == 1) then
    pytriqs $file.py
else
    mpirun /usr/local/codes/dmftproj/triqs_build/INSTALL_DIR/bin/pytriqs $file.py
endif

I removed the -c $NSLOTS, -x PYTHONPATH, and --same_dir flags, since the program complains that these flags are unknown.

Now, "run_lapw -qdmft 1" runs the "pytriqs $file.py" line and "run_lapw -qdmft 2" runs the "mpirun /usr/local/codes/dmftproj/triqs_build/INSTALL_DIR/bin/pytriqs $file.py" line successfully.

To be sure, I also executed the following line directly in the terminal:

mpirun /usr/local/codes/dmftproj/triqs_build/INSTALL_DIR/bin/pytriqs mycase.py

This command also runs successfully.

But the problem now is that it just uses one node and one core.

I added -np 12, but it again does not use the other CPUs and cores. I added the -n 12 flag, with no effect. I also created a host file ("host") containing:

node1:12
node2:12

and added "-f host", with no effect.

1) Would you tell me how I can run over my available nodes and cores?

2) Would you send me the run_triqs and runsp_triqs scripts? My e-mail address is: saeid.jalali.asadabadi@gmail.com

3) You stated that "You need to call pytriqs from run_para ...". Where is run_para?

Thank you,

Saeid.

leopo commented 11 years ago

Are you sure that your pytriqs has been compiled with MPI? If you cannot launch parallel calculations directly with

mpirun /usr/local/codes/dmftproj/triqs_build/INSTALL_DIR/bin/pytriqs mycase.py

then it does not make sense to work on the run scripts: they call exactly the same line.

The problem is apparently related either to the pytriqs compilation or to the way you call the MPI wrapper. I guess the first thing to do here is to contact your cluster's support.
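
As a quick sanity check of the wrapper itself, independent of TRIQS, something like

mpirun -np 4 hostname

should print one line per process; if a host file spanning two nodes is supplied, both node names should appear in the output.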

saeidjalali commented 11 years ago

I compiled the code with:

cmake -DBLAS_LIBRARY=-lmkl_core \
      -DLAPACK_LIBRARY="-lmkl_intel_lp64;-lmkl_sequential;-lpthread" \
      -DCBLAS_INCLUDE_DIR=/opt/intel/composer_xe_2011_sp1/mkl/include/ \
      -DLAPACK_LINKER_FLAGS=-L/opt/intel/composer_xe_2011_sp1/mkl/lib/intel64/ \
      /usr/local/codes/dmftproj/TRIQS-TRIQS-6f4c392/ \
      -DBOOST_SOURCE_DIR=/usr/local/codes/dmftproj/boost_1_47_0

For more information see: https://gist.github.com/saeidjalali/3301298
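
To double-check which MPI the build actually picked up (assuming the TRIQS 0.x build goes through CMake's standard FindMPI module, which is an assumption on my part), one can inspect the configure cache in the build directory and see which mpirun comes first in the PATH:

grep -i mpi CMakeCache.txt
which mpirun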

As I said, mpirun /usr/local/codes/dmftproj/triqs_build/INSTALL_DIR/bin/pytriqs mycase.py works fine; the only problem is running over several nodes.

saeidjalali commented 11 years ago

Dear Leonid, I could eventually fix the problem. Just for future reference, I would report that the problem originated from the kind of MPI used. I had been using the mpirun of mpich1; switching to Intel MPI had no effect, but the mpirun of openmpi solves the problem. Now it runs over my CPUs and cores.
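
For the record, the working call now has roughly this shape (the core count is only an example, and the host-file format depends on the MPI implementation; Open MPI expects lines like "node1 slots=12" rather than the "node1:12" form I tried earlier):

mpirun -np 24 --hostfile host /usr/local/codes/dmftproj/triqs_build/INSTALL_DIR/bin/pytriqs mycase.py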

So let me first thank you and then close the issue. Ciao, Saeid.