Hi Rodolphe,
Great that you're trying out GRASP for HPC!
It's been many years since I ran the codes through a batch system, but let's see what we can do. For starters, can you show us what your `disks` file looks like? The first line specifies the runtime directory (where the I/O files are stored). The MCP files are written to/read from the temporary directories specified in the subsequent lines of `disks` when you run `rangular_mpi` + `rmcdhf_mpi` (i.e. not in the working directory as you wrote!). Also, make sure you specify the number of processes somehow (without a batch system the call would be something like `mpirun -np 20 rangular_mpi`).
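For illustration, a `disks` file might look something like this (the paths here are just placeholders; use whatever runtime and scratch directories exist on your cluster):

```
/home/user/grasp/test/
/tmp/
/tmp/
/tmp/
/tmp/
```

with the first line pointing at the runtime directory and, as far as I recall, one scratch line for each MPI process you intend to use.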
All the best, Jon
Hi, please find the disks file hereafter:

The first line, `/work/icb/ro0028mo/atst/test/`, is the directory I run the job from; the 16 other lines are for the temporary files, as I understood from the GRASP2018 manual (I put `/work/icb/ro0028mo/` there only for testing purposes, although each machine of the cluster has its own `/tmp` directory).
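In other words, something like this, with the second line repeated so that there are 16 of them in total:

```
/work/icb/ro0028mo/atst/test/
/work/icb/ro0028mo/
/work/icb/ro0028mo/
...
```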
Before the job crashes, folders 000 to 015 are created by `rmcdhf_mpi` in the directory specified in the disks file. Nota bene: I just ran `rangular` (not `rangular_mpi`) to obtain all the MCP files, which are therefore located in my "test" directory, along with `isodata` and the other requested files.
As I understand it, the command `mpiib` is specific to our HPC cluster and is used to run MPI processes through the InfiniBand network. As can be seen from the crash report, it is equivalent to `mpirun -dapl -np 16 rmcdhf_mpi`.

I hope my answer is useful to you. Thanks for giving me such quick feedback. Best, Rodolphe
Alright, so you need to run `rangular_mpi` with the same `disks` file then. You can't combine serial `rangular` with `rmcdhf_mpi`.
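With your `mpiib` wrapper (or the equivalent `mpirun -dapl -np 16 ...` call), the sequence would be something along these lines:

```sh
# from the runtime directory named on the first line of disks,
# with isodata and the other input files in place:
mpiib rangular_mpi    # writes the MCP files to the scratch directories listed in disks
mpiib rmcdhf_mpi      # reads them back from those same scratch directories
```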
Try it! :)
Cheers, Jon
I tried... and it worked! Thanks again Jon for giving such helpful and quick feedback. 😄 All the best, Rodolphe
No worries Rodolphe, happy to help! Let me/us know if you have further problems or need advice on how to set up your correlation model. Which group do you work with, by the way? Just curious where GRASP is used 🧐😅
Cheers, Jon
I am a (1st year) PhD student working with Prof. Claude Leroy (Dijon, France), Prof. David Sarkisyan and Prof. Aram Papoyan (Yerevan, Armenia): https://www.researchgate.net/profile/Rodolphe-Momier if you want to have a look! I am just getting familiar with GRASP, the goal being the computation of some HFS parameters for states where experimental data is missing. 😄 Cheers, Rodolphe
Hello,
I am trying to use GRASP2018 on my university's HPC cluster. I managed to compile the package correctly, and the non-MPI versions of the programs work as expected in an interactive session.
Now, say I want to submit a job file to the HPC cluster (which is controlled by Sun Grid Engine) consisting of a single run of `rmcdhf_mpi` on 16 cores with the script below:
The executables are on the path, and the job is launched from the directory containing all needed files (mcp.*, isodata, etc., as well as the `disks` file).
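Roughly speaking, the script boils down to a few SGE directives plus the `mpiib` call, along these lines (the job name, parallel environment name and slot handling shown here are only indicative, not the actual script):

```sh
#!/bin/bash
#$ -N grasp_rmcdhf       # job name (arbitrary)
#$ -cwd                  # run from the submission directory (where disks, isodata, mcp.* live)
#$ -pe mpi 16            # parallel environment name is site-specific; 16 slots
#$ -j y                  # merge stdout and stderr

# mpiib is the cluster-specific wrapper for mpirun over InfiniBand,
# equivalent here to: mpirun -dapl -np 16 rmcdhf_mpi
# (in batch mode the interactive prompts are usually answered via input redirection)
mpiib rmcdhf_mpi
```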
What I understand from the output of SGE (text file attached) is that `rmcdhf_mpi` gets launched and understands where to store the temporary data according to the disks file, but doesn't seem to understand where to get the input files and therefore sets the number of blocks to 0, which aborts the calculation.
job_output.txt
Any idea how to solve that? I am probably making very stupid mistakes but I am only a beginner. 😄 Thanks, Rodolphe