AndreWeiner / wall_modeling

Development of OpenFOAM wall functions for turbulent flows
GNU General Public License v3.0

Test workflow on Phoenix cluster #3

Closed. AndreWeiner closed this issue 3 years ago.

AndreWeiner commented 3 years ago

Hi Jihoo,

I added several files in the last commit to run simulations with Singularity on TU Braunschweig's Phoenix cluster. Have a look at the README and other new files.

To log in to Phoenix, I recommend an alias, e.g., add the following line to your ~/.bashrc:

alias phoenix="ssh -Y user_name@phoenix.hlr.rz.tu-bs.de"

and source the file again with source ~/.bashrc. Then you can type phoenix to start an ssh connection. I also recommend creating a key-pair to save some time logging in.
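A key pair can be set up with the standard OpenSSH tools, for example as follows (the key type below is just one common choice; adjust the user name and key file as needed):

# generate a key pair (accept the defaults or pick a dedicated file)
ssh-keygen -t ed25519
# copy the public key to the cluster; you will be asked for your password once
ssh-copy-id user_name@phoenix.hlr.rz.tu-bs.de

After that, the phoenix alias should log you in without a password prompt.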

To copy a file to or from the cluster, use scp -r (-r for recursive copying of folders):

# copy image to home folder on cluster
scp -r of_v2012.sif user_name@phoenix.hlr.rz.tu-bs.de:/home/user_name/wall_modeling/
# copy file back to workstation
scp -r user_name@phoenix.hlr.rz.tu-bs.de:/home/user_name/wall_modeling/some_file.txt ./

To copy simulation results from the cluster, I typically synchronize the entire run folder:

# run from the repository's top-level folder
rsync -avz user_name@phoenix.hlr.rz.tu-bs.de:/home/user_name/wall_modeling/run ./
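If you want to see what rsync would transfer before actually copying anything, you can add its standard -n (--dry-run) flag to the same command:

# dry run: list the files that would be synchronized without copying them
rsync -avzn user_name@phoenix.hlr.rz.tu-bs.de:/home/user_name/wall_modeling/run ./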

Let me know if these commands work for you.

Best, Andre

JihooKang-KOR commented 3 years ago

Dear Dr. Weiner,

I tested the commands mentioned in this issue, except for synchronizing with rsync, and found that they work well.

However, when I ran the command qsub jobscript name_of_simulation (here, the name of the simulation is 'turbulentFlatPlate_noWallFunc'), the Phoenix server reported that it could not find the qsub command:

-bash: qsub: command not found

I copied the OpenFOAM Singularity image and the jobscript file to the '~/wall_function' folder, and the simulation case 'turbulentFlatPlate_noWallFunc' to the '~/wall_function/run' folder on Phoenix. Nevertheless, I could not submit the simulation because the qsub command is not available on the server.

I found that the sbatch command is available on the server, but I am not sure whether I should use qsub or sbatch for the Phoenix cluster. I even installed the slurm-wlm-torque package on my local system, but of course that did not help, because the cluster does not use my local packages.

I would really appreciate any advice on how to proceed. Thank you very much for your help.

Best regards, Jihoo Kang

AndreWeiner commented 3 years ago

Dear Jihoo,

Of course, you're right about using sbatch. I mixed up different scheduler commands; qsub is a PBS command. The correct syntax to submit a job is

sbatch jobscript name_of_simulation

I'll fix the README. Thanks for the correction!
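In case it's useful: once a job is submitted, the standard SLURM commands can be used to monitor or cancel it, e.g.,

# list your pending and running jobs
squeue -u $USER
# cancel a job by its id (replace job_id with the number printed by sbatch)
scancel job_id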

Best, Andre

JihooKang-KOR commented 3 years ago

Dear Dr. Weiner,

I submitted the simulation with sbatch on the cluster, and the sbatch command itself worked well. However, the 'Allrun.singularity' script failed to produce proper results, even though I tried three times with different settings. Thus, I would like to report this problem here.

I tried two fixes (applied in the 2nd and 3rd attempts, respectively):

  1. I found the message below at the very beginning of every 'simpleFoam' log file. Therefore, I modified the mpirun call in the 'functions' file by adding the --mca btl '^openib' option (a sketch of the modified call is shown after this list). Afterward, the message disappeared, but the error still emerged.

    
    --------------------------------------------------------------------------
    By default, for Open MPI 4.0 and later, infiniband ports on a device
    are not used by default.  The intent is to use UCX for these devices.
    You can override this policy by setting the btl_openib_allow_ib MCA parameter
    to true.

    Local host:              node068
    Local adapter:           hfi1_0
    Local port:              1

    WARNING: There was an error initializing an OpenFabrics device.

    Local host:              node068
    Local device:            hfi1_0



  2. I changed one line in the 'jobscript' file from `module load mpi/openmpi/4.0.1/cuda_aware_gcc_6.3.0` to `module load mpi/openmpi/4.0.5_gcc_9.3/openmpi`, but the result was the same. This suggests that the message above and the MPI version are not related to this problem.
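For reference, the change in step 1 only concerns where the option is placed in the mpirun call. A rough sketch is shown below; the actual singularityRunParallel wrapper in 'functions' looks different, and the variable names here are only placeholders:

# sketch only: disable the openib BTL when launching a parallel OpenFOAM app
# $procs, $image, $bashrc, and $app are placeholders, not the real variable names
mpirun --mca btl '^openib' -np $procs singularity exec $image bash -c "source $bashrc && $app -parallel"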

If this problem occurred on both the local system and the cluster, I could debug it locally. However, the simulation works well on the local system, and the problem emerges only on the Phoenix cluster. Hence, at least the 'functions' file itself does not seem to be the problem.

Please find four attached files related to this error (renamed to .txt):
- the SLURM log files of the two simulations (1st and 3rd attempts),
- one of the 'simpleFoam' log files from each of the two simulations (1st and 3rd attempts).

I would really appreciate any advice on how to resolve this.
Thank you very much for your help.

Best regards,
Jihoo Kang

[(1stAttempt)log.simpleFoam.kOmegaSST_0.05_1e-3.txt](https://github.com/AndreWeiner/wall_modeling/files/6327789/1stAttempt.log.simpleFoam.kOmegaSST_0.05_1e-3.txt)
[(3rdAttempt)log.simpleFoam.kOmegaSST_0.05_1e-3.txt](https://github.com/AndreWeiner/wall_modeling/files/6327790/3rdAttempt.log.simpleFoam.kOmegaSST_0.05_1e-3.txt)
[slurm-1603432.out.txt](https://github.com/AndreWeiner/wall_modeling/files/6327791/slurm-1603432.out.txt)
[slurm-1603532.out.txt](https://github.com/AndreWeiner/wall_modeling/files/6327792/slurm-1603532.out.txt)

JihooKang-KOR commented 3 years ago

Dear Dr. Weiner,

I figured out why it had happened.

The problem was that the Singularity image was not loaded properly during the simulation. When I checked the sbatch log files (e.g., 'slurm-1603532.out'), I found the following message.

./Allrun.singularity: line 64: foamDictionary: command not found

This means that the script cannot use any of the OpenFOAM tools from the Singularity image. Currently, the image is only loaded when the functions singularityRun and singularityRunParallel execute. Therefore, I decided to load the Singularity image up front, before running the ./Allrun script, by modifying only the 'jobscript' file as follows.

## submit job
echo "Submitting case $1"
cd run/$1
image="../../of_v2012.sif"
bashrc="/usr/lib/openfoam/openfoam2012/etc/bashrc"
singularity exec $image bash -c "source $bashrc && ./Allrun"

Since the image is loaded up front, we do not need a separate 'Allrun' script or dedicated functions for the image. Thus, ./Allrun is executed in the 'jobscript' file instead of ./Allrun.singularity. For the same reason, I need to add an 'Allclean.singularity' script, because the image has to be loaded in order to use the OpenFOAM tools called by 'Allclean'.
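A minimal sketch of such an 'Allclean.singularity' wrapper, assuming the same image and bashrc paths as in the jobscript above (the actual script may look different), would be:

#!/bin/bash
# sketch: run the case's Allclean inside the container so that
# OpenFOAM's clean functions are available
image="../../of_v2012.sif"
bashrc="/usr/lib/openfoam/openfoam2012/etc/bashrc"
singularity exec $image bash -c "source $bashrc && ./Allclean"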

Consequently, I will proceed with the work as follows.

  1. Revising 'jobscript' file in the top repository folder.
  2. Adding 'Allclean.singularity' to the simulation folder.

I think the 'Allrun.singularity' and 'functions' files are no longer needed, but I will keep them for now. We can delete them later once we decide that they are really not needed.

I have now checked that all the commands mentioned in this issue work well, and the problem seems to be solved. Therefore, we can close this issue if no new problems come up.

Thank you for reading this comment.

Best regards, Jihoo Kang

AndreWeiner commented 3 years ago

Dear Jihoo, unfortunately, the workflow you suggested won't work when executing both serial and parallel programs. foamDictionary is an OpenFOAM utility; whenever you need to run an OF utility, use singularityRun theApp or singularityRunParallel -np numberProcs theApp -parallel. I'll explain in more detail why it has to be done this way in our next meeting (I've tried the workflow you suggested before). So, the best fix should be

singularityRun foamDictionary -entry boundaryField.bottomWall.value ...

If you run the app multiple times, make sure to delete the log files or to use a suffix, e.g.,

singularityRun -s Cx foamDictionary -entry boundaryField.bottomWall.value ...

Hope that helps.

Best, Andre

AndreWeiner commented 3 years ago

Hi @JihooKang-KOR, I couldn't find a quick fix for the foamDictionary issue, so for now let's stick to the solution you found. Best, Andre

JihooKang-KOR commented 3 years ago

Dear Andre,

Thank you for checking the foamDictionary issue; I understood your comment. I will keep what I did in the 'Allrun.singularity' script.

If there are no more tasks for this issue, we can close it.

Best regards, Jihoo Kang