Dear Dr. Weiner,
I tested the commands mentioned in this issue, except for synchronizing, and found that they work well. However, when I ran the command `qsub jobscript name_of_simulation` (here, the name of the simulation is 'turbulentFlatPlate_noWallFunc'), the Phoenix server reported that it cannot find the `qsub` command:

```
-bash: qsub: command not found
```

I copied the singularity image of OpenFOAM and the jobscript file to the '~/wall_function' folder, and the simulation case 'turbulentFlatPlate_noWallFunc' to the '~/wall_function/run' folder on Phoenix. Nevertheless, I could not execute the simulation because the package providing `qsub` is not installed on the server.
I found that the `sbatch` command is available on the server, but I am not sure whether I should use `qsub` or `sbatch` for the Phoenix cluster. I even installed the slurm-wlm-torque package on my local system, but of course that did not help, because the cluster does not use my local packages.
Therefore, I would really appreciate any solution for the above. Thank you very much for your help.
Best regards, Jihoo Kang
Dear Jihoo,
of course, you're right about using `sbatch`. I mixed up different scheduler commands; `qsub` is a PBS command. The correct syntax to submit a job is

```
sbatch jobscript name_of_simulation
```
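For orientation, the *jobscript* file takes the name of the simulation case as its first argument. A minimal sketch of a SLURM script with that shape might look like the following; the core count and wall time are placeholders, not the actual repository file:

```bash
#!/bin/bash
#SBATCH --ntasks=4           # placeholder core count
#SBATCH --time=12:00:00      # placeholder wall time

# first command line argument: name of the simulation case
echo "Submitting case $1"
cd run/$1
./Allrun.singularity
```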
I'll fix the README. Thanks for the correction!
Best, Andre
Dear Dr. Weiner,
I proceeded with the simulation by using `sbatch` on the cluster, and the `sbatch` command itself worked well. However, the script 'Allrun.singularity' failed to produce proper results, even though I tried three times with different settings. Thus, I would like to report this problem here.
I tried two steps (applied to the 2nd and 3rd attempts) as follows:
1. I found that the message below appears at the very beginning of all the 'simpleFoam' log files. Therefore, I modified the `mpirun` part of the 'functions' file by adding the `--mca btl '^openib'` option (see the sketch after this list). Afterward, the message disappeared, but the error still emerged.
    ```
    --------------------------------------------------------------------------
    By default, for Open MPI 4.0 and later, infiniband ports on a device
    are not used by default. The intent is to use UCX for these devices.
    You can override this policy by setting the btl_openib_allow_ib MCA
    parameter to true.

      Local host:    node068
      Local adapter: hfi1_0
      Local port:    1

    WARNING: There was an error initializing an OpenFabrics device.
    ```
2. I changed one line of the '*jobscript*' file from `module load mpi/openmpi/4.0.1/cuda_aware_gcc_6.3.0` to `module load mpi/openmpi/4.0.5_gcc_9.3/openmpi`, but the result was the same. This means that neither the warning above nor the MPI version is related to this failure.
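For reference, the modification mentioned in step 1 amounted to something like the sketch below. This is only an illustration; the actual call lives in the '*functions*' file, and the variable names here are placeholders:

```bash
# hypothetical sketch of the modified parallel run in 'functions';
# '--mca btl ^openib' tells Open MPI not to use the openib component
mpirun --mca btl '^openib' -np "$procs" \
    singularity exec "$image" bash -c "source $bashrc && $app -parallel" \
    > "log.$app" 2>&1
```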
If this problem occurred on both the local system and the cluster, I could solve it. However, the simulation works well on the local system, whereas the problem emerges only on the Phoenix cluster. Hence, at least the '*functions*' file itself does not seem to be the cause.
Please find a total of four files attached that are related to this error (converted to *txt* files):
- the *SLURM* log files of two simulations (1st and 3rd attempts),
- one of the '*simpleFoam*' log files of each of the two simulations (1st and 3rd attempts).
I would really appreciate any solution for the above. Thank you very much for your help.
Best regards,
Jihoo Kang
[(1stAttempt)log.simpleFoam.kOmegaSST_0.05_1e-3.txt](https://github.com/AndreWeiner/wall_modeling/files/6327789/1stAttempt.log.simpleFoam.kOmegaSST_0.05_1e-3.txt)
[(3rdAttempt)log.simpleFoam.kOmegaSST_0.05_1e-3.txt](https://github.com/AndreWeiner/wall_modeling/files/6327790/3rdAttempt.log.simpleFoam.kOmegaSST_0.05_1e-3.txt)
[slurm-1603432.out.txt](https://github.com/AndreWeiner/wall_modeling/files/6327791/slurm-1603432.out.txt)
[slurm-1603532.out.txt](https://github.com/AndreWeiner/wall_modeling/files/6327792/slurm-1603532.out.txt)
Dear Dr. Weiner,
I figured out why this happened.
The problem was that the singularity image was not loaded for parts of the simulation. When I checked one of the sbatch log files (e.g., 'slurm-1603532.out'), I found the following message:

```
./Allrun.singularity: line 64: foamDictionary: command not found
```
This means that the script cannot use any OpenFOAM tools from the singularity image. Currently, the image is only loaded while the functions `singularityRun` and `singularityRunParallel` execute. Therefore, I decided to load the singularity image right at the start, before running the `./Allrun` script, by modifying only the 'jobscript' file as follows:
```bash
## submit job
echo "Submitting case $1"
cd run/$1
# OpenFOAM singularity image and the bashrc inside the container
image="../../of_v2012.sif"
bashrc="/usr/lib/openfoam/openfoam2012/etc/bashrc"
# execute the case's Allrun script entirely inside the container
singularity exec $image bash -c "source $bashrc && ./Allrun"
```
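With this change, a case is still submitted exactly as before, for example:

```bash
sbatch jobscript turbulentFlatPlate_noWallFunc
```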
Since the image is loaded right at the start, we do not need any new 'Allrun' script or separate functions for the image. Thus, `./Allrun` is executed in the 'jobscript' file instead of `./Allrun.singularity`. For the same reason, I need to add an 'Allclean.singularity' script, because the image must be loaded in order to use the tools called by 'Allclean'; a sketch follows below.
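A minimal sketch of such a script, assuming it mirrors the jobscript pattern above and resides in the case folder (so the relative paths match):

```bash
#!/bin/bash
# hypothetical 'Allclean.singularity': run the case's Allclean inside
# the container so that the OpenFOAM clean tools are available
image="../../of_v2012.sif"
bashrc="/usr/lib/openfoam/openfoam2012/etc/bashrc"
singularity exec $image bash -c "source $bashrc && ./Allclean"
```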
Consequently, I will proceed with the work as follows.
I think the 'Allrun.singularity' and 'functions' files are no longer needed, but I will keep them for now; we can delete them later once we decide that they are really not needed.
I have now finished checking that all the commands referred to in this issue work well, and the problem seems to be solved. Therefore, we can close this issue if no new problems appear.
Thank you for reading this comment.
Best regards, Jihoo Kang
Dear Jihoo,
unfortunately, the workflow you suggested won't work when executing both serial and parallel programs. `foamDictionary` is an OpenFOAM utility, and whenever you need to run an OF utility, use `singularityRun theApp` or `singularityRunParallel -np numberProcs theApp -parallel`. I'll explain in more detail why it has to be done that way in our next meeting (I've tried the workflow you suggested before). So, the best fix should be:
```
singularityRun foamDictionary -entry boundaryField.bottomWall.value ...
```

If you run the app multiple times, make sure to delete the log files or to use a suffix, e.g.,

```
singularityRun -s Cx foamDictionary -entry boundaryField.bottomWall.value ...
```
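To illustrate the pattern only (this is not the repository's actual definition), a `singularityRun`-style wrapper might look roughly like this, assuming `image` and `bashrc` are set as in the jobscript above:

```bash
# hypothetical sketch of a singularityRun-style wrapper; the real
# definition lives in the repository's 'functions' file
singularityRun()
{
    local suffix=""
    # optional '-s <suffix>' keeps logs from repeated runs apart
    if [ "$1" = "-s" ]; then
        suffix=".$2"
        shift 2
    fi
    local app="$1"
    singularity exec "$image" bash -c \
        "source $bashrc && $*" > "log.${app}${suffix}" 2>&1
}
```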
Hope that helps.
Best, Andre
Hi @JihooKang-KOR, I couldn't find a quick fix for the `foamDictionary` issue, so for now let's stick to the solution you found. Best, Andre
Dear Andre,
Thank you for checking the `foamDictionary` issue; I understood your comment. I will keep what I did in the 'Allrun.singularity' script.
If there are no more tasks for this issue, we can close it.
Best regards, Jihoo Kang
Hi Jihoo,
I added several files in the last commit to run simulations with Singularity on TU Braunschweig's Phoenix cluster. Have a look at the README and other new files.
To log in on Phoenix, I recommend an alias: add a line like the one sketched below to your `~/.bashrc` and source the file again with `source ~/.bashrc`. Then you can type `phoenix` to start an ssh connection.
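For example (the user and host names are placeholders; use your own credentials):

```bash
# example alias; replace the placeholders with your own user and host
alias phoenix='ssh <username>@<phoenix-login-node>'
```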
I also recommend creating a key pair to save some time logging in. To copy a file to or from the cluster, use `scp -r` (`-r` for recursive copying of folders). To copy simulation results from the cluster, I typically synchronize the entire run folder; example commands follow below.
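Hedged examples of both commands, with placeholders for the user, host, and paths:

```bash
# copy a folder to the cluster (placeholders for user, host, and paths)
scp -r some_folder <username>@<phoenix-login-node>:~/wall_function/

# synchronize the entire run folder back to the local machine
rsync -avz <username>@<phoenix-login-node>:~/wall_function/run/ ./run/
```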
Let me know if these commands work for you.
Best, Andre