ansys / pyfluent

Pythonic interface to Ansys Fluent
https://fluent.docs.pyansys.com
MIT License

Error when running PyFluent in HPC/Slurm environment #2351

Closed christospliakos closed 5 months ago

christospliakos commented 7 months ago

πŸ” Before submitting the issue

🐞 Description of the bug

So I'm following the example of using PyFluent with a scheduler on an HPC machine (https://fluent.docs.pyansys.com/version/stable/user_guide/launching_ansys_fluent.html#scheduler-support). I specify the path where Fluent is installed (I have tried different paths as well), but with no luck.

The way I'm launching Fluent from the Python script is: solver = launch_fluent(mode="solver", precision='double', show_gui=False, gpu=True, start_timeout=60). Below are the bash file and the slurm-out file.
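For reference, here is that launch call in runnable form (a minimal sketch of the relevant part of benchmark_wing.py; the import line is assumed):

from ansys.fluent.core import launch_fluent

# Launch arguments exactly as used in benchmark_wing.py (see the traceback below)
solver = launch_fluent(
    mode="solver",
    precision="double",
    show_gui=False,
    gpu=True,
    start_timeout=60,  # seconds
)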

πŸ“ Steps to reproduce

bash file:

#!/bin/bash
#SBATCH --job-name=FLUENT-2023R2-gpu-case
#SBATCH --partition=ampere
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=2:00:00

module load intel-oneapi-compilers/2022.0.2-sygdcrc python/3.10.10-tuelrz
module load ansys/2023R2

# CREATE HOSTFILE
echo "Creating hostfile..."
srun hostname > ${SLURM_JOBID}.hostfile

# ACTIVATE VENV
echo "Activating virtual environment..."
. ./myenv/bin/activate

# EXPORT FLUENT SO IT CAN BE FOUND
echo "Exporting AWP_ROOT232..."
export AWP_ROOT232=/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/

# RUN PYFLUENT
echo "Running PyFluent..."
python benchmark_wing.py

# AS SOON AS RUN IS COMPLETE, DELETE HOSTFILE
echo "Cleaning up hostfile..."
rm -f ${SLURM_JOBID}.hostfile

Slurm Out file:

remove ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
load ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
Creating hostfile...
Exporting AWP_ROOT232...
Activating virtual environment...
Running PyFluent...
pyfluent.launcher ERROR: Exception caught - RuntimeError: The launch process has been timed out.
Trying to open solver
Traceback (most recent call last):
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 758, in launch_fluent
    raise ex
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 742, in launch_fluent
    _await_fluent_launch(
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 383, in _await_fluent_launch
    raise RuntimeError("The launch process has been timed out.")
RuntimeError: The launch process has been timed out.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/benchmark_wing.py", line 8, in <module>
    solver = launch_fluent(mode="solver", precision='double', show_gui=False, gpu=True, start_timeout=60)
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 797, in launch_fluent
    raise LaunchFluentError(launch_cmd) from ex
ansys.fluent.core.launcher.launcher.LaunchFluentError: Fluent Launch string: /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/bin/fluent 3ddp -gpu -t20 -cnf=cn50:20 -sifile=/tmp/serverinfo-w7rr7m0a.txt -nm -hidden
Cleaning up hostfile...

💻 Which operating system are you using?

Linux

📀 Which ANSYS version are you using?

FLUENT 2023R2

🐍 Which Python version are you using?

3.10

📦 Installed packages

ansys-fluent-core

raph-luc commented 7 months ago

Thanks for reporting this @christospliakos, I believe the documentation needs to be updated.

An example of how PyFluent 0.18.2 can currently be used in Slurm environments, as well as a link to the improvements coming in PyFluent 0.19, are available here: https://github.com/ansys/pyfluent/issues/2264#issuecomment-1878820130

christospliakos commented 7 months ago

I've not managed to launch Fluent in the HPC/Slurm environment. I intend to run it under the same job that I submit to the cluster system.

I'm using the bash file that I provided and a Python script that simply (for now) launches PyFluent. I see that PyFluent 0.19 was released yesterday, so I suspect there are changes in how this should be done. The documentation is still not updated, and I'm somewhat lost in the information.

I would like to recreate the example from the PyFluent documentation on scheduler support, even if it has to be done differently.

raph-luc commented 7 months ago

@christospliakos PyFluent 0.19.1 has been released and includes improvements for Slurm environments: it provides a way to launch Fluent Slurm jobs from a Python environment. I believe this isn't quite what you had in mind, since rather than setting up a Slurm job that uses PyFluent, you would let PyFluent manage the Fluent Slurm job. Do you think this approach would work for you?

For an example of how Slurm is now supported, see the description in https://github.com/ansys/pyfluent/pull/2269. We are also working to update the documentation, as you can see in the work in progress at https://github.com/ansys/pyfluent/pull/2373.

An alternative within the approach you are currently taking (a Slurm job that uses PyFluent) would be to set a much longer start_timeout argument for launch_fluent(), since what seems to be causing your issue is that the launch times out, presumably because your job is still sitting in the queue.
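For illustration, a minimal sketch of that (the timeout value here is arbitrary; pick something longer than your expected queue wait):

import ansys.fluent.core as pyfluent

# Give Fluent plenty of time to come up, e.g. while the Slurm job waits in the queue
solver = pyfluent.launch_fluent(
    mode="solver",
    precision="double",
    start_timeout=3600,  # seconds
)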

christospliakos commented 7 months ago

@raph-luc It seems that the timeout is not the reason. My jobs usually pass the queue almost instantly, and I raised the timeout to 5 minutes.

I would like the Slurm job to control PyFluent, since it is the job (via the bash file) that requests 1 GPU, and only then do I get access to the GPU. For your suggestion of letting PyFluent manage the Fluent Slurm job: do I need one job to run the PyFluent script, which then creates another job for Fluent itself?

The way this works is very confusing, imho. The approach described in the old documentation is very intuitive: a single job runs PyFluent, which manages the Fluent launch using the computational resources dictated by the job allocation.

As I'm still finding my way around HPC/Slurm, complete and updated documentation with a full example would be very helpful.

dnwillia commented 7 months ago

@raph-luc What the user has in mind is simply running Fluent within the scheduler environment in batch, with no interaction from a remote Python console necessary. This is a pretty common way to use Slurm and is why I added the support documented in the link that @christospliakos provided.

@christospliakos the documented example should be working, so no need for updating the documentation. What happens if you just start with minimal args:

solver = launch_fluent(mode="solver", precision='double')

Does that work? I must admit I've never tried this with the gpu argument. Also, is there any information in the standard output, or do you happen to be writing a Fluent transcript file that we could have a look at?

christospliakos commented 7 months ago

@dnwillia Thanks a lot for the comments. This is exactly what I want to do, as you described it to @raph-luc.

I've tried what you recommended and it still doesn't seem to work. The errors are not quite the same, though; some SSH errors now come first. I've also enabled PyFluent logging.

Slurm output:

Creating hostfile...
Activating virtual environment...
remove ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
load ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
Exporting AWP_ROOT232...
Running PyFluent Script...
/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -t20 -cnf=cn41:20 -sifile=/tmp/serverinfo-v4jinchd.txt -nm -hidden
Hostfile does not exist, will try to use it as hostname!
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Host key verification failed.
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Host key verification failed.
/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/cortex/lnamd64/cortex.23.2.0 -f fluent -sifile=/tmp/serverinfo-v4jinchd.txt -nm -hidden
(fluent "3ddp -host -r23.2.0 -t20 -cnf=cn41:20 -path/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent -ssh")
pyfluent.launcher ERROR: Exception caught - TimeoutError: The launch process has timed out.
PyFluent logging file /home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/pyfluent.log
Setting PyFluent global logging level to DEBUG.
Trying to open solver
Traceback (most recent call last):
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 243, in __call__
    raise ex
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 227, in __call__
    _await_fluent_launch(
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher_utils.py", line 302, in _await_fluent_launch
    raise TimeoutError("The launch process has timed out.")
TimeoutError: The launch process has timed out.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/benchmark_wing.py", line 18, in <module>
    solver = pyfluent.launch_fluent(mode="solver", precision='double')
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/launcher.py", line 228, in launch_fluent
    return launcher()
  File "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/myenv/lib/python3.10/site-packages/ansys/fluent/core/launcher/standalone_launcher.py", line 284, in __call__
    raise LaunchFluentError(launch_cmd) from ex
ansys.fluent.core.launcher.launcher_utils.LaunchFluentError: Fluent Launch string: /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/bin/fluent 3ddp -t20 -cnf=cn41:20 -sifile=/tmp/serverinfo-v4jinchd.txt -nm -hidden

bash file:

#!/bin/bash
#SBATCH --job-name=FLUENT-2023R2
#SBATCH --partition=rome
#SBATCH --ntasks-per-node=20
#SBATCH --nodes=1
#SBATCH --time=1:00:00

module load gcc/12.2.0 gcc/12.2.0-fhg4pj2 python/3.10.10-abxdifo

# CREATE HOSTFILE
echo "Creating hostfile..."
srun hostname > ${SLURM_JOBID}.hostfile

# ACTIVATE VENV
echo "Activating virtual environment..."
source myenv/bin/activate

# Loading Ansys
module load ansys/2023R2

# EXPORT FLUENT SO IT CAN BE FOUND
echo "Exporting AWP_ROOT232..."
export AWP_ROOT232=/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/

# RUN PYFLUENT
echo "Running PyFluent Script..."
python benchmark_wing.py

dnwillia commented 7 months ago

Did you verify that Fluent works on its own already? Just replace the line python benchmark_wing.py with fluent 3ddp -t4; if that starts, we know Fluent is working at least.

christospliakos commented 7 months ago

Yes, Fluent surely works. The only change that has to be made to the command you provided is to add the -g parameter so that it works in a GUI-less environment: fluent 3ddp -t4 -g
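In the Slurm script, the python benchmark_wing.py line was replaced with a direct Fluent launch, something like the following (the run below used 20 tasks, hence -t20 in the output):

# RUN FLUENT DIRECTLY instead of the PyFluent script
fluent 3ddp -t20 -g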

Slurm Output:

Creating hostfile...
Activating virtual environment...
load ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
Exporting AWP_ROOT232...
/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -t20 -g
/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/cortex/lnamd64/cortex.23.2.0 -f fluent -g
(fluent "3ddp -pshmem -host -r23.2.0 -t20 -mpi=intel -path/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent -ssh")

Opening input/output transcript to file "/home/t/thomasdn/Aristotle_benchmark/INDIANA_Benchmark_coarse_mesh/fluent-20240124-143418-38647.trn". Auto-Transcript Start Time: 14:34:18, 24 Jan 2024 /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -pshmem -host -t20 -mpi=intel -path/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent -ssh -cx cn41.it.auth.gr:44372:44038 Starting /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/lnamd64/3ddp_host/fluent.23.2.0 host -cx cn41.it.auth.gr:44372:44038 "(list (rpsetvar (QUOTE parallel/function) "fluent 3ddp -flux -node -r23.2.0 -t20 -pshmem -mpi=intel -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "20") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent") (rpsetvar (QUOTE parallel/hostsfile) "") (rpsetvar (QUOTE gpuapp/devices) ""))"

          Welcome to ANSYS Fluent 2023 R2

          Copyright 1987-2023 ANSYS, Inc. All Rights Reserved.
          Unauthorized use, distribution or duplication is prohibited.
          This product is subject to U.S. laws governing export and re-export.
          For full Legal Notice, see documentation.
raph-luc commented 7 months ago

Sorry for the confusion, I just tested this and I can confirm that the documentation is correct.

@christospliakos to be clear, can you try with a very simple Python and Slurm script just so we can try to isolate what is causing this issue? For example this the test.py script I used for Python/PyFluent:

import ansys.fluent.core as pyfluent
solver = pyfluent.launch_fluent(precision="double", version="3d", mode="solver")
print("Exiting...")
solver.exit()

And I used this Slurm script (a bit simplified compared to the one in the docs):

#!/bin/bash
#SBATCH --job-name="pyfluent-test"
#SBATCH --nodes=1
#SBATCH --ntasks=28
#SBATCH --output="/home/rluciano/python-tests/%x-%j"
#SBATCH --partition=cdc01
#
# Activate your favorite Python environment and load Fluent
#
source /home/rluciano/python-tests/.venv/bin/activate
module load fluent/241weekly
export AWP_ROOT241=/apps/ansys_inc/preview/v241_Certified_Weekly/ansys_inc/v241/
#
# Run a PyFluent script
#
echo "Running PyFluent..."
python /home/rluciano/python-tests/test.py

Do these simple scripts also not work on your end?

dnwillia commented 7 months ago

> Yes, Fluent surely works. The only change that has to be made to the command you provided is to add the -g parameter so that it works in a GUI-less environment: fluent 3ddp -t4 -g

OK great, so that works at least. Sorry about the -g thing, you are right. As @raph-luc mentions, we have verified today that it's working on our internal SLURM deployment.

christospliakos commented 7 months ago

@raph-luc I tried your simple method and it didn't work. BUT, it worked using this launch option: solver = pyfluent.launch_fluent(precision="double", version="3d", mode="solver", additional_arguments="-g")
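In runnable form (the same call as above, a minimal sketch with the import shown):

import ansys.fluent.core as pyfluent

# Passing -g via additional_arguments launches Fluent with no GUI and no graphics;
# the default launch used -hidden, which timed out in our environment
solver = pyfluent.launch_fluent(
    precision="double",
    version="3d",
    mode="solver",
    additional_arguments="-g",
)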

It seems that using "-g" is necessary for our Slurm environment. The Slurm output is as follows:

Slurm output:

The following have been reloaded with a version change:
  1) gcc/12.2.0 => gcc/12.2.0-fhg4pj2
  2) python/3.10.10-lufnu2b => python/3.10.10-abxdifo

Creating hostfile...
Activating virtual environment...
remove ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
load ansys 2023R2 (PATH LD_LIBRARY_PATH ANSYS_ROOT ANSYSLI_SERVERS ANSYSLMD_LICENSE_FILE CFX5RSH ANSWAIT)
Exporting AWP_ROOT232...
Running PyFluent...
/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -t20 -cnf=cn44:20 -g -sifile=/tmp/serverinfo-b18s2nm4.txt -nm
Hostfile does not exist, will try to use it as hostname!
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Host key verification failed.
ssh_askpass: exec(/usr/libexec/openssh/ssh-askpass): No such file or directory
Host key verification failed.
/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/cortex/lnamd64/cortex.23.2.0 -f fluent -g -sifile=/tmp/serverinfo-b18s2nm4.txt -nm
(fluent "3ddp -host -r23.2.0 -t20 -cnf=cn44:20 -path/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent -ssh")

Opening input/output transcript to file "/home/t/thomasdn/Aristotle_benchmark/Coarse_Mesh_PyFluent/fluent-20240125-120344-110158.trn". Auto-Transcript Start Time: 12:03:44, 25 Jan 2024 /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -host -t20 -cnf=cn44:20 -path/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent -ssh -cx cn44.it.auth.gr:36843:38377 Starting /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/lnamd64/3ddp_host/fluent.23.2.0 host -cx cn44.it.auth.gr:36843:38377 "(list (rpsetvar (QUOTE parallel/function) "fluent 3ddp -flux -node -r23.2.0 -t20 -pdefault -mpi=intel -cnf=cn44:20 -ssh") (rpsetvar (QUOTE parallel/rhost) "") (rpsetvar (QUOTE parallel/ruser) "") (rpsetvar (QUOTE parallel/nprocs_string) "20") (rpsetvar (QUOTE parallel/auto-spawn?) #t) (rpsetvar (QUOTE parallel/trace-level) 0) (rpsetvar (QUOTE parallel/remote-shell) 1) (rpsetvar (QUOTE parallel/path) "/mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent") (rpsetvar (QUOTE parallel/hostsfile) "cn44:20") (rpsetvar (QUOTE gpuapp/devices) ""))"

          Welcome to ANSYS Fluent 2023 R2

          Copyright 1987-2023 ANSYS, Inc. All Rights Reserved.
          Unauthorized use, distribution or duplication is prohibited.
          This product is subject to U.S. laws governing export and re-export.
          For full Legal Notice, see documentation.

Build Time: May 29 2023 07:35:15 EDT Build Id: 10212

Connected License Server List: 1055@ansys.it.auth.gr

 --------------------------------------------------------------
 This is an academic version of ANSYS FLUENT. Usage of this product
 license is limited to the terms and conditions specified in your ANSYS
 license form, additional terms section.
 --------------------------------------------------------------

Host spawning Node 0 on machine "cn44.it.auth.gr" (unix). /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/bin/fluent -r23.2.0 3ddp -flux -node -t20 -pdefault -mpi=intel -cnf=cn44:20 -ssh -mport 155.207.96.44:155.207.96.44:36067:0 Starting /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/multiport/mpi/lnamd64/intel2021/bin/mpirun -f /tmp/fluent-appfile.thomasdn.110544 --rsh=ssh -genv FI_PROVIDER tcp -genv FLUENT_ARCH lnamd64 -genv I_MPI_DEBUG 0 -genv I_MPI_ADJUST_GATHERV 3 -genv I_MPI_ADJUST_ALLREDUCE 2 -genv I_MPI_PLATFORM auto -genv PYTHONHOME /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/../../commonfiles/CPython/3_10/linx64/Release/python -genv FLUENT_PROD_DIR /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0 -genv FLUENT_AFFINITY 0 -genv I_MPI_PIN enable -genv KMP_AFFINITY disabled -machinefile /tmp/fluent-appfile.thomasdn.110544 -np 20 /mnt/apps/prebuilt/ansys/2023R2/ansys_inc/v232/fluent/fluent23.2.0/lnamd64/3ddp_node/fluent_mpi.23.2.0 node -mpiw intel -pic default -mport 155.207.96.44:155.207.96.44:36067:0

INFO: Queuing for license HPC_PARALLEL has been initiated. The application will resume once the license is granted. Please use the lmutil lmstat utility to see who is currently using the license.

E0125 12:03:49.433011680 110158 server_chttp2.cc:40] {"created":"@1706177029.432918406","description":"Only 1 addresses added out of total 2 resolved","file":"/home/staff/kkripa/tfsagent/_work/20/b/.conan/data/grpc_base/1.25.0/thirdparty/stable/build/1036933dfdff90461d4bf4154f59e6aa78392d87/grpc/src/core/ext/transport/chttp2/server/chttp2_server.cc","file_line":403,"referenced_errors":[{"created":"@1706177029.432896095","description":"Address family not supported by protocol","errno":97,"file":"/home/staff/kkripa/tfsagent/_work/20/b/.conan/data/grpc_base/1.25.0/thirdparty/stable/build/1036933dfdff90461d4bf4154f59e6aa78392d87/grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":395,"os_error":"Address family not supported by protocol","syscall":"socket","target_address":"[::1]:44683"}]}
Information: The server has started and is running.
slurmstepd: error: JOB 1756010 ON cn44 CANCELLED AT 2024-01-25T12:04:16

I don't know if you see anything suspicious, but to me it seems that it is running normally (there is a shortage of licenses right now; I will try again later).

I am more interested in running the native GPU solver; that is my next test.

dnwillia commented 7 months ago

Does it still work with -gu? -g is no GUI and no graphics, while -gu is just no GUI. I thought we were running with -gu by default.

christospliakos commented 7 months ago

@dnwillia Yes with -gu it also works.

raph-luc commented 7 months ago

@dnwillia we are using -hidden by default, as it was changed (I think last year) to work on Linux as well. We are using -gu for Linux containers though. In our internal Slurm Linux HPC system, -hidden works fine.

As I understand it, -hidden still creates the GUI and the graphics, but the window is "minimized" or hidden.

@christospliakos Glad to hear you've made some progress, I agree the logs look fine for that test. It makes sense that in some environments -hidden would not work but -gu or -g would.
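In the meantime, on systems where -hidden does not work, the default can be overridden per launch; a minimal sketch (both -gu and -g were reported to work above):

import ansys.fluent.core as pyfluent

# Override the default -hidden window mode; -gu drops only the GUI, -g drops GUI and graphics
solver = pyfluent.launch_fluent(mode="solver", additional_arguments="-gu")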

dnwillia commented 6 months ago

OK I have no idea why we switched to -hidden. At some point it was -gu. @mkundu1 Why did we switch to -hidden?

mkundu1 commented 6 months ago

@dnwillia @raph-luc The -hidden flag has been the default for launching Fluent on both platforms since the beginning (PR #7). I think at that time there was an issue with displaying/saving postprocessing images in -gu mode on Linux, which I cannot reproduce now. We can try switching the default to -gu if we don't find any other issue (I'll have a look into this).