firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/

Different initialization in impi_intel_linux build target on different machines #12473

Closed johodges closed 7 months ago

johodges commented 7 months ago

Describe the bug

The time step in Verification/Heat_Transfer/back_wall_test_2.fds varies significantly between two machines, both built with the impi_intel_linux build target. The OS and the specific versions of Intel oneAPI and gcc differ slightly between the two machines:

Machine 1

  1. Centos
  2. ifort --version: ifort (IFORT) 2021.9.0 20230302
  3. gcc --version: gcc (GCC) 12.2.0

Machine 2

  1. Ubuntu
  2. ifort --version: ifort (IFORT) 2021.11.1 20231117
  3. gcc --version: gcc (GCC) 11.4.0

Machine 2 has the intended initial time step of 0.10 seconds; however, Machine 1 has a significantly lower time step (see below). This case has CFL_MAX set, which makes me think there is something odd going on with the velocity initialization that is being picked up in the first build environment but not the second.
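For context, a CFL limit caps the time step at roughly CFL_MAX times the cell size divided by the largest velocity magnitude, so a spurious initial velocity would shrink the first step. The following is a minimal sketch of that relation, assuming a simple one-dimensional constraint. The function name, cell size, velocities, and CFL_MAX value are hypothetical and not taken from the input file, and the actual FDS time step logic involves more terms than this.

```python
def cfl_limited_dt(dt_requested, cfl_max, cell_size, max_velocity):
    """Return the time step after applying a simple 1-D CFL cap.

    The constraint is CFL = max_velocity * dt / cell_size <= cfl_max.
    """
    if max_velocity <= 0.0:
        # Quiescent field: the CFL condition places no limit on dt.
        return dt_requested
    return min(dt_requested, cfl_max * cell_size / max_velocity)


# Hypothetical numbers for illustration only (not from back_wall_test_2.fds).
dt_quiescent = cfl_limited_dt(0.10, cfl_max=1.0, cell_size=0.1, max_velocity=0.0)
dt_spurious = cfl_limited_dt(0.10, cfl_max=1.0, cell_size=0.1, max_velocity=1000.0)
print(dt_quiescent, dt_spurious)
```

With these made-up values, a zero velocity field keeps the requested 0.10 s step, while a large spurious velocity drives the step down by about three orders of magnitude, which is the kind of drop seen on Machine 1.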

Starting FDS ...

MPI Process 0 started on MESH.haifire.com
MPI Process 1 started on MESH.haifire.com
MPI Process 2 started on MESH.haifire.com

Reading FDS input file ...

Fire Dynamics Simulator

Current Date : February 13, 2024 16:40:14
Revision : FDS-6.8.0-1465-gb39ccd2-jh-firebot
Revision Date : Tue Feb 13 09:28:48 2024 -0500
Compiler : Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_000000
Compilation Date : Feb 13, 2024 09:40:15

Number of MPI Processes: 3

MPI version: 3.1
MPI library version: Intel(R) MPI Library 2021.9 for Linux* OS

Job TITLE : Test 1-D heat transfer through rotated GEOM obstruction
Job ID string : back_wall_test_2

Time Step: 1, Simulation Time: 0.00009 s
Time Step: 2, Simulation Time: 0.0002 s
Time Step: 3, Simulation Time: 0.0003 s
Time Step: 4, Simulation Time: 0.0004 s
Time Step: 5, Simulation Time: 0.0005 s
Time Step: 6, Simulation Time: 0.0006 s

marcosvanella commented 7 months ago

Thank you Jon, I'll have a look.

marcosvanella commented 7 months ago

Jon, question: do both ifort and gfortran give a small initial DT on Machine 1? I'll try the case in my CentOS VM here.

johodges commented 7 months ago

I have not tried the build on Machine 1 with the ompi_gnu_linux target pulling in the Intel MKL library. Let me rebuild and I will report back.

marcosvanella commented 7 months ago

Ok, thanks. Also, what Centos version does Machine 1 have?

johodges commented 7 months ago

Machine 1 is using a custom version of CentOS 6. There is no time step issue on it with the ompi_gnu_linux build target:

-bash-4.1$ mpiexec -np 3 ../../Build/ompi_gnu_linux/fds_ompi_gnu_linux back_wall_test_2.fds

Starting FDS ...

MPI Process 0 started on MESH.haifire.com
MPI Process 1 started on MESH.haifire.com
MPI Process 2 started on MESH.haifire.com

Reading FDS input file ...

Fire Dynamics Simulator

Current Date : February 13, 2024 17:08:36
Revision : FDS-6.8.0-1481-ge85366a-jh-firebot
Revision Date : Tue Feb 13 15:36:31 2024 -0500
Compiler : GCC version 12.2.0
Compilation Date : Feb 13, 2024 17:03:19

Number of MPI Processes: 3
Number of OpenMP Threads: 1

MPI version: 3.1
MPI library version: Open MPI v5.0.0rc10, package: Open MPI jhodges@MESH.haifire.com Distribution, ident: 5.0.0rc10, repo rev: v5.0.0rc10, Unreleased developer copy

Job TITLE : Test 1-D heat transfer through rotated GEOM obstruction
Job ID string : back_wall_test_2

Time Step: 1, Simulation Time: 0.10 s
Time Step: 2, Simulation Time: 0.20 s
Time Step: 3, Simulation Time: 0.30 s
Time Step: 4, Simulation Time: 0.40 s
Time Step: 5, Simulation Time: 0.50 s
Time Step: 6, Simulation Time: 0.60 s
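To compare the two builds without reading the screen output by eye, the "Time Step" entries can be pulled out of each log and the first steps compared. This is a rough sketch, assuming the log format shown in this thread and that the run starts at t = 0, so the first reported simulation time equals the first time step. The function names and the shortened log strings are illustrative, not part of FDS.

```python
import re

# Matches entries such as "Time Step: 1, Simulation Time: 0.10 s".
STEP_PATTERN = re.compile(
    r"Time Step:\s*(\d+),\s*Simulation Time:\s*([0-9.eE+-]+)\s*s"
)


def simulation_times(log_text):
    """Return the simulation times (in seconds) found in FDS screen output."""
    return [float(time) for _, time in STEP_PATTERN.findall(log_text)]


def first_step_ratio(log_a, log_b):
    """Ratio of the first reported simulation time in log_a to that in log_b."""
    return simulation_times(log_a)[0] / simulation_times(log_b)[0]


# Shortened excerpts of the two logs quoted in this thread.
intel_log = "Time Step: 1, Simulation Time: 0.00009 s Time Step: 2, Simulation Time: 0.0002 s"
gnu_log = "Time Step: 1, Simulation Time: 0.10 s Time Step: 2, Simulation Time: 0.20 s"

print(simulation_times(intel_log))
print(first_step_ratio(gnu_log, intel_log))
```

Because the pattern is applied with `findall`, it works whether the entries appear one per line or run together on a single line.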

marcosvanella commented 7 months ago

Ok, thanks. Blaze just ran the case (CentOS 7) with version 2021.7 of the Intel compiler/MPI library without issues. I have CentOS Stream 9 in the VM and will try the latest oneAPI there.

marcosvanella commented 7 months ago

This is going to be hard to reproduce. I just ran the case on CentOS 9 with oneAPI 2021.11.1 and can't reproduce the problem. Do me a favor: compile the code without the -ipo flag and run the case to see whether this flag has anything to do with it. Thanks.

johodges commented 7 months ago

Will do. Which version of gcc are you using? My understanding is that the Intel target still pulls in some headers and such from gcc, which could cause a difference as well.

marcosvanella commented 7 months ago

The gcc I have here is 11.4. I'm trying to see if I can get oneAPI 2021.9. It's not clear to me how to get an older version of this.

johodges commented 7 months ago

Good call on the -ipo optimization. The Machine 1 impi_intel_linux build has the right time step without -ipo.

marcosvanella commented 7 months ago

Might be a compiler issue. Is this an old machine? Can you install the latest oneapi and see if the issue is still there?

johodges commented 7 months ago

It's an old machine but I will see what I can do.

marcosvanella commented 7 months ago

Well, the latest oneAPI comes with its own challenges. ifort is practically deprecated; you'll see the message at compile time. For ifx you need to replace the -check all flag in the debug target with -check all,nouninit. See: https://community.intel.com/t5/Intel-Fortran-Compiler/Known-bug-with-check-all-or-check-uninit-in-ifx-2024-0-0-for/m-p/1567626/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufExTMFZYSzdRSkk1WVcyfDE1Njc2MjZ8U1VCU0NSSVBUSU9OU3xoSw#M170676

johodges commented 7 months ago

I was able to install the latest oneAPI on Machine 1 (ifort --version: ifort (IFORT) 2021.11.1 20231117). Compiling with -ipo on the new compiler, the time step looks right.

FYI, I still have the offline installers for the older version of oneAPI. I can upload them somewhere for you to download and run on your local machine, to see whether you can reproduce the bug and whether it points to an initialization error somewhere. If not, I think we can close this out as "use the latest Intel compiler".

marcosvanella commented 7 months ago

Thank you Jon, let's continue this exchange by email from now on. For the time being we'll assume there was an issue with the -ipo flag in version 2021.9, and I'll close this.