FESOM / fesom2

Multi-resolution ocean general circulation model.
http://fesom.de/
GNU General Public License v3.0
45 stars 46 forks source link

RECOM into refactoring #583

Open JanStreffing opened 3 months ago

JanStreffing commented 3 months ago

For now, this is just to get an overview how large the changes for RECOM are, and how far refactoring has moved from the branch to point. I would like to see if RECOM is realistic for a 2.6 merge, or will need to be a 2.7 feature. Note, that from @ogurses side, this branch is not finished or merge ready yet. This is only a draft!

JanStreffing commented 3 months ago

To me this looks doable. The merge conflicts so far are trivial, the programming seems clear, with most of the code in the three new files. I set the target as 2.6 for now. We can always change that later.

ogurses commented 1 month ago

I push the last code pieces into refac_recom repo. REcoM is ready for a merge. Tuning of BGC is on our todo list and will start soon.

JanStreffing commented 1 month ago

Nice job. I tried fixing the merge conflicts. Can you check out and give this a try?

JanStreffing commented 1 month ago

This gcc compiler that we use to the tests is quite strict and it found these:

/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:803:15:

  803 |      CvT(i) = SGLE(gsw_ct_from_t (DBLE(SA(i)), DBLE(T(i)), DBLE(P(i))))
      |               1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:398:20:

  398 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), DBLE(P(i)), DBLE(lon(i)), DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:402:20:

  402 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), 0.0d0, DBLE(lon(i)), DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:410:20:

  410 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), DBLE(P(i)), def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:414:20:

  414 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), 0.0d0, def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:701:15:

  701 |      CvT(i) = SGLE(gsw_ct_from_t (DBLE(SA(i)), DBLE(T(i)), DBLE(P(i))))
      |               1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:602:13:

  602 |      T(i) = SGLE(gsw_t_from_ct (DBLE(SA(i)), DBLE(CvT(i)), DBLE(P(i))))
      |             1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:197:20:

  197 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)),DBLE(P(i)),DBLE(lon(i)),DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:201:20:

  201 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)), 0.0d0, DBLE(lon(i)),DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:209:20:

  209 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)), DBLE(P(i)), def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:213:20:

  213 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)), 0.0d0, def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:512:13:

  512 |      T(i) = SGLE(gsw_t_from_ct (DBLE(SA(i)), DBLE(CvT(i)), DBLE(P(i))))
      |             1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)

The intel compiler would probably gloss over these with default settings. Could you have a look at these, @ogurses?

JanStreffing commented 1 month ago

Adding @chrisdane

ogurses commented 1 month ago

This is from a third party code. Let me check if there are any updates available.

ogurses commented 4 weeks ago

This gcc compiler that we use to the tests is quite strict and it found these:

/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:803:15:

  803 |      CvT(i) = SGLE(gsw_ct_from_t (DBLE(SA(i)), DBLE(T(i)), DBLE(P(i))))
      |               1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:398:20:

  398 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), DBLE(P(i)), DBLE(lon(i)), DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:402:20:

  402 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), 0.0d0, DBLE(lon(i)), DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:410:20:

  410 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), DBLE(P(i)), def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:414:20:

  414 |            SA(i) = SGLE(gsw_sa_from_sp(DBLE(SP(i)), 0.0d0, def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sp' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:701:15:

  701 |      CvT(i) = SGLE(gsw_ct_from_t (DBLE(SA(i)), DBLE(T(i)), DBLE(P(i))))
      |               1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:602:13:

  602 |      T(i) = SGLE(gsw_t_from_ct (DBLE(SA(i)), DBLE(CvT(i)), DBLE(P(i))))
      |             1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:197:20:

  197 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)),DBLE(P(i)),DBLE(lon(i)),DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:201:20:

  201 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)), 0.0d0, DBLE(lon(i)),DBLE(lat(i))))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:209:20:

  209 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)), DBLE(P(i)), def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:213:20:

  213 |            SP(i) = SGLE(gsw_sp_from_sa(DBLE(SA(i)), 0.0d0, def_lon, def_lat))
      |                    1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)
/__w/fesom2/fesom2/src/int_recom/recom/eos.F90:512:13:

  512 |      T(i) = SGLE(gsw_t_from_ct (DBLE(SA(i)), DBLE(CvT(i)), DBLE(P(i))))
      |             1
Error: Type mismatch in argument 'sa' at (1); passed REAL(16) to REAL(8)

The intel compiler would probably gloss over these with default settings. Could you have a look at these, @ogurses?

Is there any way to download merged code? @JanStreffing @patrickscholz @dsidoren

ogurses commented 4 weeks ago

Why gcc is unhappy with the size of the variables? "singledouble.F90, gsm_mod_kinds.F90" are the files where the byte sizes are prescribed. Those are from 2 different model developments. I assume FESOM uses intrinsic ISO_FORTRAN_ENV.

JanStreffing commented 3 weeks ago

From my understanding, you are trying to feed in quadruple precision variables into routines that expect their input to come as double precision. Strangely, the solution seems to be there already, you use DBLE. So now it's complaining that your input to the DBLE routine is not double, which does not make sense. The DBLE routine should have interfaces to deal with all sorts of incoming data types.

pgierz commented 3 weeks ago

@ogurses which gcc version do you have? You could try the wonderful option -fallow-argument-mismatch

pgierz commented 3 weeks ago

Ah...sorry, I didn't read all of the preliminary comments. For downloading merged code, since it isn't merged yet, just download refac_recom that @JanStreffing already updated with his conflict resolutions.

If it's a third-party library, you probably need to write some intermediate interface for it. Something like:

module <third_party_lib_name>_interface
    implicit none

    interface third_party_sgle
        module procedure sgle_quad_precission
    end interface

    contains
    ....

    subroutine sgle_quad_precission
    .... something with type conversion ....
    end subroutine
end module <third_party_lib_name>_interface

My FORTRAN is a bit rusty, but something like that. The Chatbot will be able to help 😉

JanStreffing commented 3 weeks ago

@ogurses which gcc version do you have? You could try the wonderful option -fallow-argument-mismatch

@koldunov: Can we include this flag into our automatic build tests?

pgierz commented 3 weeks ago

@JanStreffing, see https://github.com/FESOM/fesom2/pull/583/commits/dc4d4635cb995b56ee329b941aa1163267d993b3. I think that should include it at the proper place, but I'm not 100% sure...

pgierz commented 3 weeks ago

Mmmmpf. Well, the flag shows up, but the compiler doesn't like it. I'll look into it a bit more carefully after the lunch break.

JanStreffing commented 3 weeks ago

@ogurses Would you be willing and able to come by and have a look at this with @patrickscholz and me sometime e.g. this week? Then we can figure out if this can be fixed in the code, or whether we shall change the testcase compile flag.

ogurses commented 3 weeks ago

@JanStreffing I can come over to building H on Friday. I am still under medication and have to stay in bed due to my pneumonia at least 2 more days

JanStreffing commented 2 weeks ago

The problem seems to be with the GCC version we use to do check runs with. 9.4 is late enough to check for this argument mismatch, but early enough that the flag to turn these into a warning was not yet introduced.

I wonder if we can check with a more modern version.

JanStreffing commented 2 weeks ago

I believe the change has to be made in this docker container: koldunovn/fesom2_test:fesom2.1

Or the container has to be replaced with one that contains a newer gcc compiler version. I'll see if I can manage that, or if I need @koldunovn to help me out.

pgierz commented 2 weeks ago

Changing the Docker recipe is easy, but it looks like it just builds:

https://github.com/FESOM/FESOM2_Docker/blob/d2a2bdaf4d915b92befea428f8a331440a5821ff/fesom2_test/Dockerfile#L12

It might be worthwhile to update the base image to a newer ubuntu than 20.04

https://github.com/FESOM/FESOM2_Docker/blob/d2a2bdaf4d915b92befea428f8a331440a5821ff/fesom2_test/Dockerfile#L1.

I can look into both of those issues and automate building the test container with some regularity (i.e. via a GitHub actions cronjob). At the moment the container is hosted on Dockerhub on @koldunovn's account, but I had at some point figured out how to host them on GitHub as well. I'll make a peep here once it's ready.

pgierz commented 2 weeks ago

OK, the new container now uses GCC 13:

-- The C compiler identification is GNU 13.2.0
-- The CXX compiler identification is GNU 13.2.0

Unfortunately there is still the mismatch error, even if flag mentioned above is showing up...

JanStreffing commented 2 weeks ago

cd /__w/fesom2/fesom2/build/src && /usr/bin/mpifort -DMETISRANDOMSEED=35243 -DMETIS_VERSION=5 -DPARMS -DPART_WEIGHTED -DUSE_PRECISION=2 -D__3Zoo2Det -D__async_icebergs -D__coccos -D__recom -I/usr/include -I/__w/fesom2/fesom2/lib/parms/src/../include -I/__w/fesom2/fesom2/src/async_threads_cpp -I/__w/fesom2/fesom2/build/src/async_threads_cpp -g -O2 -g -ffloat-store -finit-local-zero -finline-functions -fimplicit-none -fdefault-real-8 -ffree-line-length-none -fallow-argument-mismatch -c /__w/fesom2/fesom2/src/int_recom/recom/eos.F90 -o CMakeFiles/fesom.dir/int_recom/recom/eos.F90.o

How strange. We give it the -fallow-argument-mismatch flag but it seems to be ignored. Compiler bug?

pgierz commented 2 weeks ago

Possibly. I'm trying to get the chatbot to write me a minimal example to narrow it down, but it's failing miserably.

pgierz commented 2 weeks ago

@JanStreffing and I narrowed the error down to a type conversion being optimised away. I pulled apart a lot of the code in eos.f90 responsible for this, but someone should please carefully go over the diffs to make sure I did not have a typo somewhere.

We are getting closer, but now get compiler errors further along in recom:

cd /__w/fesom2/fesom2/build/src && /usr/bin/mpifort -DMETISRANDOMSEED=35243 -DMETIS_VERSION=5 -DPARMS -DPART_WEIGHTED -DUSE_PRECISION=2 -D__3Zoo2Det -D__async_icebergs -D__coccos -D__recom -I/usr/include -I/__w/fesom2/fesom2/lib/parms/src/../include -I/__w/fesom2/fesom2/src/async_threads_cpp -I/__w/fesom2/fesom2/build/src/async_threads_cpp -g -O0 -g -ffloat-store -finit-local-zero -finline-functions -fimplicit-none -fdefault-real-8 -ffree-line-length-none -fallow-argument-mismatch -c /__w/fesom2/fesom2/src/int_recom/recom/rho.F90 -o CMakeFiles/fesom.dir/int_recom/recom/rho.F90.o
/__w/fesom2/fesom2/src/int_recom/recom/rho.F90:145:21:

  145 |   rhow = 999.842594d0 + 6.793952e-2_r8*X          &
      |                     1
Error: Unexpected derived-type entities in binary intrinsic numeric operator '+' at (1)
/__w/fesom2/fesom2/src/int_recom/recom/rho.F90:155:41:

  155 |   rho0 = rhow + A*S + B*S*SQRT(S) + C*S**2.0d0
      |                                         1
Error: Unexpected derived-type entities in binary intrinsic numeric operator '**' at (1)
/__w/fesom2/fesom2/src/int_recom/recom/rho.F90:160:49:

  160 |   Ksbmw = 19652.21d0 + 148.4206d0*X - 2.327105d0*X*X &
      |                                                 1
Error: Unexpected derived-type entities in binary intrinsic numeric operator '*' at (1)
/__w/fesom2/fesom2/src/int_recom/recom/rho.F90:160:34:

  160 |   Ksbmw = 19652.21d0 + 148.4206d0*X - 2.327105d0*X*X &
      |                                  1
Error: Unexpected derived-type entities in binary intrinsic numeric operator '*' at (1)
/__w/fesom2/fesom2/src/int_recom/recom/rho.F90:164:45:

  164 |   Ksbm0 = Ksbmw + S*( 54.6746d0 - 0.603459d0*X + 1.09987e-2_r8*X**2 &
      |                                             1
Error: Unexpected derived-type entities in binary intrinsic numeric operator '*' at (1)
/__w/fesom2/fesom2/src/int_recom/recom/rho.F90:170:22:

  170 |        + P*(3.239908d0 + 1.43713e-3_r8*X + 1.16092e-4_r8*X**2 - 5.77905e-7_r8*X**3) &
      |                      1
Error: Unexpected derived-type entities in binary intrinsic numeric operator '+' at (1)
/__w/fesom2/fesom2/src/int_recom/recom/rho.F90:177:19:

  177 |   rho_DNAD = rho0/(1.0d0 - P/Ksbm)
      |                   1
Error: Unexpected derived-type entities in binary intrinsic numeric operator '-' at (1)
pgierz commented 2 weeks ago

Caught it finally :-) -fdefault-real-8 implies promoting double real to 16 bytes unless you also say -fdefault-double-8. We are left with one more error that someone from ReCOM needs to take care of. Maybe @ogurses or @chrisdane can look into it:

/__w/fesom2/fesom2/src/int_recom/recom_sinking.F90:190:9:
  190 |     use diff_ver_recom_expl_interface
      |         1
Error: 'diff_ver_recom_expl' of module 'diff_ver_recom_expl_interface', imported at (1), is also the name of the current program unit
/__w/fesom2/fesom2/src/int_recom/recom_sinking.F90:289:9:
  289 |     use ver_sinking_recom_interface
      |         1
Error: 'ver_sinking_recom' of module 'ver_sinking_recom_interface', imported at (1), is also the name of the current program unit
/__w/fesom2/fesom2/src/int_recom/recom_sinking.F90:446:4:
  446 | if (1) then ! 3rd Order DST Sceheme with flux limiting. This code comes from old recom
      |    1
Error: IF clause at (1) requires a scalar LOGICAL expression
/__w/fesom2/fesom2/src/int_recom/recom_sinking.F90:484:4:
  484 | if (0) then ! simple upwind
      |    1
Error: IF clause at (1) requires a scalar LOGICAL expression
/__w/fesom2/fesom2/src/int_recom/recom_sinking.F90:734:34:
  734 | subroutine get_seawater_viscosity(tr_num, tracers, partit, mesh)
      |                                  1
Warning: Name 'diff_ver_recom_expl' at (1) is an ambiguous reference to 'diff_ver_recom_expl' from current program unit
/__w/fesom2/fesom2/src/int_recom/recom_sinking.F90:734:34:
  734 | subroutine get_seawater_viscosity(tr_num, tracers, partit, mesh)
      |                                  1
Warning: Name 'ver_sinking_recom' at (1) is an ambiguous reference to 'ver_sinking_recom' from current program unit
/__w/fesom2/fesom2/src/int_recom/recom_sinking.F90:734:34:
  734 | subroutine get_seawater_viscosity(tr_num, tracers, partit, mesh)
      |                                  1
Warning: Name 'ver_sinking_recom_benthos' at (1) is an ambiguous reference to 'ver_sinking_recom_benthos' from current program unit
JanStreffing commented 2 weeks ago

Great work @pgierz

pgierz commented 2 weeks ago

OK, last pieces are finished, but there are few questions to check everything. I changed two internal recom things, and since I do not know recom from the inside, I would be happy for a double check: