NorESMhub / BLOM

Bergen Layered Ocean Model
GNU Lesser General Public License v3.0
16 stars 25 forks source link

refacotorization of blom code #317

Closed mvertens closed 5 days ago

mvertens commented 7 months ago

This PR is a large refactorization of the blom code. It would be helpful to go over these changes in a blom meeting. This PR does the following:

1) in mod_remap.F90 defining ATRC will not work since natr is not used inside mod_tracers
#ifdef TRC
      use mod_tracers, only: ntr, itrtke, itrgls
#endif

#ifdef TRC
#  ifdef ATRC
      real, dimension(ntr-natr,1-nbdy:idm+nbdy,1-nbdy:jdm+nbdy) ::
     .  trx,try,trd,ftru,ftrv
      real, dimension(natr,1-nbdy:idm+nbdy,1-nbdy:jdm+nbdy) ::
     .  ag,agx,agy,agd,fagu,fagv
      real xdt,ydt,axxx,ayyy,axxy,axyy,qxx,qyy,qxy,fdt
      integer nt,nat
#  else
      real, dimension(ntr,1-nbdy:idm+nbdy,1-nbdy:jdm+nbdy) ::
     .  trx,try,trd,ftru,ftrv
      integer nt
#  endif
#endif
2) in mode_remap.F90 AND mod_pbcor have the following for uses of mod_tracers
#ifdef TRC
      use mod_tracers, only: ntr, trc, uflxtr, vflxtr
#endif
but throughout the code have the following types of #ifdefs where itrtke and itrgls is never
defined since it needs a use from mod_tracers

#ifdef TRC
                do nt=1,ntr
#  if defined(TKE) && !defined(TKEADV)
                  if (nt.eq.itrtke.or.nt.eq.itrgls) cycle
#  endif
                  uflxtr(nt,i,j)=uflux(i,j)*trc(i-1,j,kn,nt)
                enddo
#endif
TomasTorsvik commented 1 month ago

@mvertens @matsbn I see there are incompatible changes in cesm/thermf_cesm.F -> mod_thermf_cesm.F90 and phy/mod_vcoord.F90. I suppose it can be difficult to do this refactoring while at the same time as code changes are being introduced, but my understanding is that most of the code changes at the moment are related to hybrid coordinates.

Would it be possible to do the refactoring in parts, e.g. starting with the files that have not received any updates since the NorESM2 version? Or would this only make the refactoring process more complicated?

mvertens commented 1 month ago

@TomasTorsvik - its going to be extremely difficult and time consuming to do the refactoring in parts. I'm happy to bring in these changes by hand - that's actually very easy and it only effects a small set of files. I started doing that yesterday - and clearly I must have introduced some incompatibilities in the two files. I still need to verify bfb capability with master. However, changing a few files in the source code to be compatible with master is an order of magnitude easier than trying to do this in pieces.

mvertens commented 1 month ago

@TomasTorsvik - I'll try to fix the incompatibilities in those two files quickly.

mvertens commented 1 month ago

@TomasTorsvik - I have now rebased to master and fixed the conflicts. What is needed again is careful testing that all the results are still bfb. I will do that once we are closer to getting this PR merged.

TomasTorsvik commented 6 days ago

There is a failing CI test macos-latest, mpi=false, openmp=enabled, ecosys=false. I tested the v1.6.1 tag, and the test does not fail on that tag, so it seems the CI framework still works. It's also curious that the other openmp=enabled test with ubuntu-latest go through.

The error message is not very clear to me

[274/306] Compiling Fortran object blom.p/phy_mod_diapfl.F90.o
FAILED: blom.p/phy_mod_diapfl.F90.o blom.p/mod_diapfl.mod 
gfortran-12 -Iblom.p -I. -I.. -Iphy -I../phy -I/opt/homebrew/Cellar/netcdf-fortran/4.6.1/include -I -I/opt/homebrew/Cellar/netcdf/4.9.2_1/include -fdiagnostics-color=always -Wall -Wextra -O2 -g -fdefault-real-8 -fconvert=big-endian -fbacktrace -fcheck=all -finit-real=snan -ffpe-trap=invalid,zero,overflow -DMKS -DTRC -DTKE -DTKEADV -DIDLAGE -DLEVITUS2X -fopenmp -Jblom.p -o blom.p/phy_mod_diapfl.F90.o -c ../phy/mod_diapfl.F90
f951: Warning: Nonexistent include directory '-I/opt/homebrew/Cellar/netcdf/4.9.2_1/include' [-Wmissing-include-dirs]
/var/folders/b3/2xm02wpd21qgrpkck5q1c6k40000gn/T//ccW0lzAM.s:627196:29: error: unexpected token in '.section' directive
        .section .data.rel.ro.local

I did a google search, and the following issue may be relevant https://github.com/iains/gcc-darwin-arm64/issues/107

It's not clear to me at this point if this is an indication of something (potentially) wrong in the code or a problem with macos/gcc-12/openmp, but it might be worth looking a bit closer at the openmp sections of the mod_diapfl code.

mvertens commented 6 days ago

@TomasTorsvik - thanks for looking at this. I've started at this part of the code repeatedly as a result of this failure- and I cannot see anything wrong. There are two main differences between the code that passes and the one that doesn't:

TomasTorsvik commented 6 days ago

@mvertens , @matsbn I found that it is the first block of OpenMP !$omp parallel do private on line 96 that triggers the error. Removing this omp section cause the code to compile. As this is probably not a code error I don't think we should change it. The macos test will probably continue to fail, so I'm tempted to disable it.

mvertens commented 6 days ago

@TomasTorsvik - great work! Thanks for tracking this down. I agree about disabling the test.

mvertens commented 6 days ago

@matsbn - I think I incorporated all the changes you wanted (or tried to). How should we proceed?

TomasTorsvik commented 6 days ago

@mvertens - I made a new issue on the macos build issue (#358 ), and plan to disable the test after this PR has been merged.

mvertens commented 6 days ago

Thank you!

TomasTorsvik commented 5 days ago

@mvertens - Thanks for the work you did on the refactorization! Will you merge this?