CABLE-LSM / CABLE

Home to the CABLE land surface model and its documentation
https://cable.readthedocs.io/en/latest/
Other
8 stars 4 forks source link

Fix gfortran compilation and some warnings #410

Open micaeljtoliveira opened 5 days ago

micaeljtoliveira commented 5 days ago

CABLE

Thank you for submitting a pull request to the CABLE Project.

Description

This PR fixes compilation with gfortran. I also fixes several compiler warnings (mostly about unused variables) in the MPI master/driver modules.

Type of change

Please delete options that are not relevant.

Checklist


📚 Documentation preview 📚: https://cable--410.org.readthedocs.build/en/410/

SeanBryan51 commented 2 days ago

@micaeljtoliveira I was able to build your gcc_compilation_and_warnings branch on Gadi with

./build.bash -C gnu --mpi -DCMAKE_Fortran_FLAGS="-ffree-line-length-none"

However the compiled executable crashes with segmentation fault (for both MPI and serial executables). This is probably outside the scope of this pull request but I will document this here for now.

For the serial case I ran the example serial configuration in the CABLE repository:

cd src/offline/
../../bin/cable cable.nml

which outputs:

 THE NAME LIST IS cable.nml
 Use spatially-specific soil properties;          360         150
           0
           0
 When choosing spatially-specific soil properties,
 snow-free albedo is also overwritten by this data set.
 Reading CABLE_PFTPARM namelist...
 Reading CABLE_SOILPARM namelist...
 Total number of patches (countPatch):            1
 iveg           1           2
 patchfrac           1   1.00000000
  Could neither find restart file ./
  nor ./
  Pre-loaded default initialisations are used.
  Loaded some parameters from met input file: TumbaFluxnet.1.3_met.nc the rest are default values - check log file
 Reading CABLE_PFTPARM namelist...
 Reading CABLE_SOILPARM namelist...

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fc420b985af in ???
#1  0x47e4da in ???
#2  0x40d6f8 in ???
#3  0x40321c in ???
#4  0x7fc420b847e4 in ???
#5  0x40325d in ???
#6  0xffffffffffffffff in ???
Segmentation fault

I was able to trace the segmentation fault to the acc_out_var using the DDT debugger (requires setting CABLE_GNU_Fortran_FLAGS_DEBUG to -g -O0 and building in the debug configuration).

For the MPI case I ran the crujra_accessN96_1h config which crashes similarly to the serial case.

Building your branch with the Intel compiler and running the executable seems to work fine so it looks like this is specific to the gfortran build.

SeanBryan51 commented 2 days ago

@mcuntz I recall I used your compiler flags when adding the GNU compiler. The debug flags currently contain -O (equivalent to -O1) which enables some optimisations and prevents me from using the DDT debugger. Is there a reason why you use -O instead of -O0 in the debug flags for gfortran?

mcuntz commented 2 days ago

@SeanBryan51 -O is there simply for speed even in debug mode. I basically debug with print statements and do not use much the debuggers. No worry to change it.

However, I think that the compilation should work with the pretty pedantic flags. If it works with another compiler means that that compiler assumes something and it is not always what you'd assume if you had coded properly. So I would be 100% for ironing out the places where it is not compiling.

micaeljtoliveira commented 1 day ago

@SeanBryan51 I'm able to reproduce your error. I've compiled cable with the -fbacktrace option (plus a few others), so I get a more useful error message:

At line 2908 of file /home/micael/workspaces/cable/git/src/offline/cable_input.F90
Fortran runtime warning: An array temporary was created
  Loaded some parameters from met input file: TumbaFluxnet.1.3_met.nc the rest are default values - check log file
 Reading CABLE_PFTPARM namelist...
 Reading CABLE_SOILPARM namelist...
At line 2194 of file /home/micael/workspaces/cable/git/src/offline/cable_output.F90
Fortran runtime error: Array bound mismatch for dimension 2 of array 'acc_val' (6/1)

Error termination. Backtrace:
#0  0x4745bc in generate_out_write_acc_d2
        at /home/micael/workspaces/cable/git/src/offline/cable_output.F90:2194
#1  0x477303 in __cable_output_module_MOD_write_output
        at /home/micael/workspaces/cable/git/src/offline/cable_output.F90:1769
#2  0x7ab84b in cable_offline_driver
        at /home/micael/workspaces/cable/git/src/offline/cable_driver.F90:942
#3  0x40379c in main
        at /home/micael/workspaces/cable/git/src/offline/cable_driver.F90:62
micaeljtoliveira commented 1 day ago

Okay, I think I understand the problem, but I'm not sure how you want to fix it.

In the following call in cable_output.F90:

CALL generate_out_write_acc(output%SoilMoist, ovid%SoilMoist, 'SoilMoist', out%SoilMoist, REAL(ssnow%wb, 4), ranges%SoilMoist, patchout%SoilMoistIce, out_settings)

output%SoilMoist is set to false and the out%SoilMoist and ssnow%wb arguments are passed down to the elemental function generate_out_write_acc_d2. At that point, the compiler checks the bounds of those arguments, as it needs to be sure they are compatible, as the function needs to be called element-wise. But output%SoilMoist is actually never allocated (because output%SoilMoist is false) and the check fails.

There are basically two solutions:

  1. Do not check array bounds are run-time.
  2. Exit generate_out_write_acc when argument output_var is false before calling acc_out_var.

I would recommend against option 1., as the array bounds check is actually very useful when debugging.

micaeljtoliveira commented 1 day ago

Actually, even with the checks disabled, it fails (as I should have realized from the segfault message obtained by @SeanBryan51).

This means one really cannot call an elemental function over an array that has not been allocated, which makes total sense. I would thus argue that in this case gfortran is actually doing the right thing and it's the Intel compiler that is misbehaving.

micaeljtoliveira commented 1 day ago

There's a few more problematic cases in the same module were several non-allocated arrays are being accessed. For example, here

CALL generate_out_write_acc(output%casa, ovid%PlantTurnover, 'PlantTurnover', out%PlantTurnover, &
                                REAL((SUM(casaflux%Cplant_turnover, 2))/86400.0/1.201E-5, 4), ranges%NEE, patchout%PlantTurnover, out_settings)

it might happen that casaflux%Cplant_turnover is not allocated, causing a segfault.

I think the only sensible solution is to fence all calls to generate_out_write_acc with an if statement. Note that this issue was introduced in #287. @abhaasgoyal FYI.

SeanBryan51 commented 1 day ago

@micaeljtoliveira thanks for looking into it.

I think the only sensible solution is to fence all calls to generate_out_write_acc with an if statement.

This seems appropriate to me. I'm happy to let you decide if you want to fix this in this pull request.