FESOM / fesom2

Multi-resolution ocean general circulation model.
http://fesom.de/
GNU General Public License v3.0

FTN-compiled code gets stuck at move_alloc #399

Open dsidoren opened 1 year ago

dsidoren commented 1 year ago

https://github.com/FESOM/fesom2/blob/a9f45df14609706602904a4f30dd4d14805e19b5/src/io_netcdf_file_module.F90#L84

@hegish @koldunovn @patrickscholz ftn 2.7.17 on LUMI cannot handle move_alloc. A workaround is urgently required!

patrickscholz commented 1 year ago

Could this be an alternative: https://stackoverflow.com/questions/66516244/find-an-alternative-for-norm2-and-move-alloc-in-fortran-95

dsidoren commented 1 year ago

Hi Patrick, I did indeed change the code in a similar manner to what you suggested. The problem persists, and it turns out that I traced it incorrectly. The failure actually occurs two lines above: https://github.com/FESOM/fesom2/blob/a9f45df14609706602904a4f30dd4d14805e19b5/src/io_netcdf_file_module.F90#L82

The code crashes with an error:

[EC_DRHOOK:nid001013:28:1:178507:178507] [20221220:205540:1671562540:8.780] [signal_drhook@ifsaux/support/drhook.c:1988] : [07]: _F90_COPY_POLYMORPHIC /opt/cray/pe/cce/14.0.2/cce/x86_64/lib/libf.so.1 0x151e336e9000 0x151e33709e9e # addr2line
[EC_DRHOOK:nid001013:28:1:178507:178507] [20221220:205540:1671562540:8.780] [signal_drhook@ifsaux/support/drhook.c:1988] : [08]: add_dim$io_netcdf_filemodule /pfs/lustrep2/users/sidorenko/RAPS20/flexbuild/bin/../external/cce.lumi/install/lib/libfesom.so 0x151e344aa000 0x151e34a3cafb # a

The Cray Compiling Environment (CCE) runtime raises an error in _F90_COPY_POLYMORPHIC, called from add_dim.

This needs to be addressed!

trackow commented 1 year ago

This seems to be related to the fact that dim_type and var_type contain allocatable components. According to a comment in the code, this has apparently been an issue before (albeit with the nvfortran compiler, @hegish): when an array of those derived types was copied, nvfortran lost the allocation of the allocatable components.
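For readers without the linked source at hand, here is a minimal, hypothetical sketch of the kind of operation being described: growing an array of a derived type that has a deferred-length character component by copying it into a temporary and then calling move_alloc. Only the dim_type layout is taken from the diff in this thread. The program name, the append_dim helper, and the sample values are made up for illustration, and this is not claimed to be what io_netcdf_file_module.F90 actually does.

```fortran
program grow_dims_demo
  implicit none

  ! Same components as the dim_type shown in the diff in this thread.
  type dim_type
    character(:), allocatable :: name
    integer :: len
    integer :: ncid
  end type

  type(dim_type), allocatable :: dims(:), tmp(:)
  integer :: i

  allocate(dims(0))
  call append_dim("nod2", 10)   ! hypothetical sample values
  call append_dim("time", 1)

  do i = 1, size(dims)
    print *, dims(i)%name, dims(i)%len
  end do

contains

  ! Hypothetical helper: append one entry to the host-associated dims array.
  subroutine append_dim(name, length)
    character(len=*), intent(in) :: name
    integer, intent(in) :: length

    allocate(tmp(size(dims) + 1))
    ! Intrinsic assignment copies each element, including its allocatable
    ! name component. A runtime copy routine would be involved in a step
    ! like this one (assumption, not verified against the FESOM source).
    tmp(1:size(dims)) = dims
    tmp(size(tmp)) = dim_type(name=name, len=length, ncid=-1)
    ! Transfer the allocation: the old dims storage is released and tmp
    ! is left unallocated, so no second copy is made.
    call move_alloc(tmp, dims)
  end subroutine append_dim

end program grow_dims_demo
```

With a standard-conforming compiler, each element keeps its own name allocation after the copy, which is the behaviour reported as lost in the nvfortran case mentioned above.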

This dirty fix with a fixed character length works for me on LUMI, but we should find something better:

diff --git a/src/io_netcdf_file_module.F90 b/src/io_netcdf_file_module.F90
index 9eb6a245..385e0902 100644
--- a/src/io_netcdf_file_module.F90
+++ b/src/io_netcdf_file_module.F90
@@ -5,7 +5,7 @@ module io_netcdf_file_module
   private

   type dim_type
-    character(:), allocatable :: name
+    character(1000) :: name
     integer len
     integer ncid
   end type
@@ -15,7 +15,7 @@ module io_netcdf_file_module
   end type

   type var_type ! todo: use variable type from io_netcdf_module here
-    character(:), allocatable :: name
+    character(1000) :: name
     integer, allocatable :: dim_indices(:)
     integer datatype

Note that doing this only for dim_type makes the model fail with another _F90_COPY_POLYMORPHIC issue, this time in add_var_x:

[EC_DRHOOK:nid001314:2:1:169921:169921] [20221221:011331:1671578011:8.043] [signal_drhook@ifsaux/support/drhook.c:1988] : [07]: _F90_COPY_POLYMORPHIC /opt/cray/pe/cce/14.0.2/cce/x86_64/lib/libf.so.1 0x148603f13000 0x148603f33e9e # addr2line
[EC_DRHOOK:nid001314:2:1:169921:169921] [20221221:011331:1671578011:8.043] [signal_drhook@ifsaux/support/drhook.c:1988] : [08]: add_var_x$io_netcdf_filemodule /pfs/lustrep1/projappl/project_462000048/thorackow/RAPS20_CY47R3_nextGEMS/flexbuild/bin/../external/cce.lumi/install/lib/libfesom.so 0x148604cd5000 0x148604eaa50a # addr2line

dsidoren commented 1 year ago

Well spotted, Thomas! Thanks! I think that limiting the name to 100 characters should be sufficient :)

hegish commented 1 year ago

Using a fixed-length name will break a lot of things, e.g. one would have to use trim everywhere.
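A small standalone sketch of that concern, using made-up variable names rather than anything from the FESOM source: a fixed-length character is blank-padded to its declared length, so len() and concatenation see the padding unless trim is applied, whereas a deferred-length string carries its actual length.

```fortran
program fixed_length_name_demo
  implicit none

  character(:), allocatable :: deferred_name   ! as in the original types
  character(1000) :: fixed_name                ! as in the workaround diff

  deferred_name = "time"
  fixed_name = "time"        ! padded with 996 trailing blanks

  print *, len(deferred_name)        ! 4
  print *, len(fixed_name)           ! 1000
  print *, len(trim(fixed_name))     ! 4

  ! Concatenation keeps the padding unless the name is trimmed first.
  print *, len("var_" // fixed_name)         ! 1004
  print *, len("var_" // trim(fixed_name))   ! 8
end program fixed_length_name_demo
```

Plain comparisons such as fixed_name == "time" still work, because Fortran pads the shorter operand with blanks. The padding matters wherever the length itself is used or the string is passed on, for example to a library routine.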

JanStreffing commented 2 weeks ago

Things are running on LUMI. Can this be closed?