ecmwf-ifs / ectrans

Global spherical harmonics transforms library underpinning the IFS
Apache License 2.0
15 stars 30 forks source link

Remove CONTIGUOUS from dist_spec_control_argument #98

Closed DJDavies2 closed 1 day ago

DJDavies2 commented 2 months ago

I am not entirely sure what is happening here, but this change does seem to fix the issue at least for the failing job in question. The change just removes the contiguous attribute from one of the arguments (PSPECG). The other argument with the contiguous attribute, PSPEC, seems to be fine, but I don't know why there would be a difference. The only obvious difference is the intent.

samhatfield commented 2 months ago

These CONTIGUOUSs were added in the same commit as the other ones. I'll see what Météo-France thinks about having these removed as well.

DJDavies2 commented 2 months ago

A bit more info:

The job concerned is a 2PE job where the failure occurs on one of the PEs but not the other (the code then stops in the mpl_wait in dist_spec_control). The PSPECG argument is present on one PE and not present on the other. The PE that crashes is the one where the argument isn't present. The origin of this difference is in trans_distspec (in transi_module.F90) where there are two calls to dist_spec, one which passes PSPECG and the other with doesn't.

wdeconinck commented 1 month ago

Is this for a specific compiler only, or is this generally a problem?

DJDavies2 commented 1 month ago

I am seeing failures in different downstream systems: saber (as noted above) and also lfric-lite-jedi. The saber failures are with gnu 9.2. For lfric-lite-jedi the failures occur with gnu 9.2 and gnu 12.2.

It seems to work with gnu 7.3.0. Other systems would take some time to test as they are not fully ported.

DJDavies2 commented 1 month ago

This is related to #95.

DJDavies2 commented 1 month ago

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

wdeconinck commented 1 month ago

@marsdeno please could you liaise with author(s) of the CONTIGUOUS change?

samhatfield commented 1 month ago

@DJDavies2 do you think this is related to https://github.com/ecmwf-ifs/ectrans/issues/93?

marsdeno commented 1 month ago

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

I guess doing something like

#ifdef CONTIG_BUGGY_COMPILER
#define CONTIG_STATUS ,CONTIGUOUS
#else
#define CONTIG_STATUS 
#endif

and then

REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)

might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.

wdeconinck commented 1 month ago

So a known working compiler is gfortran 7.3. I guess we could disable CONTIGUOUS attribute here for gfortran >= 8 then.

marsdeno commented 1 month ago

@DJDavies2 do you have a simple reproducer I could try with latest GNU on our machine to see if the issue is still there?

DJDavies2 commented 1 month ago

@DJDavies2 do you have a simple reproducer I could try with latest GNU on our machine to see if the issue is still there?

I don't have a simple reproducer I'm afraid. I can try to produce one if you like, but I can't guarantee I will be able to do so.

DJDavies2 commented 1 month ago

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

I guess doing something like

#ifdef CONTIG_BUGGY_COMPILER
#define CONTIG_STATUS ,CONTIGUOUS
#else
#define CONTIG_STATUS 
#endif

and then

REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)

might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.

I was thinking more like:

ifdef CONTIG_BUGGY_COMPILER

REAL(KIND=JPRB), OPTIONAL, INTENT(IN) :: PSPECG(:,:)

else

REAL(KIND=JPRB), OPTIONAL, INTENT(IN), CONTIGUOUS :: PSPECG(:,:)

endif

but either way would be fine. The file already ends in .F90 so I think it is already being "pre-processed" even if their aren't any cpp statements at the moment.

DJDavies2 commented 1 month ago

@DJDavies2 do you think this is related to #93?

It is a different part of the code and a different compiler so I don't know.how connected these two issues are. It would certainly be helpful if #93 could be handled.

marsdeno commented 1 month ago

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

I guess doing something like

#ifdef CONTIG_BUGGY_COMPILER
#define CONTIG_STATUS ,CONTIGUOUS
#else
#define CONTIG_STATUS 
#endif

and then REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:) might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.

I was thinking more like:

ifdef CONTIG_BUGGY_COMPILER REAL(KIND=JPRB), OPTIONAL, INTENT(IN) :: PSPECG(:,:) #else REAL(KIND=JPRB), OPTIONAL, INTENT(IN), CONTIGUOUS :: PSPECG(:,:) #endif

but either way would be fine. The file already ends in .F90 so I think it is already being "pre-processed" even if their aren't any cpp statements at the moment.

The advantage of defining something like CONTIG_STATUS is it could be done once, and used wherever needed. In fact it could be done in one of the CMake files.

wdeconinck commented 1 week ago

So what's missing now is a CMake modification to add the preprocessor definition CONTIG_BUGGY_COMPILER for particular compilers...

DJDavies2 commented 1 week ago

So what's missing now is a CMake modification to add the preprocessor definition CONTIG_BUGGY_COMPILER for particular compilers...

I wasn't quite sure how to approach this. I don't know exactly which compilers/versions would need this option to be set and it could get messy and error-prone to try and handle it in the cmake file. An alternative would be to just let users set the option themselves via CMAKE_Fortran_FLAGS if they need it. What do you think?

marsdeno commented 1 week ago

Something roughly like

if( CMAKE_Fortran_COMPILER_ID MATCHES "Intel"  )
  if( CMAKE_Fortran_COMPILER_VERSION VERSION_LESS_EQUAL 19)
    set( COMPILER_HAS_CONTIG_ISSUES True)
  endif()
elseif( CMAKE_Fortran_COMPILER_ID MATCHES "GNU"  )
  if( CMAKE_Fortran_COMPILER_VERSION MATCHES "9.2|12.2")
    set( COMPILER_HAS_CONTIG_ISSUES True)
  endif()
endif()

in the main CMakeLists.txt

and adding

if( COMPILER_HAS_CONTIG_ISSUES ) 
  target_compile_definitions( trans_${prec} PRIVATE CONTIG_BUGGY_COMPILER) 
endif()

after the ecbuild_add_library() call in src/trans/CMakeLists.txt ?

wdeconinck commented 5 days ago

Good @marsdeno except I'd perhaps use the variable ECTRANS_HAVE_CONTIGUOUS_ISSUE instead. Also let's make sure to insert a link to this Github issue in comments. I'd also wrap the first snippet with: if (NOT DEFINED ECTRANS_HAVE_CONTIGUOUS_ISSUE) so that it could be overwritten (turn OFF) via a command line argument.

wdeconinck commented 5 days ago

@DJDavies2 I applied my suggestion on top of your latest change in 2fdc7ff

DJDavies2 commented 4 days ago

Thanks, I have tested the latest version of this branch and it seems to work for us.