Closed DJDavies2 closed 1 day ago
These CONTIGUOUS
s were added in the same commit as the other ones. I'll see what Météo-France thinks about having these removed as well.
A bit more info:
The job concerned is a 2PE job where the failure occurs on one of the PEs but not the other (the code then stops in the mpl_wait in dist_spec_control). The PSPECG argument is present on one PE and not present on the other. The PE that crashes is the one where the argument isn't present. The origin of this difference is in trans_distspec (in transi_module.F90) where there are two calls to dist_spec, one which passes PSPECG and the other with doesn't.
Is this for a specific compiler only, or is this generally a problem?
I am seeing failures in different downstream systems: saber (as noted above) and also lfric-lite-jedi. The saber failures are with gnu 9.2. For lfric-lite-jedi the failures occur with gnu 9.2 and gnu 12.2.
It seems to work with gnu 7.3.0. Other systems would take some time to test as they are not fully ported.
This is related to #95.
Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.
@marsdeno please could you liaise with author(s) of the CONTIGUOUS change?
@DJDavies2 do you think this is related to https://github.com/ecmwf-ifs/ectrans/issues/93?
Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.
I guess doing something like
#ifdef CONTIG_BUGGY_COMPILER
#define CONTIG_STATUS ,CONTIGUOUS
#else
#define CONTIG_STATUS
#endif
and then
REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)
might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.
So a known working compiler is gfortran 7.3. I guess we could disable CONTIGUOUS attribute here for gfortran >= 8 then.
@DJDavies2 do you have a simple reproducer I could try with latest GNU on our machine to see if the issue is still there?
@DJDavies2 do you have a simple reproducer I could try with latest GNU on our machine to see if the issue is still there?
I don't have a simple reproducer I'm afraid. I can try to produce one if you like, but I can't guarantee I will be able to do so.
Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.
I guess doing something like
#ifdef CONTIG_BUGGY_COMPILER #define CONTIG_STATUS ,CONTIGUOUS #else #define CONTIG_STATUS #endif
and then
REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)
might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.
I was thinking more like:
REAL(KIND=JPRB), OPTIONAL, INTENT(IN) :: PSPECG(:,:)
REAL(KIND=JPRB), OPTIONAL, INTENT(IN), CONTIGUOUS :: PSPECG(:,:)
but either way would be fine. The file already ends in .F90 so I think it is already being "pre-processed" even if their aren't any cpp statements at the moment.
@DJDavies2 do you think this is related to #93?
It is a different part of the code and a different compiler so I don't know.how connected these two issues are. It would certainly be helpful if #93 could be handled.
Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.
I guess doing something like
#ifdef CONTIG_BUGGY_COMPILER #define CONTIG_STATUS ,CONTIGUOUS #else #define CONTIG_STATUS #endif
and then
REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)
might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.I was thinking more like:
ifdef CONTIG_BUGGY_COMPILER REAL(KIND=JPRB), OPTIONAL, INTENT(IN) :: PSPECG(:,:) #else REAL(KIND=JPRB), OPTIONAL, INTENT(IN), CONTIGUOUS :: PSPECG(:,:) #endif
but either way would be fine. The file already ends in .F90 so I think it is already being "pre-processed" even if their aren't any cpp statements at the moment.
The advantage of defining something like CONTIG_STATUS is it could be done once, and used wherever needed. In fact it could be done in one of the CMake files.
So what's missing now is a CMake modification to add the preprocessor definition CONTIG_BUGGY_COMPILER
for particular compilers...
So what's missing now is a CMake modification to add the preprocessor definition
CONTIG_BUGGY_COMPILER
for particular compilers...
I wasn't quite sure how to approach this. I don't know exactly which compilers/versions would need this option to be set and it could get messy and error-prone to try and handle it in the cmake file. An alternative would be to just let users set the option themselves via CMAKE_Fortran_FLAGS if they need it. What do you think?
Something roughly like
if( CMAKE_Fortran_COMPILER_ID MATCHES "Intel" )
if( CMAKE_Fortran_COMPILER_VERSION VERSION_LESS_EQUAL 19)
set( COMPILER_HAS_CONTIG_ISSUES True)
endif()
elseif( CMAKE_Fortran_COMPILER_ID MATCHES "GNU" )
if( CMAKE_Fortran_COMPILER_VERSION MATCHES "9.2|12.2")
set( COMPILER_HAS_CONTIG_ISSUES True)
endif()
endif()
in the main CMakeLists.txt
and adding
if( COMPILER_HAS_CONTIG_ISSUES )
target_compile_definitions( trans_${prec} PRIVATE CONTIG_BUGGY_COMPILER)
endif()
after the ecbuild_add_library() call in src/trans/CMakeLists.txt ?
Good @marsdeno except I'd perhaps use the variable ECTRANS_HAVE_CONTIGUOUS_ISSUE
instead.
Also let's make sure to insert a link to this Github issue in comments.
I'd also wrap the first snippet with:
if (NOT DEFINED ECTRANS_HAVE_CONTIGUOUS_ISSUE)
so that it could be overwritten (turn OFF) via a command line argument.
@DJDavies2 I applied my suggestion on top of your latest change in 2fdc7ff
Thanks, I have tested the latest version of this branch and it seems to work for us.
I am not entirely sure what is happening here, but this change does seem to fix the issue at least for the failing job in question. The change just removes the contiguous attribute from one of the arguments (PSPECG). The other argument with the contiguous attribute, PSPEC, seems to be fine, but I don't know why there would be a difference. The only obvious difference is the intent.