JCSDA-internal / ioda-converters

Various converters for getting obs data in and out of IODA

New Memory Error #1366

Closed rmclaren closed 12 months ago

rmclaren commented 12 months ago

Current behavior (describe the bug)

I suddenly get the following when trying to compile the latest from the develop branch:

[ 58%] Linking Fortran executable ../../bin/bufr2nc_fortran.x
final section layout:
    __TEXT/__text addr=0x100006BB0, size=0x000E9B22, fileOffset=0x00006BB0, type=1
    __TEXT/__stubs addr=0x1000F06D2, size=0x00000282, fileOffset=0x000F06D2, type=29
    __TEXT/__const addr=0x1000F0960, size=0x0000D0D4, fileOffset=0x000F0960, type=0
    __TEXT/__cstring addr=0x1000FDA38, size=0x00023AFF, fileOffset=0x000FDA38, type=13
    __TEXT/__eh_frame addr=0x100121538, size=0x0000AAB0, fileOffset=0x00121538, type=19
    __DATA_CONST/__got addr=0x10012C000, size=0x00000AB0, fileOffset=0x0012C000, type=31
    __DATA_CONST/__const addr=0x10012CAC0, size=0x00000790, fileOffset=0x0012CAC0, type=0
    __DATA/__data addr=0x100130000, size=0x00002AD2, fileOffset=0x00130000, type=0
    __DATA/__common addr=0x100132AE0, size=0x0004F6D8, fileOffset=0x00000000, type=26
    __DATA/__bss addr=0x1001821C0, size=0x000D5C04, fileOffset=0x00000000, type=26
    __DATA/__huge addr=0x100257DE0, size=0xB45E5100, fileOffset=0x00000000, type=26
ld: 32-bit RIP relative reference out of range (2664205810 max is +/-2GB): from ___ahi_hsd_mod_MOD_read_hsd 
(0x10001848D) to ___ahi_hsd_mod_MOD_satzen (0x19ED06360) in '___ahi_hsd_mod_MOD_read_hsd' from 
CMakeFiles/bufr2nc_fortran.x.dir/hsd.f90.o for architecture x86_64

Looks like something is statically allocating about 3 GB of memory (see the __DATA/__huge section in the layout above). We might need to use dynamic allocation for this instead, in order to reduce the "distance" between the code and the data it references.
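To put numbers on the "3 GB" estimate, here is a minimal sketch that converts the zero-fill section sizes from the linker output into bytes and GB. The sizes are copied from the log above; the dictionary and helper names are made up for this illustration.

```python
# Sizes copied verbatim from the "final section layout" lines in the linker log.
# SECTION_SIZES and the helpers below are illustrative names, not part of any tool.
SECTION_SIZES = {
    "__DATA/__common": 0x0004F6D8,
    "__DATA/__bss": 0x000D5C04,
    "__DATA/__huge": 0xB45E5100,
}

# Largest positive offset a signed 32-bit displacement can encode.
INT32_MAX = 2**31 - 1


def to_gb(n_bytes: int) -> float:
    """Convert a byte count to decimal gigabytes."""
    return n_bytes / 1e9


def exceeds_rel32_range(n_bytes: int) -> bool:
    """True if a section is larger than a signed 32-bit offset can span."""
    return n_bytes > INT32_MAX


for name, size in SECTION_SIZES.items():
    flag = "  <-- larger than 2 GiB" if exceeds_rel32_range(size) else ""
    print(f"{name:18s} {size:>13,d} bytes ({to_gb(size):.3f} GB){flag}")
```

Only __DATA/__huge crosses the 2 GiB mark, which is consistent with the "max is +/-2GB" wording in the ld error.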

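Said another way: the static data referenced by read_hsd is too far away from its code. RIP-relative addressing encodes the offset from the instruction pointer as a signed 32-bit displacement (+/-2GB), and that is not large enough to reach the symbol satzen, whose address (0x19ED06360) falls inside the __DATA/__huge section.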
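The range check behind the ld message can be sketched as follows, using the two addresses and the displacement printed in the error. The function and variable names are hypothetical. The subtraction from the start of read_hsd is only an approximation: the linker presumably measures from the referencing instruction inside the function, so its reported number differs somewhat.

```python
# Signed 32-bit range available to a RIP-relative displacement.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_rel32(displacement: int) -> bool:
    """True if the displacement can be encoded as a signed 32-bit value."""
    return INT32_MIN <= displacement <= INT32_MAX


# Values copied from the ld error message above.
read_hsd_start = 0x10001848D        # address printed for ___ahi_hsd_mod_MOD_read_hsd
satzen_addr = 0x19ED06360           # address printed for ___ahi_hsd_mod_MOD_satzen
reported_displacement = 2664205810  # displacement printed by ld

# Approximation only: measured from the start of read_hsd, not from the
# exact instruction that references satzen (assumed to be the reason this
# differs from the value ld reports).
approx_displacement = satzen_addr - read_hsd_start

print("approx displacement  :", approx_displacement, fits_rel32(approx_displacement))
print("reported displacement:", reported_displacement, fits_rel32(reported_displacement))
```

Both values fall outside the signed 32-bit range, so the reference cannot be encoded no matter which instruction in read_hsd it comes from.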

Expected behavior

Should compile.

Additional information (optional)

rmclaren commented 12 months ago

@PatNichols I think what may have triggered the problem is the addition of the -mcmodel=medium argument in the compiler_flags_GNU_Fortran.cmake file (may be a problem elsewhere as well).

set( CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -g -ffree-line-length-none -mcmodel=medium")

rmclaren commented 12 months ago

For Reference: https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/mcmodel.html

PatNichols commented 12 months ago

Hi @rmclaren, I remember having to add this flag very well. It is very compiler dependent and OS dependent (MacOSX vs. Linux compilers). Taking it out will trigger a linking error on some combinations. 1) I assume you are using the Intel compiler, given the documentation link. Are you using oneAPI or classic? Our CI has been using oneAPI, so this may be a problem seen only with a particular compiler version/OS combination. 2) Are you using the default compiler flags (it's possible to add your own compiler flags via ecbuild), and what is the operating system? A key compiler flag might be stack-arrays, which implicitly defines -mcmodel=small. Thanks!

PatNichols commented 12 months ago

@rmclaren Are you using GNU throughout (gcc and g++)?

rmclaren commented 12 months ago

@PatNichols I'm using GNU on MacOSX (installed via MacPorts).

I see the switch in iodaconv_compiler_flags.cmake, and I put a message in there to make sure it's using the correct file:

if( CMAKE_Fortran_COMPILER_ID MATCHES "GNU" )
  if ( APPLE )
    message( STATUS "*************** Using Clang compiler on Apple")
    include( compiler_flags_Clang_GNU_Fortran ) 
  else()

and it is loading the APPLE branch. So maybe -mcmodel=medium isn't the problem... Not sure.

rmclaren commented 12 months ago

It's not good to assume you are using the Clang compiler just because it's an Apple machine...

rmclaren commented 12 months ago

I believe so:

-- ---------------------------------------------------------
-- Build summary
-- ---------------------------------------------------------
-- system : [Ronalds-MBP-2] [Darwin-22.6.0] [macosx.64]
-- processor        : [x86_64]
-- endiness         : Little Endian -- IEEE []
-- build type       : [Debug]
-- timestamp        : [20230914110329]
-- install prefix   : [/Users/rmclaren/Work/installs]
--   bin dir        : [/Users/rmclaren/Work/installs/bin]
--   lib dir        : [/Users/rmclaren/Work/installs/lib]
--   include dir    : [/Users/rmclaren/Work/installs/include]
--   data dir       : [/Users/rmclaren/Work/installs/share/iodaconv]
--   cmake dir      : [/Users/rmclaren/Work/installs/lib/cmake/iodaconv]
-- ---------------------------------------------------------
-- C -- GNU 12.3.0
--     compiler   : /opt/local/bin/gcc
--     flags      :  -pipe -O0 -g  
--     link flags : -Wl,-search_paths_first -Wl,-headerpad_max_install_names
-- CXX -- GNU 12.3.0
--     compiler   : /opt/local/bin/g++
--     flags      :  -pipe -std=c++14 -g -Wall -Wno-deprecated-declarations  -O0  
--     link flags : 
-- Fortran -- GNU 12.3.0
--     compiler   : /opt/local/bin/gfortran
--     flags      :  -g -ffree-line-length-none -O0 -fcheck=bounds -ffpe-trap=invalid,zero,overflow,underflow -fbacktrace  
--     link flags : 
-- linker : /opt/local/bin/ld
-- ar     : /opt/local/bin/ar
-- ranlib : /opt/local/bin/ranlib
-- link flags
--     executable [ ]
--     shared lib [ ]
--     static lib [ ]
-- install rpath  : @loader_path/../lib

PatNichols commented 12 months ago

@rmclaren I will issue a PR for a fix. Yes there was an assumption that on mac one would be using a clang/gfortran combination. Ooops ....

rmclaren commented 12 months ago

OK, I don't think the issue is with the mcmodel argument. I don't know if there is a different flag you need to set... The issue seems to coincide with the addition of the hsd.f90 file (8/7/2023). I guess it's new...

PatNichols commented 12 months ago

It's the new file in the NCAR BUFR to IODA update. The bug is basically tied to an optimization that compilers use to store only the lower 32 bits of a pointer address. If you need more than that, there's a linking error. How that is triggered is a little weird, but this file is not something I looked at closely. Perhaps I should.