RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0
1.65k stars 226 forks source link

[Build] Fortran module is not installed #561

Closed ivan-pi closed 10 months ago

ivan-pi commented 10 months ago

Which version do you want to build? Current master branch.

Which architecture do you want to build for?

To Reproduce

Output log

make distclean && make runs normally. However when trying to run make install I see the following:

===> INSTALL fortran interface to /home/ivan/.local/include/
cp: cannot stat 'likwid.mod': No such file or directory

despite, the .mod file being generated:

~/fortran/likwid_test/likwid$ ll *.mod
-rw-rw-r-- 1 ivan ivan 787 Okt 22 12:10 likwid.mod

When I look at the install PREFIX, the Fortran module has not been installed:

~/fortran/likwid_test/likwid$ ll /home/ivan/.local/include/
total 136
drwxr-xr-x 3 ivan ivan   4096 Okt 22 12:13 ./
drwxrwxr-x 9 ivan ivan   4096 Okt 22 12:01 ../
drwxrwxr-x 2 ivan ivan   4096 Mär  8  2023 blis/
-rw-r--r-- 1 root root  14420 Okt 22 12:13 bstrlib.h
-rw-r--r-- 1 root root 101920 Okt 22 12:13 likwid.h
-rw-r--r-- 1 root root   6648 Okt 22 12:13 likwid-marker.h
ivan-pi commented 10 months ago

I attempted to just copy the module file to my desired prefix:

$ cp likwid.mod /home/ivan/.local/include/

If I try linking an application I have symbols missing:

~/fortran/likwid_test/likwid$ gfortran -I/home/ivan/.local/include -O3 -o chaos test/chaos.F90 -L/home/ivan/.local/lib -llikwid -lpthread -lm -DLIKWID_PERFMON
/usr/bin/ld: /tmp/ccnTJvdB.o: in function `MAIN__':
chaos.F90:(.text+0xdc): undefined reference to `likwid_markerstart_'
/usr/bin/ld: chaos.F90:(.text+0xed): undefined reference to `likwid_markerstop_'
/usr/bin/ld: chaos.F90:(.text+0xfe): undefined reference to `likwid_markerstart_'
/usr/bin/ld: chaos.F90:(.text+0x15b): undefined reference to `likwid_markerstop_'
/usr/bin/ld: chaos.F90:(.text+0x16c): undefined reference to `likwid_markerstart_'
/usr/bin/ld: chaos.F90:(.text+0x1f9): undefined reference to `likwid_markerstop_'
/usr/bin/ld: chaos.F90:(.text+0x20a): undefined reference to `likwid_markerstart_'
/usr/bin/ld: chaos.F90:(.text+0x266): undefined reference to `likwid_markerstop_'
collect2: error: ld returned 1 exit status

Indeed, they appear to be missing from the shared library:

~/fortran/likwid_test/likwid$ readelf -Ws --dyn-syms /home/ivan/.local/lib/liblikwid.so | grep likwid_marker
   179: 0000000000225da0  1260 FUNC    GLOBAL DEFAULT   12 likwid_markerInit
   196: 0000000000224f40   871 FUNC    GLOBAL DEFAULT   12 likwid_markerStartRegion
   197: 0000000000232830     9 FUNC    GLOBAL DEFAULT   12 likwid_markerthreadinit_
   213: 0000000000232850     9 FUNC    GLOBAL DEFAULT   12 likwid_markernextgroup_
   216: 0000000000232820     9 FUNC    GLOBAL DEFAULT   12 likwid_markerinit_
   224: 0000000000232920    96 FUNC    GLOBAL DEFAULT   12 likwid_markerstopregion_
   237: 0000000000224d30   325 FUNC    GLOBAL DEFAULT   12 likwid_markerRegisterRegion
   241: 0000000000226290   483 FUNC    GLOBAL DEFAULT   12 likwid_markerThreadInit
   257: 0000000000225980   378 FUNC    GLOBAL DEFAULT   12 likwid_markerGetRegion
   278: 0000000000224480   176 FUNC    GLOBAL DEFAULT   12 likwid_markerNextGroup
   280: 0000000000232840     9 FUNC    GLOBAL DEFAULT   12 likwid_markerclose_
   284: 0000000000232a20    96 FUNC    GLOBAL DEFAULT   12 likwid_markerresetregion_
   287: 0000000000232860    96 FUNC    GLOBAL DEFAULT   12 likwid_markerregisterregion_
   291: 0000000000224530  2044 FUNC    GLOBAL DEFAULT   12 likwid_markerClose
   297: 00000000002328c0    96 FUNC    GLOBAL DEFAULT   12 likwid_markerstartregion_
   312: 00000000002252b0  1729 FUNC    GLOBAL DEFAULT   12 likwid_markerStopRegion
   315: 0000000000232980   150 FUNC    GLOBAL DEFAULT   12 likwid_markergetregion_
   345: 0000000000225b00   443 FUNC    GLOBAL DEFAULT   12 likwid_markerResetRegion
   690: 000000000020fec0    93 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionMetric
   691: 000000000020ff20    93 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionResult
   692: 000000000020ff80    77 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionCount
   693: 000000000020ffd0    74 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionTime
   696: 0000000000210450   350 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionCpulist
   697: 00000000002105b0    46 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionThreads
   698: 00000000002105e0    46 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionEvents
   699: 0000000000210610    46 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionTag
   700: 0000000000210640    49 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerRegionGroup
   701: 0000000000210680    33 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerNumRegions
   702: 00000000002106b0    22 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerFile_destroy
   703: 00000000002106d0    46 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerFile_read
   713: 0000000000210a60    20 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerClose
   714: 0000000000210a80    20 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerNext
   715: 0000000000210aa0    20 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerThreadInit
   716: 0000000000210ac0    20 FUNC    LOCAL  DEFAULT   12 lua_likwid_markerInit
  1797: 0000000000232920    96 FUNC    GLOBAL DEFAULT   12 likwid_markerstopregion_
  1819: 0000000000224d30   325 FUNC    GLOBAL DEFAULT   12 likwid_markerRegisterRegion
  1843: 0000000000232820     9 FUNC    GLOBAL DEFAULT   12 likwid_markerinit_
  1854: 0000000000224480   176 FUNC    GLOBAL DEFAULT   12 likwid_markerNextGroup
  1893: 0000000000232980   150 FUNC    GLOBAL DEFAULT   12 likwid_markergetregion_
  1901: 0000000000224530  2044 FUNC    GLOBAL DEFAULT   12 likwid_markerClose
  1923: 00000000002252b0  1729 FUNC    GLOBAL DEFAULT   12 likwid_markerStopRegion
  1931: 0000000000232a20    96 FUNC    GLOBAL DEFAULT   12 likwid_markerresetregion_
  1935: 0000000000225da0  1260 FUNC    GLOBAL DEFAULT   12 likwid_markerInit
  1951: 0000000000232840     9 FUNC    GLOBAL DEFAULT   12 likwid_markerclose_
  1959: 0000000000224f40   871 FUNC    GLOBAL DEFAULT   12 likwid_markerStartRegion
  1971: 00000000002328c0    96 FUNC    GLOBAL DEFAULT   12 likwid_markerstartregion_
  1981: 0000000000226290   483 FUNC    GLOBAL DEFAULT   12 likwid_markerThreadInit
  1997: 0000000000232830     9 FUNC    GLOBAL DEFAULT   12 likwid_markerthreadinit_
  2005: 0000000000232860    96 FUNC    GLOBAL DEFAULT   12 likwid_markerregisterregion_
  2016: 0000000000225b00   443 FUNC    GLOBAL DEFAULT   12 likwid_markerResetRegion
  2063: 0000000000225980   378 FUNC    GLOBAL DEFAULT   12 likwid_markerGetRegion
  2120: 0000000000232850     9 FUNC    GLOBAL DEFAULT   12 likwid_markernextgroup_
ivan-pi commented 10 months ago

In the build log I see the following:

===>  COMPILE  GCC/likwid_f90_interface.o
===>  COMPILE  GCC/tree.o
===>  COMPILE  GCC/frequency_cpu.o
===>  COMPILE  GCC/loadData.o
===>  ENTER  /home/ivan/fortran/likwid_test/likwid/ext/hwloc
===>  ENTER  /home/ivan/fortran/likwid_test/likwid/ext/lua
===>  CREATE SHARED LIB  liblikwid.so
===>  COMPILE FORTRAN INTERFACE  likwid.mod

Okay, I think I've found the culprit. The example chaos.F90 uses outdated symbols. After appending Region to the subroutine names, I'm able to compile it:

~/fortran/likwid_test/likwid$ gfortran -I./ -O3 -o chaos test/chaos.F90 -L./ -llikwid -lpthread -lm -DLIKWID_PERFMON
~/fortran/likwid_test/likwid$ ./chaos
Running without Marker API. Activate Marker API with -m on commandline.
 job done
TomTheBear commented 10 months ago

Thanks for opening the issue. I experienced the missing installation myself and have it on a note here to fix it before the release.

I will update the chaos.F90 code as well. This is new to me that there are outdated symbols. Especially since the Fortran90 API hasn't changed for ages.

ivan-pi commented 10 months ago

Would the Fortran90 API work with other compilers like nvfortran too? I see you are using the mangling convention of appending an underscore, and passing character length using a "hidden" argument.

Most newer compilers already support bind(c) to use the C linkage conventions, and some even support the Fortran 2018 ISO_Fortran_binding.h header file, meaning your Fortran API could look like this:

subroutine likwid_markerRegisterRegion(regionTag,ierr)
  use, intrinsic :: iso_c_binding, only: c_int, c_char, c_null_char
  character(len=*), intent(in) :: regionTag
  integer, intent(out), optional :: ierr

  interface
    function registerRegion(regionTag) bind(c,name="likwid_markerRegisterRegion")
      import c_int, c_char
      character(kind=c_char) :: regionTag(*)  ! A null-terminated string
      integer(c_int) :: registerRegion
    end function
  end interface

integer :: ierr_

ierr_ = registerRegion(trim(regionTag)//c_null_char)
if (present(ierr)) ierr = ierr_

end subroutine

In Fortran 2023 (still the "future") there is a function named f_c_string(string) which will do the trimming and append the terminating null character for you.

With the ISO Fortran binding, you could also do this:

interface
  subroutine likwid_markerRegisterRegion(regionTag) bind(c,name="likwid_markerregisterregion_")
    character(kind=c_char,len=*), intent(in) :: regionTag
  end subroutine
end interface

but instead of the hidden length argument, use the C descriptor to infer the actual string length. In this case the Fortran compiler will populate a descriptor before making the call to the C routine.

#include <ISO_Fortran_binding.h> // requires F2018

void __attribute__ ((visibility ("default") ))
likwid_markerregisterregion_(const CFI_cdesc_t *regionTag)
{
    int len = regionTag->dim[0].extent;
    char *tmp = (char *) malloc((len+1)*sizeof(char));
    strncpy(tmp, (char *) regionTag->base_addr, len * sizeof(char) );

    for (int i=(len-1); len > 0; len--)
    {
        if (tmp[i] != ' ') {
            tmp[i+1] = 0;
            break;
        }
    }

    likwid_markerRegisterRegion( regionTag );

    free(tmp);
}

Unfortunately, this isn't completely portable in the sense you compile your C part once and use it with any Fortran compiler. The problem is that compilers use different layouts of the CFI_cdesc_t struct and are thereby not compatible.

The downside of these solutions is if you'd like to cater to F77 users. In that case your current procedures are suitable to be used as external subprograms (no type-checking). I believe the two options above require an explicit interface (or module).

TomTheBear commented 10 months ago

In the HPC world, some highly relevant codes are still F77 or F90, that's why we stick to the current approach. I know about the more modern approaches but I never saw the need to the Fortran interface.

Since you seem to be familiar with Fortran and C, especially their interaction, do you see any downsides by keeping the old-style?

ivan-pi commented 10 months ago

The downside is it relies on compiler conventions, whereas with bind(c) you have a standard mechanism. The Cray compiler converts symbol names to uppercase for example, so the likwid Marker API is not applicable. The llvm-flang compiler appears to follow the GCC and ifort convention of appending an underscore: https://godbolt.org/z/YrP6sMfT3. I haven't checked how it passes hidden character length arguments, but probably it also follows the GCC convention.

bind(c) was available in gfortran 4.9 and probably even earlier. Judging by posts on the Intel forum, bind(c) was available in ifort 11 (c. 2010) and earlier. It will be deprecated soon and replaced by ifx. Moreover, the only supported versions of the compiler currently, are 2023.2.1, 2022.3.0, 2021.4.0. So unless you need to support compilers older than a decade, support for C interoperability is not a limiting factor.

F77 or F90 is just a matter of aesthetic, fixed-form vs free-form. Both remain part of the language. There are no true F77 compilers left (even f2c is broken), and one can use F90+ features in .f files. For example F77 specified names can't be longer than 6 characters, which the likwid marker API already violates. Anyone who thinks they are using F77 is probably using features from F90 already.

The "benefit" (but arguably a downside) of your current style is it allows users who don't like Fortran modules or interface blocks to use implicit interfaces:

C implicit.f
C Expects Fortran name mangling; no type checking performed.
C The .mod file is not even needed.
      call likwid_markerInit()
      call likwid_markerRegisterRegion("1")
      call likwid_markerStartRegion("1")
C ... work here ...
      call likwid_markerStopRegion(1)  ! <- mistake 
      call likwid_markerClose()
      end
C explicit.f
      use likwid
C Explicit interfaces and hence type-checking available.
C Procedures could use either Fortran or C linkage conventions.
      call likwid_markerInit()
      call likwid_markerRegisterRegion("example")
      call likwid_markerStartRegion("example")
C ... work here ...
      call likwid_markerStopRegion("example")
      call likwid_markerClose()
      end

Both programs are valid F2018. Procedures using F2003 C-interoperability, or the optional status codes, would require an explicit interface, meaning either they must be imported from a module, or have an interface block. In any case, due to the differences in character-handling you'll need a wrapper, either on the C or Fortran side. So in this sense you are right, the new features don't really bring you much benefit.

Sorry if I'm being over-pedantic or sound like a preacher. 🙉

As long as you only need to support gfortran and ifort/ifx, your existing solution should continue to work fine.

ivan-pi commented 10 months ago

The NAG Fortran Compiler also appears to share the name mangling and character passing convention:

$ cat likwid_example.f
       integer nr_events, cnt
       double precision events, time
       call likwid_markerInit()
       call likwid_markerRegisterRegion("example")
       call likwid_markerStartRegion("example")
C ... do work here ...
       call likwid_markerStopRegion("example")
       call likwid_markerGetRegion("example",nr_events,events,time,cnt)
       call likwid_markerClose()
       end

It actually transpiles to C first (using the -S flag to store the intermediate C output):

int main(int argc,char *argv[]) {
# line 1 "likwid_example.f"
  extern void MAIN(void *);
# line 1 "likwid_example.f"
  __NAGf90_nonsmp_init(MAIN,argc,argv,0,0);
# line 1 "likwid_example.f"
}
void MAIN(void *__NAGf90_main_args)
# line 1 "likwid_example.f"
{
# line 1 "likwid_example.f"
extern void likwid_markergetregion_();
# line 1 "likwid_example.f"
extern void likwid_markerclose_();
# line 1 "likwid_example.f"
static Integer nr_events_;
# line 1 "likwid_example.f"
static Integer cnt_;
# line 1 "likwid_example.f"
static double events_;
# line 1 "likwid_example.f"
static double time_;
# line 1 "likwid_example.f"
extern void likwid_markerinit_();
# line 1 "likwid_example.f"
extern void likwid_markerregisterregion_();
# line 1 "likwid_example.f"
extern void likwid_markerstartregion_();
# line 1 "likwid_example.f"
extern void likwid_markerstopregion_();
# line 3 "likwid_example.f"
likwid_markerinit_();
# line 4 "likwid_example.f"
likwid_markerregisterregion_("example",7);
# line 5 "likwid_example.f"
likwid_markerstartregion_("example",7);
# line 7 "likwid_example.f"
likwid_markerstopregion_("example",7);
# line 8 "likwid_example.f"
likwid_markergetregion_("example",&nr_events_,&events_,&time_,&cnt_,7);
# line 9 "likwid_example.f"
likwid_markerclose_();
# line 10 "likwid_example.f"
__NAGf90_finish(0);
# line 10 "likwid_example.f"
}
TomTheBear commented 10 months ago

I used the MarkerAPI already with Cray compilers but I had to build LIKWID completely with them, not just the client application. Now I know why. Thanks.

So, when I recap your texts: Using the bind(c) method would not break anything important and all compilers support it. The current method works as long as the compilers follow the append-underscore-convention.

I'm in the progress to release LIKWID 5.3. It would be the perfect opportunity to add a bind interface but I don't have to time to write & test it properly. I want to put the release out for SC23 (11.11. - 18.11.2023). If you want to contribute, we could add both variants to 5.3. Depending on your confidence we use either the old or the new variant as the default setting.

I added a PR fixing the installation problem. Do you have time to test it?