LLNL / Caliper

Caliper is an instrumentation and performance profiling library
http://software.llnl.gov/Caliper/
BSD 3-Clause "New" or "Revised" License

Unable to capture MPI Functions when instrumenting fortran code #230

Open deanchester opened 4 years ago

deanchester commented 4 years ago

I have instrumented a Fortran code using Caliper, but when I run the application it doesn't capture MPI communication.

I have built Caliper with the following configuration:

cmake -DCMAKE_INSTALL_PREFIX=$HOME/local/caliper-gcc -DCMAKE_C_COMPILER=/csc/tinis/software/Core/GCCcore/7.3.0/bin/gcc -DCMAKE_CXX_COMPILER=/csc/tinis/software/Core/GCCcore/7.3.0/bin/g++ -DWITH_FORTRAN=On -DWITH_TOOLS=On -DWITH_MPI=On -DMPI_C_COMPILER=/csc/tinis/software/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/bin/mpicc -DCMAKE_Fortran_COMPILER=/csc/tinis/software/Core/GCCcore/7.3.0/bin/gfortran ..

When I run my application I set the following in my script:

export CALI_SERVICES_ENABLE=trace,event,mpi,timestamp,recorder
export CALI_TIMER_SNAPSHOT_DURATION=true
export CALI_TIMER_INCLUSIVE_DURATION=true
export CALI_MPI_WHITELIST=all
export CALI_RECORDER_FILENAME="./caliper-$SLURM_JOB_ID/caliper-%mpi.rank%.cali"

The Caliper output files contain only the regions I instrumented with the begin and end routines:

  call cali_begin_byname('sweep')
  c CODE... 
  call cali_end_byname('sweep')

Any ideas what's going wrong?

daboehme commented 4 years ago

Hi @deanchester ,

By default Caliper relies on library constructors to initialize its MPI component, but sometimes that fails. In that case, you can explicitly initialize it with the cali_mpi_init() function. I've added a Fortran binding for that function in the latest commit to master (#232). With that, adding call cali_mpi_init() somewhere at program start should help. You can set CALI_LOG_VERBOSITY=1 and see if the mpi service gets initialized.
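A minimal sketch of that suggestion, for Caliper versions that include the Fortran binding from #232. The program name and the placement of the call ahead of MPI_Init and ahead of every other Caliper call are assumptions on my part, not something this thread confirms; the module name caliper_mod is the one Caliper's Fortran interface provides.

```fortran
program sweep_app
  use caliper_mod          ! Caliper's Fortran module
  implicit none
  include 'mpif.h'

  integer :: ierr

  ! Assumption: register Caliper's MPI services before any other Caliper call.
  call cali_mpi_init()

  call MPI_Init(ierr)

  ! Same annotation calls as in the original report.
  call cali_begin_byname('sweep')
  ! ... application code ...
  call cali_end_byname('sweep')

  call MPI_Finalize(ierr)
end program sweep_app
```

Running this with CALI_LOG_VERBOSITY=1 should show in the log whether the mpi service was registered.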

andrewreisner commented 3 years ago

I am having an issue with this as well. Using cali_mpi_init() results in undefined reference to cali_mpi_init_. If I remove the preprocessor guard from https://github.com/LLNL/Caliper/blob/master/src/interface/c_fortran/wrapfcaliper.F#L457 everything works as expected and I get caliper output. Is CALIPER_HAVE_MPI from caliper-config.h supposed to propagate to this wrapper?

daboehme commented 3 years ago

Hi @andrewreisner ,

The CALIPER_HAVE_MPI flag is supposed to propagate to the wrapper, so that might be a bug. I'll take a look. Calling cali_mpi_init() is no longer necessary in newer Caliper versions (as of v2.5.0 at least) though, so you can safely remove it. Capturing MPI functions in Fortran codes is unrelated to that: we simply don't have the wrapper functions for the Fortran MPI functions in Caliper, which are different from the C MPI functions. I hope I can add them at some point, but it's a significant effort.
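To illustrate what such a wrapper involves, here is a hedged sketch of a user-side wrapper for a single routine, built on the standard MPI profiling interface (the PMPI_ entry points). It is not part of Caliper and not how Caliper implements its mpi service. It assumes the application calls MPI through mpif.h-style implicit interfaces and that the compiler's default name mangling matches the MPI library's Fortran symbols. The annotation calls are the same ones used in the original report.

```fortran
! Hypothetical user-side wrapper, not part of Caliper.
! Linking this object ahead of the MPI library makes Fortran calls to
! MPI_Send land here; PMPI_Send then performs the actual send.
subroutine MPI_Send(buf, count, datatype, dest, tag, comm, ierror)
  use caliper_mod
  implicit none
  integer :: buf(*)        ! choice buffer, passed through by reference
  integer :: count, datatype, dest, tag, comm, ierror

  call cali_begin_byname('MPI_Send')
  call PMPI_Send(buf, count, datatype, dest, tag, comm, ierror)
  call cali_end_byname('MPI_Send')
end subroutine MPI_Send
```

Every MPI routine of interest would need its own wrapper with the matching argument list, which is why covering the full Fortran interface is a significant effort.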

andrewreisner commented 3 years ago

Thanks for the information. I updated my Caliper version and removed cali_mpi_init() and everything works as expected.

Jiang-Weibo commented 2 months ago

Hi, I just ran into the same problem. I want to intercept MPI functions in Fortran codes. Could you tell me how to write wrapper functions in Caliper for the Fortran MPI functions? Thank you so much.

andrewreisner commented 2 months ago

@Jiang-Weibo I do not have experience with this, but I suspect Caliper already wraps the MPI calls using gotcha and wrapping them yourself is unnecessary.

Jiang-Weibo commented 2 months ago

@andrewreisner Thank you so much for your reply. I tested a simple MPI program in Fortran, and I am sure I installed and enabled the MPI services in Caliper, but I still got no correct output. This is the configuration:

CALI_SERVICES_ENABLE=aggregate,event,mpi,mpireport,timestamp srun -n 2 ./simple_program

This is what I got:

MPI process 0 sends value 12345.
MPI process 1 received value: 12345.
== CALIPER: (1): default: mpireport: MPI is already finalized. Cannot aggregate output.
== CALIPER: (0): default: mpireport: MPI is already finalized. Cannot aggregate output.

Another issue, "Question about MPI_Finalize" (#535), reports the same problem. Inspired by that, I simply removed the call to MPI_Finalize from my program, and somehow it works. I got the records below.

Path            Min time/rank Max time/rank Avg time/rank Time %    Allocated MB 
mpi-simple-test      0.002451      0.002887      0.002669 17.140302     0.005371 
  mainloop           0.001412      0.001414      0.001413  9.073287     0.001173 
MPI_Comm_dup         0.002300      0.002671      0.002485 15.960977     0.005340 
MPI_Send             0.000084      0.000084      0.000084  0.269365     0.005340 
MPI_Recv             0.000288      0.000288      0.000288  0.926402     0.005340 
MPI_Comm_free        0.000024      0.000028      0.000026  0.165153     0.005340 
MPI_Probe            0.000047      0.000047      0.000047  0.151672     0.005340 
MPI_Get_count        0.000040      0.000040      0.000040  0.126926     0.005340 

I suspect this happens because the way I instrument the Fortran code, or the Caliper version I use, is not correct. Could you tell me which Caliper version you are using, how you instrument your Fortran code, and which Caliper configuration you use? Thank you for your help.

andrewreisner commented 2 months ago

@Jiang-Weibo Try adding mpiflush to your CALI_SERVICES_ENABLE. Otherwise, I would just use the ConfigManager Fortran API and call flush before calling MPI_Finalize: https://software.llnl.gov/Caliper/FortranSupport.html#caliper-fortran-api. It has been a couple of years since I used Caliper with Fortran, so I do not recall the exact configuration.
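A rough sketch of the second suggestion, based on my reading of the Fortran example on the linked page. The program and region names are placeholders, and 'runtime-report' is just one possible configuration string. Treat it as an outline of the call order rather than a verified fix.

```fortran
program mpi_simple_test
  use caliper_mod
  implicit none
  include 'mpif.h'

  type(ConfigManager)           :: mgr
  logical                       :: ret
  character(len=:), allocatable :: errmsg
  integer                       :: ierr

  call MPI_Init(ierr)

  ! Create and start a profiling configuration.
  mgr = ConfigManager_new()
  call mgr%add('runtime-report')
  ret = mgr%error()
  if (ret) then
    errmsg = mgr%error_msg()
    write(*,*) 'ConfigManager: ', errmsg
  end if
  call mgr%start

  call cali_begin_region('mainloop')
  ! ... MPI communication ...
  call cali_end_region('mainloop')

  ! Flush while MPI is still initialized, then finalize.
  call mgr%flush
  call ConfigManager_delete(mgr)

  call MPI_Finalize(ierr)
end program mpi_simple_test
```

The point of the ordering is that the report is written before MPI_Finalize, so cross-rank aggregation does not hit the "MPI is already finalized" condition.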

Jiang-Weibo commented 2 months ago

@andrewreisner I have tried both, but neither worked. I also tried Caliper versions 2.7.0, 2.8.0, and 2.9.0, and they all reported the same error. I guess something in Caliper's internals determines how MPI_Finalize affects the flush. For now I will just remove the MPI_Finalize calls. Thanks again for your help.