bsc-performance-tools / extrae

Instrumentation framework to generate execution traces of the most used parallel runtimes.
https://tools.bsc.es/extrae
GNU Lesser General Public License v2.1

Identifying subroutines with too many arguments for Extrae #76

Closed: samhatfield closed this issue 1 year ago

samhatfield commented 1 year ago

Hi there. I am attempting to put the IFS model through Extrae, which is an enormous Fortran code base of a few million lines. I am seeing this error:

Extrae: Error! Can't retrieve handler to stub '__kmpc_parallel_sched_278_args' (278 arguments)! Quitting!
Extrae:        Recompile Extrae to support this number of arguments!
Extrae:        Use src/tracer/wrappers/OMP/genstubs-kmpc-11.sh to do so.

and wanted to ask about how to interpret it. Does it literally mean that Extrae has encountered a Fortran subroutine with 278 arguments, and that this is above the maximum allowed number which is set during compilation?

There are subroutines with this many arguments, and many with even more, in the IFS (I know...), but it is actually pretty difficult to track them down given the size of the code. If there is a way to get the name of the offending subroutine from Extrae, that would be very useful.

gllort commented 1 year ago

Hi! This is related to the instrumentation of Intel's OpenMP runtime. To capture the execution of parallel code dispatched through __kmpc_fork_call, we need a wrapper function with the same number of parameters as the original outlined routine. Since the number of parameters varies, we internally generate multiple wrapper functions, one per argument count, and dynamically select the appropriate one by counting the arguments in the varargs list. By default we generate wrappers that cover routines with 0 to 256 arguments. However, IFS has a routine with 278 arguments, and potentially routines with even more. To increase the number of wrapper functions, follow these steps:

  1. Navigate to 'src/tracer/wrappers/OMP' in the Extrae sources.
  2. Edit the 'genstubs-kmpc-11.sh' script.
  3. Increase the value of MAX_WRAPPERS to 512 at the beginning of the file.
  4. Run the script: './genstubs-kmpc-11.sh'.
  5. Finally, reconfigure and rebuild Extrae.

Please note that increasing the number of wrapper functions will significantly increase compile time. Hope this helps!
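
For reference, the steps above amount to roughly the following shell session. This is a minimal sketch only: the exact form of the MAX_WRAPPERS assignment inside genstubs-kmpc-11.sh and the configure/make invocation are assumptions, so adapt them to your own source tree and build recipe.

    # From the top of the Extrae source tree (paths assumed)
    cd src/tracer/wrappers/OMP

    # Raise the wrapper limit; this assumes MAX_WRAPPERS is set as a plain
    # shell variable near the top of the script (check the file first)
    sed -i 's/^MAX_WRAPPERS=.*/MAX_WRAPPERS=512/' genstubs-kmpc-11.sh

    # Regenerate the kmpc stubs (this can take a long time with 512 wrappers)
    ./genstubs-kmpc-11.sh

    # Back at the top level, reconfigure and rebuild Extrae as before
    cd ../../../..
    ./configure --prefix=$EXTRAE_HOME    # reuse your original configure flags
    make && make install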

samhatfield commented 1 year ago

Thanks Germán! I have tried running the script and can see it takes about 40 minutes with 512 as the upper limit. I will try rebuilding Extrae with these wrappers and see how it goes.

samhatfield commented 1 year ago

I think this is unrelated to the above, but when I tried using my self-built Extrae I got this error on program exit (the program otherwise exits normally):

Attempting to use an MPI routine after finalizing MPICH

As far as I can tell, this is indeed emitted after I've called MPI_FINALIZE. In fact I don't think there are any subroutine or function calls at all after MPI_FINALIZE.

I'm actually using ParaStation MPI, but this is a fork of MPICH. Do you have any idea what is happening here?

gllort commented 1 year ago

In the past, we have encountered similar problems with ParaStation MPI. To the best of my knowledge, ParaStation MPI may be using a library destructor that runs before the destructor of our tracing library. This premature finalization of MPI disrupts the tracing wrap-up process. To address this, try setting the following environment variable in your job script: export EXTRAE_SKIP_AUTO_LIBRARY_INITIALIZE=1. Does this solve the problem?
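
For illustration, a Slurm-style job-script excerpt using this variable might look as follows. The preloaded library name, paths, and launcher are assumptions for a typical Fortran MPI run, not something confirmed in this thread; substitute your own setup.

    # Hypothetical job-script excerpt; adjust paths, library name and launcher
    export EXTRAE_HOME=/path/to/extrae
    export EXTRAE_CONFIG_FILE=./extrae.xml
    export EXTRAE_SKIP_AUTO_LIBRARY_INITIALIZE=1

    # Preload the Extrae tracing library (name assumed for a Fortran MPI code)
    export LD_PRELOAD=$EXTRAE_HOME/lib/libmpitracef.so

    srun ./ifs_executable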

samhatfield commented 1 year ago

Thanks - I tried that but it gave a segfault on the first OpenMP statement (basically !$ N_OML_MAX_THREADS = OMP_GET_MAX_THREADS()). Not sure what's going on there.

If your theory is right, maybe I could get around this by manually calling EXTRAE_FINI at the very end of my program. I'll try that.

gllort commented 1 year ago

Another possible solution is to disable the 'merge' option in extrae.xml. That option triggers the final merging step that generates the Paraver trace, and that step might be the source of the late MPI calls. To complete the merge manually instead, run the following command: $EXTRAE_HOME/bin/mpi2prv -f TRACE.mpits -o
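
As a sketch, the manual merge would look roughly like this; the output trace name is a placeholder introduced here, not something from this thread.

    # After running with merge disabled in extrae.xml, merge the intermediate
    # .mpit files by hand. The output file name is just a placeholder.
    $EXTRAE_HOME/bin/mpi2prv -f TRACE.mpits -o ifs_trace.prv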

As for the segfault related to OpenMP, I'm not sure what's going on there either. Which version of the Intel compiler are you currently using?

samhatfield commented 1 year ago

Yes, good point on the merging. I usually do that manually but wasn't doing so in this case. It doesn't seem to have resolved the problem, but I will continue merging manually anyway to eliminate it as a cause.

I'm using Intel/2021.4.0. It's very odd, but by accident I've found a setup which sort of works: I have to link the Extrae profiling .a library statically into my executable and also put the .so on the LD_PRELOAD path. With this, the application finishes without error and produces the correct number of .mpit files in set-0.

But even then, the TRACE.mpits file only lists some of the files, so I have to complete the list manually before merging. And even then, all MPI tracing information is missing: there are no entries beginning with 3 in the trace file, even though mpi enabled="yes" is set in extrae.xml. Baffling. The rest of the trace seems to be there and looks correct in Paraver. I will keep looking into it.

This is sort of beyond the scope of the original issue which is now resolved, so feel free to close this.

gllort commented 1 year ago

Is it possible that the libmpi library is being statically linked before the tracing library? If this is the case, any calls to MPI will be resolved against libmpi, preventing the tracing library from intercepting these symbols. If you need to statically link the tracing library, ensure that it appears before libmpi in the link command. Have you been able to retrieve the MPI information?
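
For illustration, a link line honouring that ordering might look like the sketch below. The library names (libmpitracef for a Fortran MPI code) and paths are assumptions; check which libraries your Extrae installation actually provides.

    # Illustrative link line only; library names and paths are assumptions.
    # The tracing library must come before libmpi so MPI calls are resolved
    # through Extrae's wrappers instead of going straight to the MPI library.
    mpif90 -o ifs_executable *.o \
        -L$EXTRAE_HOME/lib -lmpitracef \
        -L$MPI_HOME/lib -lmpi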

gllort commented 1 year ago

Solved by increasing the value of MAX_WRAPPERS to 512.