UO-OACISS / tau2

TAU Performance System Public Mirror (Updated every night at midnight, USA Pacific Time)
http://tau.uoregon.edu

[LLVM] LLVM plug-in fails with OpenMP offloading enabled #12

Open · Thyre opened this issue 6 months ago

Thyre commented 6 months ago

While investigating whether TAU is also affected by the recent issues with CUPTI and OpenMP target offloading, I ran into problems compiling a program with TAU and LLVM when OpenMP offloading is enabled. It seems that TAU inserts calls to its measurement functions (`Tau_start`, `Tau_stop`) into the accelerator (device) code, which then cannot be linked because no device version of these functions exists.

This is simple to reproduce:

  1. Build TAU with the LLVM plug-in enabled.
  2. Compile this program:

```c
#include <stdio.h>
#include <omp.h>

int main( void ) {

#pragma omp target teams num_teams(2)
    {
        printf("omp_is_initial_device() = %d | omp_get_team_num() = %d\n", omp_is_initial_device(), omp_get_team_num());
    }
}
```

You will see the following output:

```console
$ taucc -fopenmp --offload-arch=native reproducer.c
Using selective instrumentation for LLVM
__omp_offloading_10303_1f20109_main_l6_debug__
llvm.dbg.declare
__omp_offloading_10303_1f20109_main_l6_debug___omp_outlined_debug__
omp_is_initial_device$ompvariant$S2$s6$Pnohost
__omp_offloading_10303_1f20109_main_l6_debug___omp_outlined
__omp_offloading_10303_1f20109_main_l6
__kmpc_target_init
llvm.nvvm.read.ptx.sreg.tid.x
_ZN4ompx11synchronize14threadsAlignedENS_6atomic10OrderingTyE
llvm.nvvm.read.ptx.sreg.ntid.x
__assert_fail_internal
llvm.assume
llvm.lifetime.start.p0
__kmpc_kernel_parallel
__kmpc_kernel_end_parallel
llvm.lifetime.end.p0
free
vprintf
llvm.trap
llvm.nvvm.barrier0
__kmpc_target_deinit
__llvm_omp_vprintf
__kmpc_global_thread_num
omp_get_team_num
llvm.nvvm.read.ptx.sreg.ctaid.x
llvm.nvvm.read.ptx.sreg.nctaid.x
Tau_start
Tau_stop
Tau_shutdown
Tau_destructor_trigger
main
__omp_offloading_10303_1f20109_main_l6_debug__
__omp_offloading_10303_1f20109_main_l6_debug__.omp_outlined_debug__
llvm.dbg.declare
printf
omp_is_initial_device$ompvariant$S2$s6$Phost
omp_get_team_num
__omp_offloading_10303_1f20109_main_l6_debug__.omp_outlined
__kmpc_global_thread_num
__kmpc_push_num_teams
__kmpc_fork_teams
__omp_offloading_10303_1f20109_main_l6
__tgt_target_kernel
.omp_offloading.requires_reg
__tgt_register_requires
Tau_start
Tau_stop
Tau_shutdown
Tau_destructor_trigger
Tau_init
Tau_set_node
nvlink error   : Undefined reference to 'Tau_start' in '/tmp/asynchronous-nvptx64-nvidia-cuda-sm_75-a04b88.cubin'
nvlink error   : Undefined reference to 'Tau_stop' in '/tmp/asynchronous-nvptx64-nvidia-cuda-sm_75-a04b88.cubin'
clang: error: nvlink command failed with exit code 255 (use -v to see invocation)
clang version 18.1.1 (https://github.com/llvm/llvm-project.git dba2a75e9c7ef81fe84774ba5eee5e67e01d801a)
Target: nvptx64-nvidia-cuda
Thread model: posix
InstalledDir: /opt/apps/software/Core/Compilers/LLVM/18.1.1/bin
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/asynchronous-nvptx64-nvidia-cuda-sm_75-a04b88-9e8ffc.c
clang: note: diagnostic msg: /tmp/asynchronous-nvptx64-nvidia-cuda-sm_75-a04b88-9e8ffc.sh
clang: note: diagnostic msg: 

********************
/opt/apps/software/Core/Compilers/LLVM/18.1.1/bin/clang-linker-wrapper: error: 'clang' failed
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Error: Command(Executable) is -- clang
Error: Full Command attempted is -- clang   reproducer.o -fopenmp --offload-arch=native   -L/opt/apps/software/MPI/OpenMPI/5.0.1/LLVM/18.1.1/lib -Wl,-rpath -Wl,/opt/apps/software/MPI/OpenMPI/5.0.1/LLVM/18.1.1/lib -Wl,--enable-new-dtags -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi -L/home/jreuter/Downloads/tau-2.33.1/x86_64/lib -lTauMpi-clang-ompt-mpi-cupti-openmp -L/opt/apps/software/MPI/OpenMPI/5.0.1/LLVM/18.1.1/lib -L/opt/apps/software/MPI/OpenMPI/5.0.1/LLVM/18.1.1/lib -Wl,-rpath -Wl,/opt/apps/software/MPI/OpenMPI/5.0.1/LLVM/18.1.1/lib -Wl,--enable-new-dtags -lmpi -Wl,-rpath,/opt/apps/software/MPI/OpenMPI/5.0.1/LLVM/18.1.1/lib    -L/home/jreuter/Downloads/tau-2.33.1/x86_64/lib -ltau-clang-ompt-mpi-cupti-openmp              -lbfd          -Wl,--export-dynamic  -lrt   -ldl -Wl,-rpath, -lomp       -L/opt/apps/software/Core/Libraries/CUDA/12.4.0/extras/CUPTI//lib64 -Wl,-rpath,/opt/apps/software/Core/Libraries/CUDA/12.4.0/extras/CUPTI//lib64 -lcupti -lnvidia-ml -L/opt/apps/software/Core/Libraries/CUDA/12.4.0///lib64/stubs -lcuda -L/opt/apps/software/Core/Libraries/CUDA/12.4.0///lib64  -Wl,-rpath,/opt/apps/software/Core/Libraries/CUDA/12.4.0///lib64           -L/opt/apps/software/Core/SWAT/OTF2/3.1-rc3/lib -lotf2 -lotf2 -Wl,-rpath,/opt/apps/software/Core/SWAT/OTF2/3.1-rc3/lib    -ldl -lm -lstdc++ -ldl   -L/home/jreuter/Downloads/tau-2.33.1/x86_64/lib/static-clang-ompt-mpi-cupti-openmp  -fopenmp   -lcudart_static -lpthread    -g  -o a.out
Error: Reverting to a Regular Make
To suppress this message and revert automatically, please add -optRevert to your TAU_OPTIONS environment variable
Press Enter to continue
```

jordialcaraz commented 6 months ago

Hello Thyre,

I will take a look into this.

However, if you only want to check CUDA and OMPT, you don't need the LLVM plug-in.

You can configure TAU with the flags `-cuda=$CUDA_PATH -ompt -cc=nvcc -c++=nvc++` and then run the application with `tau_exec -T ompt -cupti -ompt ./myapp`.

Thyre commented 6 months ago

Thanks for bringing up `tau_exec`; I hadn't thought of that.

I kept `-cc=clang -c++=clang++`, as I'm especially interested in that case due to https://github.com/llvm/llvm-project/issues/85770.

However, running with `tau_exec -T ompt -cupti -ompt` doesn't give the result I expected:

```console
clang version 18.1.1 (https://github.com/llvm/llvm-project.git dba2a75e9c7ef81fe84774ba5eee5e67e01d801a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/apps/software/Core/Compilers/LLVM/18.1.1/bin
$ clang -fopenmp --offload-arch=native ./reproducer.c
$ tau_exec -T ompt -cupti -ompt ./a.out
omp_is_initial_device() = 0 | omp_get_team_num() = 1
omp_is_initial_device() = 0 | omp_get_team_num() = 0
/home/jreuter/Downloads/tau-2.33.1/x86_64/bin/tau_exec: line 1515: 2140184 Segmentation fault      $dryrun "$@"
```

With the `-gdb` flag added as well, program execution hangs indefinitely. These are the backtraces of all threads:

```console
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...
(No debugging symbols found in ./a.out)
Setting environment variable "LD_AUDIT" to null value.
(gdb) run
Starting program: /home/jreuter/Sources/OpenMP/target/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe796f640 (LWP 2140464)]
[New Thread 0x7fffe716e640 (LWP 2140465)]
[New Thread 0x7fffe3fff640 (LWP 2140466)]
[New Thread 0x7fffe37fe640 (LWP 2140467)]
[Thread 0x7fffe3fff640 (LWP 2140466) exited]
[Thread 0x7fffe716e640 (LWP 2140465) exited]
[New Thread 0x7fffe716e640 (LWP 2140468)]
[Thread 0x7fffe716e640 (LWP 2140468) exited]
[New Thread 0x7fffe716e640 (LWP 2140469)]
[New Thread 0x7fffe3fff640 (LWP 2140470)]
omp_is_initial_device() = 0 | omp_get_team_num() = 1
omp_is_initial_device() = 0 | omp_get_team_num() = 0
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
futex_wait (private=0, expected=2, futex_word=0x555555597150) at ../sysdeps/nptl/futex-internal.h:146
146 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0x555555597150) at ../sysdeps/nptl/futex-internal.h:146
#1  __GI___lll_lock_wait_private (futex=0x555555597150) at ./nptl/lowlevellock.c:34
#2  0x00007ffff7c7fb15 in __GI__IO_fputs (str=0x7fffffff69b0 "2 templated_functions_MULTI_TAUGPU_TIME\n# Name Calls Subrs Excl Incl ProfileCalls", 
    fp=0x5555555bb168) at ./libio/iofputs.c:36
#3  0x00007ffff22df182 in writeHeader (fp=0x5555555bb168, numFunc=2, metricName=0x7fffffff65b0 "templated_functions_MULTI_TAUGPU_TIME") at Profiler.cpp:1240
#4  writeProfile (fp=0x5555555bb168, metricName=0x7fffffff65b0 "templated_functions_MULTI_TAUGPU_TIME", tid=0, metric=0, inFuncs=0x0, numFuncs=0)
    at Profiler.cpp:1623
#5  TauProfiler_writeData (tid=tid@entry=0, prefix=0x7ffff23a7dcf "profile", increment=false, inFuncs=inFuncs@entry=0x0, numFuncs=numFuncs@entry=0)
    at Profiler.cpp:2080
#6  0x00007ffff22dd162 in TauProfiler_DumpData (increment=false, tid=0, prefix=<optimized out>) at Profiler.cpp:1881
#7  TauProfiler_StoreData (tid=tid@entry=0) at Profiler.cpp:1769
#8  0x00007ffff22dcdfa in tau::Profiler::Stop (this=0x555555a65dc0, tid=0, useLastTimeStamp=<optimized out>) at Profiler.cpp:806
#9  0x00007ffff22ea792 in Tau_stop_timer (function_info=<optimized out>, tid=tid@entry=0) at TauCAPI.cpp:847
#10 0x00007ffff22eb565 in Tau_stop_all_timers (tid=tid@entry=0) at TauCAPI.cpp:998
#11 0x00007ffff22f59fa in Tau_profile_exit_threads (begin_index=begin_index@entry=0) at TauCAPI.cpp:1054
#12 0x00007ffff22e86f5 in Tau_profile_exit_all_threads () at TauCAPI.cpp:1073
#13 Tau_destructor_trigger () at TauCAPI.cpp:3192
#14 0x00007ffff22f6a82 in pure_userevent_map_t::~pure_userevent_map_t (this=0x5555555dde10) at TauCAPI.cpp:2088
#15 0x00007ffff7c45d9f in __GI___call_tls_dtors () at ./stdlib/cxa_thread_atexit_impl.c:159
#16 0x00007ffff7c455c9 in __run_exit_handlers (status=0, listp=0x7ffff7e1a838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, 
    run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:46
#17 0x00007ffff7c45610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
#18 0x00007ffff7c29d97 in __libc_start_call_main (main=main@entry=0x5555555551f0 <main>, argc=argc@entry=1, argv=argv@entry=0x7fffffff96c8)
    at ../sysdeps/nptl/libc_start_call_main.h:74
#19 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x5555555551f0 <main>, argc=1, argv=0x7fffffff96c8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffff96b8) at ../csu/libc-start.c:392
#20 0x0000555555555125 in _start ()
(gdb) info threads
  Id   Target Id                                            Frame 
* 1    Thread 0x7ffff7e3ec40 (LWP 2140451) "a.out"          futex_wait (private=0, expected=2, futex_word=0x555555597150)
    at ../sysdeps/nptl/futex-internal.h:146
  2    Thread 0x7fffe796f640 (LWP 2140464) "cuda-EvtHandlr" 0x00007ffff7d18bcf in __GI___poll (fds=0x5555556012a0, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  5    Thread 0x7fffe37fe640 (LWP 2140467) "a.out"          __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, 
    expected=0, futex_word=0x5555560f32f0) at ./nptl/futex-internal.c:57
  7    Thread 0x7fffe716e640 (LWP 2140469) "cuda-EvtHandlr" 0x00007ffff7d18bcf in __GI___poll (fds=0x7fffcc000c20, nfds=11, timeout=100)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  8    Thread 0x7fffe3fff640 (LWP 2140470) "a.out"          __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7fffe3ffed60, op=393, 
    expected=0, futex_word=0x5555556129a0) at ./nptl/futex-internal.c:57
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fffe796f640 (LWP 2140464))]
#0  0x00007ffff7d18bcf in __GI___poll (fds=0x5555556012a0, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29  ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7d18bcf in __GI___poll (fds=0x5555556012a0, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fffee0ba9bf in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fffee17d6cf in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007fffee0b58ef in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fffe37fe640 (LWP 2140467))]
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555560f32f0)
    at ./nptl/futex-internal.c:57
57  ./nptl/futex-internal.c: No such file or directory.
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555560f32f0)
    at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x5555560f32f0)
    at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5555560f32f0, expected=expected@entry=0, clockid=clockid@entry=0, 
    abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x00007ffff7c9cbdf in do_futex_wait (sem=sem@entry=0x5555560f32f0, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x00007ffff7c9cc78 in __new_sem_wait_slow64 (sem=0x5555560f32f0, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x00007ffff14c6e8f in ?? () from /opt/apps/software/Core/Libraries/CUDA/12.4.0/extras/CUPTI/lib64/libcupti.so.12
#6  0x00007ffff1338bb9 in ?? () from /opt/apps/software/Core/Libraries/CUDA/12.4.0/extras/CUPTI/lib64/libcupti.so.12
#7  0x00007ffff14c2789 in ?? () from /opt/apps/software/Core/Libraries/CUDA/12.4.0/extras/CUPTI/lib64/libcupti.so.12
#8  0x00007ffff7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#9  0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 7
[Switching to thread 7 (Thread 0x7fffe716e640 (LWP 2140469))]
#0  0x00007ffff7d18bcf in __GI___poll (fds=0x7fffcc000c20, nfds=11, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:29
29  ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7d18bcf in __GI___poll (fds=0x7fffcc000c20, nfds=11, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fffee0ba9bf in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007fffee17d6cf in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007fffee0b58ef in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) thread 8 
[Switching to thread 8 (Thread 0x7fffe3fff640 (LWP 2140470))]
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7fffe3ffed60, op=393, expected=0, futex_word=0x5555556129a0)
    at ./nptl/futex-internal.c:57
57  ./nptl/futex-internal.c: No such file or directory.
(gdb) bt
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x7fffe3ffed60, op=393, expected=0, futex_word=0x5555556129a0)
    at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x7fffe3ffed60, clockid=0, expected=0, futex_word=0x5555556129a0)
    at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5555556129a0, expected=expected@entry=0, clockid=clockid@entry=0, 
    abstime=abstime@entry=0x7fffe3ffed60, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007ffff7c93e9b in __pthread_cond_wait_common (abstime=0x7fffe3ffed60, clockid=0, mutex=0x55555561b670, cond=0x555555612978)
    at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_timedwait64 (cond=0x555555612978, mutex=0x55555561b670, abstime=0x7fffe3ffed60) at ./nptl/pthread_cond_wait.c:652
#5  0x00007fffee01900a in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6  0x00007fffee0b58ef in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7  0x00007ffff7c94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#8  0x00007ffff7d26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 
```

With LLVM 17.0.6, gdb instead catches a segmentation fault, with this backtrace:

```console
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Reading symbols from ./a.out...
(No debugging symbols found in ./a.out)
Setting environment variable "LD_AUDIT" to null value.
(gdb) run
Starting program: /home/jreuter/Sources/OpenMP/target/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe7d6f640 (LWP 2145982)]
[New Thread 0x7fffe756e640 (LWP 2145983)]
[New Thread 0x7fffe6acc640 (LWP 2145984)]
[New Thread 0x7fffe3eb4640 (LWP 2145985)]
[Thread 0x7fffe6acc640 (LWP 2145984) exited]
[Thread 0x7fffe756e640 (LWP 2145983) exited]
[New Thread 0x7fffe756e640 (LWP 2145986)]
[Thread 0x7fffe756e640 (LWP 2145986) exited]
[New Thread 0x7fffe756e640 (LWP 2145987)]
[New Thread 0x7fffe6acc640 (LWP 2145988)]
omp_is_initial_device() = 0 | omp_get_team_num() = 1
omp_is_initial_device() = 0 | omp_get_team_num() = 0

Thread 1 "a.out" received signal SIGSEGV, Segmentation fault.
__GI__IO_fputs (str=0x7fffffff69c0 "2 templated_functions_MULTI_TAUGPU_TIME\n# Name Calls Subrs Excl Incl ProfileCalls", fp=0x0) at ./libio/iofputs.c:36
36  ./libio/iofputs.c: No such file or directory.
(gdb) bt
#0  __GI__IO_fputs (str=0x7fffffff69c0 "2 templated_functions_MULTI_TAUGPU_TIME\n# Name Calls Subrs Excl Incl ProfileCalls", fp=0x0) at ./libio/iofputs.c:36
#1  0x00007ffff26ad112 in writeHeader (fp=0x0, numFunc=2, metricName=0x7fffffff65c0 "templated_functions_MULTI_TAUGPU_TIME") at Profiler.cpp:1240
#2  writeProfile (fp=0x0, metricName=0x7fffffff65c0 "templated_functions_MULTI_TAUGPU_TIME", tid=0, metric=0, inFuncs=0x0, numFuncs=0) at Profiler.cpp:1623
#3  TauProfiler_writeData (tid=tid@entry=0, prefix=0x7ffff2775dcf "profile", increment=false, inFuncs=inFuncs@entry=0x0, numFuncs=numFuncs@entry=0)
    at Profiler.cpp:2080
#4  0x00007ffff26ab132 in TauProfiler_DumpData (increment=false, tid=0, prefix=<optimized out>) at Profiler.cpp:1881
#5  TauProfiler_StoreData (tid=tid@entry=0) at Profiler.cpp:1769
#6  0x00007ffff26aadca in tau::Profiler::Stop (this=0x555555a699d0, tid=0, useLastTimeStamp=<optimized out>) at Profiler.cpp:806
#7  0x00007ffff26b869b in Tau_stop_timer (function_info=<optimized out>, tid=tid@entry=0) at TauCAPI.cpp:847
#8  0x00007ffff26b9405 in Tau_stop_all_timers (tid=tid@entry=0) at TauCAPI.cpp:998
#9  0x00007ffff26c386a in Tau_profile_exit_threads (begin_index=begin_index@entry=0) at TauCAPI.cpp:1054
#10 0x00007ffff26b6635 in Tau_profile_exit_all_threads () at TauCAPI.cpp:1073
#11 Tau_destructor_trigger () at TauCAPI.cpp:3192
#12 0x00007ffff26c48e2 in pure_userevent_map_t::~pure_userevent_map_t (this=0x5555555e1410) at TauCAPI.cpp:2088
#13 0x00007ffff7c45d9f in __GI___call_tls_dtors () at ./stdlib/cxa_thread_atexit_impl.c:159
#14 0x00007ffff7c455c9 in __run_exit_handlers (status=0, listp=0x7ffff7e1a838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, 
    run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:46
#15 0x00007ffff7c45610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
#16 0x00007ffff7c29d97 in __libc_start_call_main (main=main@entry=0x5555555551f0 <main>, argc=argc@entry=1, argv=argv@entry=0x7fffffff96d8)
    at ../sysdeps/nptl/libc_start_call_main.h:74
#17 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x5555555551f0 <main>, argc=1, argv=0x7fffffff96d8, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffff96c8) at ../csu/libc-start.c:392
#18 0x0000555555555125 in _start ()
(gdb) 
```

jordialcaraz commented 6 months ago

Hi Thyre,

I tried to reproduce the error, but I didn't see the deadlock.

Could you tell us how you built LLVM and which CUDA version you are using?

Also, could you share the full command line you used to configure TAU?

Thanks

Thyre commented 6 months ago

Sure!

I built LLVM 17.0.6 / 18.1.1 with the following CMake command:

```shell
cmake ../llvm -DCMAKE_BUILD_TYPE=Release \
              -DCMAKE_C_COMPILER=gcc-12 \
              -DCMAKE_CXX_COMPILER=g++-12 \
              -DLIBOMPTARGET_ENABLE_DEBUG:Bool=On \
              -DLLVM_ENABLE_PROJECTS="clang;flang" \
              -DLLVM_ENABLE_RUNTIMES:STRING="libunwind;libcxxabi;libcxx;compiler-rt;openmp" \
              -DLLVM_LINK_LLVM_DYLIB:BOOL=On \
              -DCLANG_LINK_CLANG_DYLIB:BOOL=On \
              -DLLVM_CCACHE_BUILD=Off  \
              -DLLVM_ENABLE_ASSERTIONS:BOOL=ON \
              -DLLVM_PARALLEL_LINK_JOBS=1 \
              -DLLVM_TARGETS_TO_BUILD="X86;NVPTX;" \
              -DLLVM_ENABLE_PLUGINS:BOOL=On \
              -DCMAKE_INSTALL_PREFIX=/opt/apps/software/Core/Compilers/LLVM/{{ llvm_version }}/ \
              -DLLVM_ENABLE_RTTI:Bool=On
```

I configured TAU with the following flags:

```shell
./configure -cc=clang -c++=clang++ -fortran=flang-new -mpi -openmp -ompt -cuda=/opt/apps/software/Core/Libraries/CUDA/12.4.0/ -otf=/opt/apps/software/Core/SWAT/OTF2/3.1-rc3 -llvm_cxx=$(which clang++)
```

My system is running Ubuntu 22.04 LTS with CUDA 12.4, installed via the .run file from the NVIDIA site, in case that's relevant.

jordialcaraz commented 6 months ago

I see. There is something you should take into account, and I believe it is the reason for the crash (at least the last one): if you are not using MPI, do not configure TAU with it. An MPI-enabled TAU expects other processes, or a call to `MPI_Finalize()`, and it may crash otherwise.

Configure TAU without the `-mpi` flag, and when running, add `serial` to the `-T` option, i.e.: `tau_exec -T ompt,serial -cupti -ompt ./myapp`

Thyre commented 6 months ago

Yup, that seems to work.