Open ooreilly opened 1 year ago
Hi @ooreilly. An internal ticket has been created to investigate your issue. Thanks!
Hi @ooreilly,
I tried running a simple Fortran example with OpenMP offloading and was unable to reproduce the error on omnitrace-instrument v1.11.2, ROCm 6.2.2, and the GNU Fortran compiler. Could you please provide more information so that I may further investigate:
omnitrace-instrument --version
Also, could you confirm whether the compiled executable runs as expected without omnitrace? Having this information should allow me to help further, thanks!
Hi @darren-amd,
Thanks for investigating. Please point me to the internal ticket (ping Ossian O'Reilly on Teams).
program bandwidth
  use iso_c_binding
  use omp_lib
  implicit none
  !$omp requires unified_shared_memory
  ! Set input array size to be a multiple of the CU count on a single MI250x
  integer, parameter :: n = 110 * 10000000, nthreads = 1024
  integer :: i, j, num_devices, nteams
  double precision :: GB
  double precision, allocatable, dimension(:) :: a, b
  double precision :: t0, t1, elapsed

  allocate(a(n))
  allocate(b(n))
  GB = 1000**3
  call omp_set_default_device(0)
  num_devices = omp_get_num_devices()
  ! Pick a number of teams that is a multiple of the CU count
  nteams = 110 * 1000
  a = 1.0
  print *, "n = ", n
  print *, "Data size (read and write):", (c_sizeof(a) + c_sizeof(b)) / GB, "GB"

  ! Time the initial host-to-device mapping
  t0 = omp_get_wtime()
  !$omp target enter data map(to:a, b)
  t1 = omp_get_wtime()
  elapsed = t1 - t0
  print *, "Initial Map elapsed:", elapsed, " s", " Bandwidth:", ( (c_sizeof(a) + c_sizeof(b)) / GB ) / elapsed, " GB/s"

  ! Time 100 repetitions of the device copy kernel
  do i=1,100
    t0 = omp_get_wtime()
    !$omp target teams distribute parallel do simd num_teams(nteams) thread_limit(nthreads)
    do j=1,n
      b(j) = a(j)
    end do
    t1 = omp_get_wtime()
    elapsed = t1 - t0
    print *, "Elapsed:", elapsed, " s", " Bandwidth:", ( (c_sizeof(a) + c_sizeof(b)) / GB ) / elapsed, " GB/s"
  end do

  !$omp target update from(a,b)
  if (a(n) /= b(n)) then
    print *, "Error: a != b!", a(n), b(n)
  endif
end program
omnitrace-instrument v1.10.0 (rev: 9de3a6b0b4243bf8ec10164babdd99f64dbc65f2, tag: v1.10.0, compiler: GNU v7.5.0, rocm: v5.3.x)
Yes, the compiled executable runs as expected without omnitrace.
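For anyone triaging this, it may help to know how much data the reproducer above touches. The following is a minimal sketch, not part of the original report, that recomputes the figure printed by the "Data size (read and write)" line without allocating the arrays. It assumes `double precision` is 64 bits wide, which holds for common compilers but is not guaranteed by the standard; the program name `data_size_check` and its variable names are hypothetical. Under that assumption the two arrays total 2 x 1.1e9 x 8 bytes, i.e. 17.6 GB.

```fortran
! Sketch: recompute the data size reported by the reproducer,
! without allocating anything.
! Assumption: double precision is 64 bits wide on the target compiler.
program data_size_check
  use iso_fortran_env, only: int64, real64
  implicit none
  ! Same element count as the reproducer: 110 * 10,000,000
  integer(int64), parameter :: n = 110_int64 * 10000000_int64
  integer(int64) :: bytes_per_element, total_bytes
  real(real64) :: total_gb

  ! storage_size returns the size in bits; divide by 8 for bytes
  bytes_per_element = storage_size(1.0d0) / 8
  ! Two arrays (a and b) of n elements each
  total_bytes = 2_int64 * n * bytes_per_element
  ! Decimal gigabytes, matching GB = 1000**3 in the reproducer
  total_gb = real(total_bytes, real64) / 1.0e9_real64

  print '(a, f0.1, a)', "Data size (read and write): ", total_gb, " GB"
end program data_size_check
```

This needs no OpenMP runtime or GPU, so it can be built with any Fortran 2008 compiler to sanity-check the size on a given system before running the full test.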
I'm trying omnitrace with OpenMP offloading on a small Fortran test code. Depending on which system I tested on, I encountered different issues. The test code is compiled with the HPE Cray compiler, CCE 15.0.1.
I either saw:
or:
Any idea what is happening here? Thanks!