Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[amdgpu] const globals incorrectly marked addrspace(4) #51478

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR52511
Status NEW
Importance P enhancement
Reported by Jon Chesterfield (jonathanchesterfield@gmail.com)
Reported on 2021-11-15 08:20:02 -0800
Last modified on 2021-11-15 08:49:25 -0800
Version unspecified
Hardware PC Linux
CC jdoerfert@anl.gov, llvm-bugs@lists.llvm.org, Matthew.Arsenault@amd.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Given

#include <complex>
#include <cmath>
#include <omp.h>

#pragma omp declare target

static const double smallx = 1.0e-5;

static const double log_smallx = log(smallx);

#pragma omp end declare target

clang++ -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa   math_exp.cpp -o
math_exp  -lm -save-temps

unhandled address space
UNREACHABLE executed at /home/amd/llvm-
project/llvm/lib/Target/AMDGPU/SIISelLowering.cpp:8869!
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash
backtrace.

IR emitted from clang with -disable-llvm-passes contains a global constructor
to do the log call,

define internal void @__omp_offloading__10303_4068b_log_smallx_l9_ctor() #4 {
entry:
  %call = call double @"_ZL28log$ompvariant$S2$s7$Pamdgcnd"(double 1.000000e-05) #17
  store double %call, double addrspace(4)* @_ZL10log_smallx, align 8, !tbaa !9
  ret void
}

The backend (probably rightly) chokes on that, addrspace(4) is constant in the
sense that the value has to be set before kernel execution begins.

The optimal codegen, I think, would look like a global ctor that writes to that
value as an addrspace(1) value, then the other kernels load it as an
addrspace(4) value. Thus we can use scalar loads on it, after the global ctor
ran to initialise it.

A less clever way to fix this ICE is to find the point in clang that tags the
const variable as addrspace(4) and back it off to only do so when the
initialiser is already available as a constant.

I think this is an artefact of how constexpr is handled by target regions - I
suspect log is marked constexpr in a glibc header, but it isn't actually
constexpr in the hip math headers - but have not confirmed that theory.
Quuxplusone commented 3 years ago
Doesn't need libm headers to reproduce,

#pragma omp declare target
double func(double);
const double glob = func(4.2);
#pragma omp end declare target

clang++ -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa math_exp.cpp -o
math_exp  -lm -save-temps

define internal void @__omp_offloading__10303_4068b_glob_l11_ctor() #0 {
entry:
  %call = call double @_Z4funcd(double 4.200000e+00) #10
  store double %call, double addrspace(4)* @_ZL4glob, align 8, !tbaa !8
  ret void
}

Changing the invocation to c++ changes codegen,

clang++ -std=c++14 -Wall -Wextra -emit-llvm -O2 -ffreestanding -fno-exceptions -
-target=amdgcn-amd-amdhsa -march=gfx906 -mcpu=gfx906 -Xclang -fconvergent-
functions -nogpulib math_exp.cpp -O3 -c -save-temps

; cast away the addrspace before the store
define internal void @__cxx_global_var_init() #0 {
entry:
  %call = call double @_Z4funcd(double 4.200000e+00) #3
  store double %call, double* addrspacecast (double addrspace(4)* @_ZL4glob to double*), align 8, !tbaa !3
  %0 = call {}* @llvm.invariant.start.p0i8(i64 8, i8* addrspacecast (i8 addrspace(4)* bitcast (double addrspace(4)* @_ZL4glob to i8 addrspace(4)*) to i8*))
  ret void
}

This is then optimised to IR that discards the result entirely

define internal void @_GLOBAL__sub_I_math_exp.cpp() #2 {
entry:
  %call.i = tail call double @_Z4funcd(double 4.200000e+00) #3
  %0 = tail call {}* @llvm.invariant.start.p0i8(i64 8, i8* addrspacecast (i8 addrspace(4)* bitcast (double addrspace(4)* @_ZL4glob to i8 addrspace(4)*) to i8*)) #4
  ret void
}

That is to say, C++ looks broken in a slightly different way to OpenMP C++.
 Glibc / constexpr doesn't seem to be involved.