stuartarchibald closed 5 years ago
It was originally included in the header on the assumption that e.g. the OpenCL runtime performed this step separately from code-gen, but this is not the case. The `CODEGEN` actions should perform all the same optimizations that a separate `opt` step would. Do you have an example of linked bitcode which produces bad code when run through just `llc` but is correct if first run through `opt`?
Seems like this was down to having an implied `-O0` optimisation level. Setting the optimisation level explicitly in the options to the `llc` call action kind does the inlining required. Thanks.
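For anyone wanting to reproduce the comparison outside of comgr, here is a minimal sketch using the standalone LLVM tools `opt` and `llc`, not the comgr action API. It only builds the command lines for the two pipelines discussed above (`llc` alone, and `opt` followed by `llc`) with an explicit optimisation level. The helper names, the file names, the default level of 2 and the `gfx900` target are assumptions for illustration, not taken from this issue.

```python
import shlex

VALID_LEVELS = (0, 1, 2, 3)


def llc_command(bitcode_path, object_path, opt_level=2, mcpu="gfx900"):
    """Build an llc argv that compiles linked bitcode to a relocatable object.

    The optimisation level is always passed explicitly so that no
    implied level is relied upon. `mcpu` is a hypothetical target.
    """
    if opt_level not in VALID_LEVELS:
        raise ValueError(f"opt_level must be one of {VALID_LEVELS}")
    return [
        "llc",
        f"-O{opt_level}",
        "-mtriple=amdgcn-amd-amdhsa",
        f"-mcpu={mcpu}",
        "-filetype=obj",
        bitcode_path,
        "-o",
        object_path,
    ]


def opt_then_llc_commands(bitcode_path, object_path, opt_level=2, mcpu="gfx900"):
    """Build the two-step pipeline: a separate opt pass, then llc."""
    if opt_level not in VALID_LEVELS:
        raise ValueError(f"opt_level must be one of {VALID_LEVELS}")
    optimised_path = bitcode_path + ".opt.bc"  # hypothetical intermediate file
    opt_cmd = ["opt", f"-O{opt_level}", bitcode_path, "-o", optimised_path]
    return [opt_cmd, llc_command(optimised_path, object_path, opt_level, mcpu)]


if __name__ == "__main__":
    # Print both pipelines so they can be pasted into a shell and the
    # resulting objects compared.
    print(shlex.join(llc_command("linked.bc", "kernel.o", opt_level=3)))
    for cmd in opt_then_llc_commands("linked.bc", "kernel_opt.o", opt_level=3):
        print(shlex.join(cmd))
```

The commands are returned as argument lists so they can be passed straight to `subprocess.run` without shell quoting issues. Exact flag spellings may differ between LLVM versions.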
In fact, seems like I've got the entire Numba ROCm unit test suite passing now. Thanks!
It seems like the action `AMD_COMGR_ACTION_OPTIMIZE_BC_TO_BC` is declared in the header but not implemented in the code? I think this optimisation pass is what permits the inlining of functions declared as having `internal` linkage. Without it, such functions need the preemption specifier `dso_local` to compile, but I don't think this is valid and it ends up leading to invalid direct memory accesses.