ROCm / ROCm-CompilerSupport

The compiler support repository provides various Lightning Compiler related services.

`AMD_COMGR_ACTION_OPTIMIZE_BC_TO_BC` not implemented #9

Closed by stuartarchibald 5 years ago

stuartarchibald commented 5 years ago

It seems that the action `AMD_COMGR_ACTION_OPTIMIZE_BC_TO_BC` is declared in the header but not implemented in the code. I think this optimisation pass is what permits the inlining of functions declared with internal linkage. Without it, such functions need the preemption specifier `dso_local` to compile, but I don't think that is valid, and it ends up leading to invalid direct memory accesses.

scott-linder commented 5 years ago

It was originally included in the header on the assumption that, e.g., the OpenCL runtime performed this step separately from code generation, but this is not the case. The CODEGEN actions should perform all the same optimizations that a separate `opt` step would. Do you have an example of linked bitcode which produces bad code when run through just `llc` but is correct if first run through `opt`?

stuartarchibald commented 5 years ago

Seems like this was down to an implied `-O0` optimisation level. Setting the optimisation level explicitly in the options for the action kind that calls `llc` does the required inlining. Thanks.
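For anyone hitting the same thing, here is a minimal sketch of the idea as a plain `llc` command line rather than a Comgr call. The helper name `build_llc_command`, the file names, the `gfx900` target and the choice of `-O3` are all assumptions for illustration; the thread does not say which level was used, only that the implied `-O0` was the problem.

```python
def build_llc_command(bitcode_path, output_path, opt_level=2, mcpu="gfx900"):
    """Return an llc argument list with an explicit optimisation level.

    Hypothetical helper: it only builds the command, it does not run llc.
    """
    if opt_level not in (0, 1, 2, 3):
        raise ValueError("llc accepts -O0 through -O3")
    return [
        "llc",
        f"-O{opt_level}",              # explicit level instead of relying on a default
        "-mtriple=amdgcn-amd-amdhsa",  # AMDGPU HSA target triple
        f"-mcpu={mcpu}",               # assumed GPU; substitute the real target
        "-filetype=obj",
        bitcode_path,
        "-o",
        output_path,
    ]


if __name__ == "__main__":
    # File names are placeholders.
    print(" ".join(build_llc_command("linked.bc", "kernel.o", opt_level=3)))
```

The point is only that the optimisation level is stated explicitly rather than left to whatever the calling layer implies.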

stuartarchibald commented 5 years ago

In fact, seems like I've got the entire Numba ROCm unit test suite passing now. Thanks!