Open preda opened 3 years ago
Thanks @preda for reaching out. I will pass this information to the compiler team and ask for their input. Thank you.
clang/LLVM changes to enable this are now upstream. The compiler options -mno-amdgpu-ieee and -fno-honor-nans are both required to enable such folding.
@b-sumner This is great news, thank you. How can this be enabled with OpenCL? Will OpenCL accept those flags directly, or are there other OpenCL flags that get translated to clang's -mno-amdgpu-ieee and -fno-honor-nans?
See also the related #967.
I'm coming back with this request:
Offer a way to enable the GCN Output Modifiers ("OMOD") such as MUL:2 in OpenCL.
Now that LLVM supports generating GCN OMODs (as per @b-sumner 's comment above), how can we make use of that in OpenCL?
@preda Hi, is this issue still present on the latest version of ROCm? If not, can we close this ticket?
I'm using OpenCL, and AFAIK there is still no way to take advantage of VOP3 modifiers such as MUL:2 in the OpenCL compilation.
OTOH according to @b-sumner 's comment above, the issue should be fixed for clang/llvm by using -mno-amdgpu-ieee and -fno-honor-nans .
But there does not seem to be a way to pass those compiler flags from OpenCL, so as far as this issue is concerned (note "OpenCL performance" in the issue title), this is not fixed.
@preda I will check with the internal team and get back to you. Thanks!
A function such as: double sum2(double x, double y) { return 2 * (x + y); } could be compiled to a single VOP3 GCN instruction such as:
But this efficient code is not generated, because MUL:2 and the like only function correctly with denormals disabled and in non-IEEE mode. (denormals and IEEE mode can be set thus:
Feature request: please provide an OpenCL compilation flag that enables MUL:2 and the like, while disabling denormals and IEEE mode as required. This would let a developer choose between two good things: denormal support on one side, and more performance on the other (by making better use of the power of VOP3 instructions).
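To make the trade-off in the request concrete, here is a minimal host-side sketch in plain C. It compares the sum2 function from above with a rough model of what the result might look like once denormals are disabled. The helpers flush_denormal and sum2_flushed are hypothetical names introduced only for this illustration, and the flushing rule (inputs and result flushed to zero) is an assumption, not the hardware's documented behaviour.

```c
#include <float.h>
#include <math.h>

/* The function from the request, as plain C: add, then double. */
double sum2(double x, double y) { return 2 * (x + y); }

/* Hypothetical helper (assumption): treat a subnormal value as a
 * signed zero, roughly what "denormals disabled" means. */
static double flush_denormal(double v) {
    if (fpclassify(v) == FP_SUBNORMAL) {
        return (v < 0) ? -0.0 : 0.0;
    }
    return v;
}

/* Hypothetical model (assumption) of the add-with-MUL:2 path when
 * denormals are disabled: inputs and the final result are flushed. */
double sum2_flushed(double x, double y) {
    double s = flush_denormal(x) + flush_denormal(y);
    return flush_denormal(2 * s);
}
```

For ordinary inputs the two functions agree, e.g. both give 7.5 for (1.5, 2.25). They differ only when denormals are involved: with x = y = DBL_MIN / 4, sum2 returns DBL_MIN on a host with default IEEE behaviour, while the flushed model returns 0. That difference is the price the requested flag would let a developer accept in exchange for the faster instruction.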