Open karolherbst opened 1 year ago
Hi, it looks like something odd is going on here.
To make sure I'm looking in the right place, I believe this is the relevant code:
This is essentially create an OpenCL C kernel to compare against. The OpenCL C kernel will have the form:
#define spirv_fadd(a, b) (a) + (b)
#define spirv_fsub(a, b) (a) - (b)
#define spirv_fmul(a, b) (a) * (b)
#define spirv_fdiv(a, b) (a) / (b)
#define spirv_frem(a, b) fmod(a, b)
#define spirv_fmod(a, b) copysign(fmod(a,b),b)
#define T float
#define FUNC spirv_frem
__kernel void fmath_cl(__global T *out,
const __global T *lhs, const __global T *rhs)
{
int id = get_global_id(0);
out[id] = FUNC(lhs[id], rhs[id]);
}
The define for spirv_frem
does look a little strange, but because there is no frem
OpenCL C built-in function to map to we need to do something special to handle this case.
Note that the SPIR-V description for OpFRem is:
https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpFRem
The floating-point remainder whose sign matches the sign of Operand 1.
Does this mean that we have the spirv_frem
and spirv_fmod
cases backwards? In other words, should the kernel instead have the form:
...
#define spirv_fmod(a, b) fmod(a, b)
#define spirv_frem(a, b) copysign(fmod(a,b),b)
This seems to work fine for our implementation, although we didn't have an issue with the defines the other way around either so I'm not sure how much this tells us.
This rings a bell!
See KhronosGroup/SPIRV-LLVM-Translator#296.
It is correct to implement OpFRem as per the OpenCL C fmod builtin. The name of this builtin originally comes from SVr4 / 4.3BSD – I'm not sure of its original semantics, but in C90/C99, it is described as a remainder operation returning the sign of the first operand, i.e. the dividend – this means that fmod(x, y) matches trunc(x / y), i.e. abs(x - (fmod(x, y) + trunc(x / y))) < ε, and in the OpenCL C specification it is described, equivalently, as returning x - y * trunc(x / y).
OpFMod is quite different to the OpenCL C fmod builtin. I presume the name for this comes from OpenGL, but it is described as a remainder operation returning the sign of the second operand, i.e. the divisor. There is an implementation of this in terms of OpFRem in the SPIR-V/LLVM Translator, added in the above-mentioned PR.
The remainder builtin is yet another thing, with semantics such that essentially remainder(x, y) matches round(x / y), i.e. abs(x - (remainder(x, y) + round(x / y))) < ε. This is quite different to OpFMod and it is also quite different to OpFRem.
copysign(fmod(x, y), y) is quite simply wrong.
The testing passes because the input values it uses are generated with _genrandreal1, which produces values in the interval [0, 1], i.e. it does not produce negative test values for either the dividend or the divisor.
My point is we can't use OpenCL C builtins at all, because SPIR-V instruction don't provide the necessary precision. So even if the formula is correct, implementations are free to implement OpFRem
and OpFMod
with lower precision.
And yeah.. it passes with some random seeds and does not with some special ones. I got it to pass just running the subtests, but running all tests makes them fail.
FWIW, I think there are several SPIR-V instructions that are relevant to this issue:
op_frem_float_regular
.op_fmod_float_regular
.Of these instructions, only (3) and (4) have defined accuracy in the OpenCL SPIR-V Environment Spec. That seems like a problem, and we should define the expected accuracy for (1) and (2) as well. @karolherbst is this what you mean by "we can't use OpenCL C builtins at all, because SPIR-V instruction don't provide the necessary precision"?
Note that the required accuracy for the OpenCL C fmod
and remainder
built-in functions is 0 ULP so if we do define the accuracy of the SPIR-V OpFMod and OpFRem based on the OpenCL C fmod
then they will need to be very accurate as well.
Another question is, what is the expectation when the integer part of the quotient cannot be represented by the floating point type. For example 2^100 and 2^-100 are both 32-bit floating point numbers, but the integer part of their quotient is not. Which of these SPIR-V instructions guarantees the correct result for these inputs?
FWIW, I think there are several SPIR-V instructions that are relevant to this issue:
- OpFRem. This is what gets tested by the spirv_new test
op_frem_float_regular
.- OpFMod. This is what gets tested by the spirv_new test
op_fmod_float_regular
.- The fmod instruction in the OpenCL extended instruction set.
- Maybe the remainder instruction in the OpenCL extended instruction set?
Of these instructions, only (3) and (4) have defined accuracy in the OpenCL SPIR-V Environment Spec. That seems like a problem, and we should define the expected accuracy for (1) and (2) as well. @karolherbst is this what you mean by "we can't use OpenCL C builtins at all, because SPIR-V instruction don't provide the necessary precision"?
Note that the required accuracy for the OpenCL C
fmod
andremainder
built-in functions is 0 ULP so if we do define the accuracy of the SPIR-V OpFMod and OpFRem based on the OpenCL Cfmod
then they will need to be very accurate as well.
Yes, and please don't. Project layering OpenCL on Vulkan will have a hard time dealing with that if the precision requirements diverge. One is free to use the OpenCL ExtInst inside SPIR-V if one need that precision. The normal SPIR-V instruction should probably be left alone. At least I don't see the point if using the OpenCL ExtInst is a way out.
Yes, and please don't. Project layering OpenCL on Vulkan will have a hard time dealing with that if the precision requirements diverge.
Does Vulkan define accuracy requirements for OpFMod and OpFRem? In other words, if we wanted to align the accuracy requirements, what would we require?
Yes, and please don't. Project layering OpenCL on Vulkan will have a hard time dealing with that if the precision requirements diverge.
Does Vulkan define accuracy requirements for OpFMod and OpFRem? In other words, if we wanted to align the accuracy requirements, what would we require?
from https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#spirvenv-op-prec
The OpFRem and OpFMod instructions use cheap approximations of remainder, and the error can be large due to the discontinuity in trunc() and floor(). This can produce mathematically unexpected results in some cases, such as FMod(x,x) computing x rather than 0, and can also cause the result to have a different sign than the infinitely precise result.
And the OpenCL SPIR-V env spec doesn't define anything here (only for the ExtInsts, and a couple of SPIR-V instructions ), so I think we are good as it is. Maybe some clarification might be helpful.
Thanks, for whatever reason my search for "vulkan spir-v environment" found the Vulkan 1.1 spec before the Vulkan 1.3 spec and it doesn't have accuracy requirements for these instructions.
It sounds like if we want to align with Vulkan the actions that are required are:
trunc
and floor
and hence may have a very large error.
Shall I open an OpenCL-Docs issue for (1) to discuss the specification updates?
Spec issue filed: https://github.com/KhronosGroup/OpenCL-Docs/issues/859
Reopening - these tests are disabled for now, but we still need to decide what should be tested for OpFMod and OpFRem.
I am not sure if this is an error from my understanding of the Env spec or if the assumption made in the test is actually wrong, but what the test is doing is to assume that
OpFRem
is equal to the OpenCL C builtinfmod
which I don't think is a valid assumption.It should compare the
fmod
OpenCL ExtInstr against a OpenCL C kernel usingfmod
instead.But maybe we have to assume that spir-v alu instructions have the same precision requirements as the OpenCL builtins? That would make layering OpenCL on top of Vulkan much harder though.
Thoughts on this?