KhronosGroup / OpenCL-CTS

The OpenCL Conformance Tests
Apache License 2.0
185 stars 197 forks source link

spirv_new frem/fmod tests may require excessive precision #1548

Open karolherbst opened 1 year ago

karolherbst commented 1 year ago

I am not sure if this is an error from my understanding of the Env spec or if the assumption made in the test is actually wrong, but what the test is doing is to assume that OpFRem is equal to the OpenCL C builtin fmod which I don't think is a valid assumption.

It should compare the fmod OpenCL ExtInstr against a OpenCL C kernel using fmod instead.

But maybe we have to assume that spir-v alu instructions have the same precision requirements as the OpenCL builtins? That would make layering OpenCL on top of Vulkan much harder though.

Thoughts on this?

bashbaug commented 1 year ago

Hi, it looks like something odd is going on here.

To make sure I'm looking in the right place, I believe this is the relevant code:

https://github.com/KhronosGroup/OpenCL-CTS/blob/6554c4901825381ee0d4d8ba199a66afff941a1a/test_conformance/spirv_new/test_op_fmath.cpp#L69

This is essentially create an OpenCL C kernel to compare against. The OpenCL C kernel will have the form:

#define spirv_fadd(a, b) (a) + (b)               
#define spirv_fsub(a, b) (a) - (b)               
#define spirv_fmul(a, b) (a) * (b)               
#define spirv_fdiv(a, b) (a) / (b)               
#define spirv_frem(a, b) fmod(a, b)              
#define spirv_fmod(a, b) copysign(fmod(a,b),b)   
#define T float
#define FUNC spirv_frem
__kernel void fmath_cl(__global T *out,          
const __global T *lhs, const __global T *rhs)    
{                                                
    int id = get_global_id(0);                   
    out[id] = FUNC(lhs[id], rhs[id]);            
}                                                

The define for spirv_frem does look a little strange, but because there is no frem OpenCL C built-in function to map to we need to do something special to handle this case.

Note that the SPIR-V description for OpFRem is:

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpFRem

The floating-point remainder whose sign matches the sign of Operand 1.

Does this mean that we have the spirv_frem and spirv_fmod cases backwards? In other words, should the kernel instead have the form:

...
#define spirv_fmod(a, b) fmod(a, b)              
#define spirv_frem(a, b) copysign(fmod(a,b),b)   

This seems to work fine for our implementation, although we didn't have an issue with the defines the other way around either so I'm not sure how much this tells us.

StuartDBrady commented 1 year ago

This rings a bell!

See KhronosGroup/SPIRV-LLVM-Translator#296.

It is correct to implement OpFRem as per the OpenCL C fmod builtin. The name of this builtin originally comes from SVr4 / 4.3BSD – I'm not sure of its original semantics, but in C90/C99, it is described as a remainder operation returning the sign of the first operand, i.e. the dividend – this means that fmod(x, y) matches trunc(x / y), i.e. abs(x - (fmod(x, y) + trunc(x / y))) < ε, and in the OpenCL C specification it is described, equivalently, as returning x - y * trunc(x / y).

OpFMod is quite different to the OpenCL C fmod builtin. I presume the name for this comes from OpenGL, but it is described as a remainder operation returning the sign of the second operand, i.e. the divisor. There is an implementation of this in terms of OpFRem in the SPIR-V/LLVM Translator, added in the above-mentioned PR.

The remainder builtin is yet another thing, with semantics such that essentially remainder(x, y) matches round(x / y), i.e. abs(x - (remainder(x, y) + round(x / y))) < ε. This is quite different to OpFMod and it is also quite different to OpFRem.

copysign(fmod(x, y), y) is quite simply wrong.

The testing passes because the input values it uses are generated with _genrandreal1, which produces values in the interval [0, 1], i.e. it does not produce negative test values for either the dividend or the divisor.

karolherbst commented 1 year ago

My point is we can't use OpenCL C builtins at all, because SPIR-V instruction don't provide the necessary precision. So even if the formula is correct, implementations are free to implement OpFRem and OpFMod with lower precision.

karolherbst commented 1 year ago

And yeah.. it passes with some random seeds and does not with some special ones. I got it to pass just running the subtests, but running all tests makes them fail.

bashbaug commented 1 year ago

FWIW, I think there are several SPIR-V instructions that are relevant to this issue:

  1. OpFRem. This is what gets tested by the spirv_new test op_frem_float_regular.
  2. OpFMod. This is what gets tested by the spirv_new test op_fmod_float_regular.
  3. The fmod instruction in the OpenCL extended instruction set.
  4. Maybe the remainder instruction in the OpenCL extended instruction set?

Of these instructions, only (3) and (4) have defined accuracy in the OpenCL SPIR-V Environment Spec. That seems like a problem, and we should define the expected accuracy for (1) and (2) as well. @karolherbst is this what you mean by "we can't use OpenCL C builtins at all, because SPIR-V instruction don't provide the necessary precision"?

Note that the required accuracy for the OpenCL C fmod and remainder built-in functions is 0 ULP so if we do define the accuracy of the SPIR-V OpFMod and OpFRem based on the OpenCL C fmod then they will need to be very accurate as well.

b-sumner commented 1 year ago

Another question is, what is the expectation when the integer part of the quotient cannot be represented by the floating point type. For example 2^100 and 2^-100 are both 32-bit floating point numbers, but the integer part of their quotient is not. Which of these SPIR-V instructions guarantees the correct result for these inputs?

karolherbst commented 1 year ago

FWIW, I think there are several SPIR-V instructions that are relevant to this issue:

  1. OpFRem. This is what gets tested by the spirv_new test op_frem_float_regular.
  2. OpFMod. This is what gets tested by the spirv_new test op_fmod_float_regular.
  3. The fmod instruction in the OpenCL extended instruction set.
  4. Maybe the remainder instruction in the OpenCL extended instruction set?

Of these instructions, only (3) and (4) have defined accuracy in the OpenCL SPIR-V Environment Spec. That seems like a problem, and we should define the expected accuracy for (1) and (2) as well. @karolherbst is this what you mean by "we can't use OpenCL C builtins at all, because SPIR-V instruction don't provide the necessary precision"?

Note that the required accuracy for the OpenCL C fmod and remainder built-in functions is 0 ULP so if we do define the accuracy of the SPIR-V OpFMod and OpFRem based on the OpenCL C fmod then they will need to be very accurate as well.

Yes, and please don't. Project layering OpenCL on Vulkan will have a hard time dealing with that if the precision requirements diverge. One is free to use the OpenCL ExtInst inside SPIR-V if one need that precision. The normal SPIR-V instruction should probably be left alone. At least I don't see the point if using the OpenCL ExtInst is a way out.

bashbaug commented 1 year ago

Yes, and please don't. Project layering OpenCL on Vulkan will have a hard time dealing with that if the precision requirements diverge.

Does Vulkan define accuracy requirements for OpFMod and OpFRem? In other words, if we wanted to align the accuracy requirements, what would we require?

karolherbst commented 1 year ago

Yes, and please don't. Project layering OpenCL on Vulkan will have a hard time dealing with that if the precision requirements diverge.

Does Vulkan define accuracy requirements for OpFMod and OpFRem? In other words, if we wanted to align the accuracy requirements, what would we require?

from https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#spirvenv-op-prec

The OpFRem and OpFMod instructions use cheap approximations of remainder, and the error can be large due to the discontinuity in trunc() and floor(). This can produce mathematically unexpected results in some cases, such as FMod(x,x) computing x rather than 0, and can also cause the result to have a different sign than the infinitely precise result.

And the OpenCL SPIR-V env spec doesn't define anything here (only for the ExtInsts, and a couple of SPIR-V instructions ), so I think we are good as it is. Maybe some clarification might be helpful.

https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_ulp_values_for_math_instructions_full_profile

bashbaug commented 1 year ago

Thanks, for whatever reason my search for "vulkan spir-v environment" found the Vulkan 1.1 spec before the Vulkan 1.3 spec and it doesn't have accuracy requirements for these instructions.

It sounds like if we want to align with Vulkan the actions that are required are:

  1. We need to update the OpenCL SPIR-V environment spec to describe that the accuracy of OpFMod and OpFRem are derived from trunc and floor and hence may have a very large error.
    • We should check if any other instructions are missing, similarly.
  2. We should update the spirv_new tests so they do not test for equivalence with the OpenCL C built-ins. I'm not sure exactly what values we'd test against, but at the very least we should ensure kernels with these instructions continue to compile.

Shall I open an OpenCL-Docs issue for (1) to discuss the specification updates?

bashbaug commented 1 year ago

Spec issue filed: https://github.com/KhronosGroup/OpenCL-Docs/issues/859

bashbaug commented 1 year ago

Reopening - these tests are disabled for now, but we still need to decide what should be tested for OpFMod and OpFRem.