KhronosGroup / OpenCL-CTS

The OpenCL Conformance Tests
Apache License 2.0
184 stars 196 forks source link

correctly rounded divide test for half is not using a correctly rounded reference #1996

Open b-sumner opened 3 months ago

b-sumner commented 3 months ago

The new correctly rounded divide test for half precision, located in binary_operator_half.cpp is using an fptr for its reference function and computing the reference like this:

    s[j] = HTF(p[j]);
    s2[j] = HTF(p2[j]);
    r[j] = HFF(func.f_ff(s[j], s2[j]));

Here func.f_ff works out to reference_divide(). So r[j] starts with the double precision rounded result of the divide, rounds it to single precision and then rounds that to half. That's 3 roundings instead of the required single rounding.

Shouldn't this test be disabled until a correct reference is used?

svenvh commented 3 months ago

The new correctly rounded divide test for half precision

Which reminds me, we should probably disable the correctly rounded tests for fp16 and fp64; see #1901 .

However, for fp16 I believe the divide_cr and divide tests are identical, since fp16 division must be correctly rounded (at least for the full profile). So if I'm not mistaken this issue also affects the regular fp16 divide test?

rjodinchr commented 3 months ago

When running the new fp16 divide test I get this kind of error:

ERROR: divide: 0.500000 ulp error at {0x1.428p-13 (0x090a), -0x1.ep+6 (0xd780)}
Expected: -0x1.6p-20  (half 0x8016)
Actual: -0x1.5p-20 (half 0x8015) at index: 61714

As the specification states that fp16 precision for divide is <= 1 ulp, I'm not sure if this is a real error, or a bug somewhere in the CTS.

rjodinchr commented 3 months ago

Alright, it feels like the the half_ulp is set to 0.0f here. I'll open another issue and will make a PR for it. My mistake, I was looking at the embedded profile. For the full profile, the division needs to be correctly rounded.

svenvh commented 3 months ago

~Alright, it feels like the the half_ulp is set to 0.0f here. I'll open another issue and will make a PR for it.~ My mistake, I was looking at the embedded profile. For the full profile, the division needs to be correctly rounded.

For the embedded profile we need to account for the <= 1 ulp, so we'll still need a separate issue I think (edit: we actually have #1685 for this already).

svenvh commented 2 months ago

@b-sumner are you seeing any failures related to the excess roundings? If so, could you share some of the problematic input values?

jlewis-austin commented 2 months ago

I haven't studied it closely, but this suggests some theorems for safe double rounding that could be helpful: https://dl.acm.org/doi/abs/10.1145/221332.221334

b-sumner commented 2 months ago

I thought I saw another patch about dropping the correctly rounded tests for non-fp32 floating point types, so perhaps this should be closed.

But while I'm here I want to ask a related question about the max ulp error for half division. What I see in function_list.cpp is:

412 { "divide", 413 "/", 414 { (void)reference_divide }, 415 { (void)reference_dividel }, 416 { (void*)reference_relaxed_divide }, 417 2.5f, 418 0.0f, 419 0.0f, 420 3.0f, 421 2.5f, 422 INFINITY, 423 FTZ_OFF, 424 RELAXED_ON, 425 binaryOperatorF },

Line 419 sets the half_ulps to 0.0. I would like to know where in the spec this requirement of correct rounding for half division appears or if the specified error is higher but simply not properly reflected in the test.

svenvh commented 2 months ago

I would like to know where in the spec this requirement of correct rounding for half division appears

Table 69 in https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#relative-error-as-ulps

b-sumner commented 2 months ago

Thank you.