[AMDGPU] fneg optimized to xor with sign bit removed

Quuxplusone commented 4 years ago


Bugzilla Link	PR43711
Status	REOPENED
Importance	P enhancement
Reported by	Samuel Pitoiset (samuel.pitoiset@gmail.com)
Reported on	2019-10-18 08:12:10 -0700
Last modified on	2020-01-21 07:26:52 -0800
Version	trunk
Hardware	PC Linux
CC	andrew.kaylor@intel.com, cameron.mcinally@nyu.edu, cwabbott0@gmail.com, htmldeveloper@gmail.com, lebedev.ri@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, Matthew.Arsenault@amd.com, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

We implemented a workaround in Mesa [1]. It shouldn't be needed and fneg should always return a canonicalized result.

[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2322/diffs?commit_id=2c2aaf275c1edba38c552ac74de4d46bb2ebfbe8

Quuxplusone commented 4 years ago

I don't follow? Fneg is defined as not canonicalizing

Quuxplusone commented 4 years ago

IEEE-754 2008 5.5.1 "Sign bit operations" states "they only affect the sign
bit" and
 "These operations may propagate non-canonical encodings."

I would not expect nir_op_fneg to behave any differently, but if you for some
reason defined it that way, emitting the separate canonicalize would be the
correct way to map it to llvm semantics. I suspect you're really missing a
canonicalize somewhere else though.

Quuxplusone commented 4 years ago

nir_op_fneg in Mesa canonicalizes the result because that's what Vulkan requires. Specifically VK_KHR_floating_point_controls requires that the SPIR-V OpFNeg flushes denorms to 0 when flushing denorms is enabled for the shader.

We currently map nir_op_fneg to "fsub -0, x" using LLVMBuildFNeg(). This should produce a canonical result, the same as if I say "fsub y, x" and y happens to be -0, and be identical to a normal fneg otherwise except maybe with funky rounding modes. In the past this was what Clang also did, even though C/C++'s negate operator maps to the standard IEEE sign bit operation that you quoted, so it was wrong, and only recently has there been a native FNeg instruction added which does just sign bit fiddling (and I don't even know if Clang uses it yet). Backends then detected "fsub -0, x" and turned it back into an xor instruction, which fixes the original clang bug but introduces a new one for things like "float x = -0, y = ...; return x - y;" This is still happening with AMDGPU, and now that we're trying to support DX games where DX expects that you consistently flush denorms when adding/subtracting, that's not great.

I'm not sure moving forward what the canonical way is for an LLVM frontend to emit a negate that also canonicalizes, "fsub -0, x" or "canonicalize(fneg(x))". But the current LLVM behavior where "fsub -0, x" gets remapped to "fneg(x)" is definitely wrong.

Quuxplusone commented 4 years ago

But the current LLVM behavior where "fsub -0, x" gets remapped to "fneg(x)" is definitely wrong.

Agreed.

We’re currently working on this, but it still needs some more time. Clang, as of last week, does issue a unary FNeg IR instruction. However, SelectionDAGBuilder always canonicalizes a binary FNeg (-0.0-X) to an unary FNeg (-X).

Andy Kaylor suggested adding a NoDAZ/NoFTZ flag to the FastMathFlags set. This will allow us to selectively canonicalize binary FNegs as desired, as well as toggle some other unsafe transforms. I agree with this plan and would like to see it happen.

I intend to bring up this topic on llvm-dev after the conference. I’d appreciate your input/support when the time comes.

Quuxplusone commented 4 years ago

(In reply to Cameron McInally from comment #4)
> SelectionDAGBuilder always canonicalizes a binary FNeg (-0.0-X) to an unary
> FNeg (-X).

This is the code in question?

  // (fsub -0.0, N1) -> -N1
  // NOTE: It is safe to transform an FSUB(-0.0,X) into an FNEG(X), since the
  //       FSUB does not specify the sign bit of a NaN. Also note that for
  //       the same reason, the inverse transform is not safe, unless fast math
  //       flags are in play.
  if (N0CFP && N0CFP->isZero()) {
    if (N0CFP->isNegative() ||
        (Options.NoSignedZerosFPMath || Flags.hasNoSignedZeros())) {
      if (TLI.isNegatibleForFree(N1, DAG, LegalOperations, ForCodeSize))
        return TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize);
      if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
        return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);
    }
  }

-----------------------------------------------------------------------------

Does the target set the denorm function attribute? Ie, can we add a clause to
guard against non-IEEE-compliant target behavior?

      const Function &F = DAG.getMachineFunction().getFunction();
      if (F.hasFnAttribute("denormal-fp-math")) {
        Attribute DenormStyle = F.getFnAttribute("denormal-fp-math");
        if (!DenormStyle.getValueAsString().equals("ieee"))
          return SDValue();
      }
      if (!LegalOperations || TLI.isOperationLegal(ISD::FNEG, VT))
        return DAG.getNode(ISD::FNEG, DL, VT, N1, Flags);

Quuxplusone commented 4 years ago

That is the code.

The solution isn’t so clear. The stakeholders discussed this at the dev meeting, but I don’t think we came to an acceptable solution. Each target handles denormals differently, so there’s a lot to digest. The action-item was to bring this up on llvm-dev, so that we will get hard requirements for each target.

I’m not sure about the function attribute solution — I’ve never looked at it before — but it sounds reasonable.

Quuxplusone commented 4 years ago

SPIR-V defining fneg differently than IEEE behavior is tragically broken.

As far as the attribute is concerned, that isn't used followed for AMDGPU use. We currently have a subtarget based solution for this setting in AMDGPU which I would like to replace with the function attribute (this comes with the complication that this should be broken down into per-FP type variants)

To me the path forward seem reasonably clear for this specific problem. The fsub->fneg transform should not be performed, and the SPIR-V fneg should probably be implemented by directly emitting the fsub -0, x form. Whether the C API for BuildFNeg should be switched to emitting the new fneg instruction is another potentially breaking change to think about. There are additional problems to consider with respect to inserting more canonicalizes on various folds that happen earlier in the IR as well

Quuxplusone commented 4 years ago

> To me the path forward seem reasonably clear for this specific problem. The
fsub->fneg transform should not be performed

Oh, I agree that this is the “correct” thing to do. But it can, and most likely
will, affect performance for targets/users that don’t care about this subtle
distinction. We’re changing a logical op into a FP arithmetic op for a heavily
used operation. I’m suggesting a switch to mitigate that performance penalty
for targets/users that don’t care. I don’t know what that switch should look
like yet though...

Quuxplusone commented 4 years ago

(In reply to Matt Arsenault from comment #7)
> As far as the attribute is concerned, that isn't used followed for AMDGPU
> use. We currently have a subtarget based solution for this setting in AMDGPU
> which I would like to replace with the function attribute (this comes with
> the complication that this should be broken down into per-FP type variants)

Another possibility that would allow per-type variation - expand/add something
like this existing TLI hook and use that to guard the DAG combine:

  /// Return true if it is safe to transform an integer-domain bitwise operation
  /// into the equivalent floating-point operation. This should be set to true
  /// if the target has IEEE-754-compliant fabs/fneg operations for the input
  /// type.
  virtual bool hasBitPreservingFPLogic(EVT VT) const {
    return false;
  }

Quuxplusone commented 4 years ago

Some action brought this to my attention again. We do have plans to straighten out the improper FNeg canonicalization once the subnormal controls are fleshed out.

Matt, any updates on the subnormal switches? I’ve not been paying as close attention as I should. Are we ready to hook them up to the FNeg code?

Quuxplusone commented 4 years ago

(In reply to Cameron McInally from comment #10)
> Some action brought this to my attention again. We do have plans to
> straighten out the improper FNeg canonicalization once the subnormal
> controls are fleshed out.
>
> Matt, any updates on the subnormal switches? I’ve not been paying as close
> attention as I should. Are we ready to hook them up to the FNeg code?

I don't follow the connection between the subnormal controls and switching
fneg. These are orthogonal issues?

Quuxplusone commented 4 years ago

I don't follow the connection between the subnormal controls and switching fneg. These are orthogonal issues?

Just didn't want to move the target while the subnormal controls are in flux.

I'm looking for a way to toggle the bad FNeg canonicalization whether we are in FTZ/DAZ mode or not. IINM, the new subnormal controls default to no FTZ/DAZ (on a per-target basis), which would avoid the potential performance penalty on targets that don't care about the FNeg distinction.

Quuxplusone / LLVMBugzillaTest

[AMDGPU] fneg optimized to xor with sign bit removed #42681