dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Add support for floating point control register manipulation - esp denormal/subnormal flush-to-zero #16018

Closed: programmatom closed this 2 years ago

programmatom commented 8 years ago

The general request I'm making is to expose the floating-point control register to managed code. I realize that could have a lot of implications, so the minimal request is to expose the denormal/subnormal flush mode so that flush-to-zero (FTZ) can be enabled for IEEE float values.

The scenario is that, for certain real-time signal processing applications, denormal numbers can cause performance problems. As a concrete scenario, consider a real-time music synthesis program that provides several audio channels, each with a chain of audio processors operating on it. One class of audio processors, including delay/reverb-type effects, is implemented as IIR (infinite impulse response) filters. When the input to these filters goes silent, their internal state (and output) decays exponentially toward zero over time.

The trouble arises if non-silent input occurs sporadically, with significant periods of silence in between. In that case, the internal state of the filter can decay into the denormal float range. This usually increases computation time by one to two orders of magnitude, which often causes the synthesizer to glitch because it starts missing its deadlines for delivering buffers to the audio output device.
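To make the decay concrete, here is a minimal sketch, assuming a one-pole feedback loop `y[n] = x[n] + feedback * y[n-1]` stands in for the IIR filter state. The names `DecayDemo` and `SamplesUntilSubnormal` are hypothetical. With silent input only the feedback term remains, and the loop counts samples until the state first becomes subnormal (or, for very small coefficients, underflows straight to zero).

```csharp
using System;

public static class DecayDemo
{
    // Hypothetical helper: simulates y[n] = x[n] + feedback * y[n-1] with silent input
    // and returns how many samples pass before the state is subnormal or exactly zero.
    public static int SamplesUntilSubnormal(float state, float feedback)
    {
        // A non-finite start or |feedback| >= 1 never decays, so the loop would not end.
        if (!float.IsFinite(state))
            throw new ArgumentOutOfRangeException(nameof(state));
        if (!(MathF.Abs(feedback) < 1.0f))
            throw new ArgumentOutOfRangeException(nameof(feedback));

        const float input = 0.0f; // silence
        int samples = 0;
        while (state != 0.0f && !float.IsSubnormal(state))
        {
            state = input + feedback * state; // only the decaying feedback term remains
            samples++;
        }
        return samples;
    }
}
```

For example, starting from `1.0f` with `feedback = 0.5f`, the state reaches `2^-127` after 127 samples, the first power of two below float's smallest normal value `2^-126`. Coefficients closer to 1, as in long reverb tails, take many more samples to get there, but still get there.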

The classic "C" solution is to enable flush-to-zero mode on the FPU, which solves the problem without code changes. The denormal threshold is so far below audibility that there is no loss of correctness for audio applications.

In managed code, one could write code to detect the denormal bit pattern and flush to zero explicitly, but this comes at a performance cost, which is a pity, especially since the hardware can do it for free.
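A minimal sketch of that explicit approach, assuming the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits): a value is subnormal when its exponent field is all zeros and its fraction is non-zero. `DenormalBits`, `IsSubnormalBits`, and `FlushIfSubnormal` are hypothetical names; `BitConverter.SingleToInt32Bits` and `BitConverter.Int32BitsToSingle` are existing .NET APIs.

```csharp
using System;

public static class DenormalBits
{
    private const int ExponentMask = 0x7F80_0000; // 8 exponent bits
    private const int FractionMask = 0x007F_FFFF; // 23 fraction bits

    // Subnormal: exponent field all zeros, fraction non-zero.
    public static bool IsSubnormalBits(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        return (bits & ExponentMask) == 0 && (bits & FractionMask) != 0;
    }

    // Replaces a subnormal with a zero of the same sign; all other values pass through.
    // (+0 and -0 also have a zero exponent and map to themselves.)
    public static float FlushIfSubnormal(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        return (bits & ExponentMask) == 0
            ? BitConverter.Int32BitsToSingle(bits & int.MinValue) // keep only the sign bit
            : value;
    }
}
```

Each check is a mask, a compare, and a branch per value, which is the cost being weighed against a hardware mode that would do the same thing for free. `float.IsSubnormal` performs the same classification.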

What makes this difficult is how it would interoperate with other floating-point consumers in the managed world, such as the WPF/WinForms graphics libraries, as well as with unmanaged code that uses floating point. In general, modifying the FPU control register is very expensive (it basically purges the entire processor pipeline), so it should be done only infrequently. But the correctness of other packages could potentially rely on IEEE denormal behavior. At minimum, there could be a significant test impact.

One mitigation could be to set this option on a per-thread basis, once, at thread initialization, much like a thread's COM apartment model is specified. That would fit well with the music synthesis world of this specific scenario, where dedicated threads do the real-time work.

joshfree commented 8 years ago

@mellinoe @CarolEidt

mellinoe commented 8 years ago

This certainly is an interesting problem area. I believe we've had various requests similar to this in the past, but I don't think we expose any of this functionality right now. Do you have any opinion on how the functionality could be exposed, i.e. the library functions a program would use to access the functionality?

programmatom commented 8 years ago

I thought I'd add another scenario, just for the record: the persistent ("sticky") NaN/Inf exception flag is also useful. Most audio processing algorithms I've encountered do not use NaN as an expected intermediate or output value. Therefore, NaN almost always indicates a malfunction (often because the parameters to an effect processor have been pushed into an unstable or invalid range). Exposing the hardware flag would allow this kind of breakdown to be detected without the cost of checking every intermediate or output value.

In this scenario, it helps but isn't quite the slam dunk that FTZ is, because algorithms can break down in other ways (e.g., unstable filters usually produce exponentially growing values for a while before an actual NaN occurs), so explicit checking may still be necessary. The hardware flag could still be a helpful tool.
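Without access to the hardware flag, the explicit check can at least be done once per output buffer rather than on every intermediate value. A minimal sketch, assuming a hypothetical `BufferHealth.IsHealthy` helper and an application-chosen `maxMagnitude` threshold meant to catch unstable filters whose output grows large before it ever becomes NaN:

```csharp
using System;

public static class BufferHealth
{
    // Returns true if every sample is finite and no larger in magnitude than maxMagnitude.
    // maxMagnitude is an assumed, application-specific limit for detecting runaway filters.
    public static bool IsHealthy(ReadOnlySpan<float> buffer, float maxMagnitude)
    {
        foreach (float sample in buffer)
        {
            // float.IsFinite is false for NaN and +/-Infinity, so those fail here.
            if (!float.IsFinite(sample))
                return false;
            // Catches exponential growth that has not yet overflowed to Inf/NaN.
            if (MathF.Abs(sample) > maxMagnitude)
                return false;
        }
        return true;
    }
}
```

This is still a per-sample pass, so it only reduces the cost the hardware flag would remove entirely; it does, however, also cover the growing-but-finite breakdown case that the flag alone would miss.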

danmoseley commented 7 years ago

@CarolEidt any thoughts?

tannergooding commented 6 years ago

I'm going to open a more formal proposal on this sometime in the next week (just need to find some time to sit down and write it).

I believe this is an important area for the scenarios where it matters, especially since these states can already be manipulated from native code, and doing so impacts any managed code that runs afterward.

The hardware intrinsics APIs will also expose more instructions that can produce the various states this affects, allowing those instructions to be consumed more generally.

Finally, the IEEE 754 floating-point standard recommends that this functionality be exposed.

juwens commented 2 years ago

Maybe a keyword similar to checked/unchecked could be used, settable at the application, project, and block level, specifically for DAZ (denormals-are-zero) and FTZ (flush-to-zero).

tannergooding commented 2 years ago

Going to close this. Overall interest since this was originally opened has been minimal. In comparison, the actual work required is quite complex, and the perf cost and implications would be non-trivial.

Subnormal values generally only incur a 1-cycle latency penalty on modern hardware, whereas the cost to load, store, and restore the floating-point control word is approximately 24 cycles. Since the control word is per logical CPU, this load, store, and restore sequence would effectively need to execute as a "try-finally"-like construct per method, "optimized" across inlining boundaries so it isn't done unnecessarily. It would likewise need to be saved/restored by the JIT if an interrupt happens, would negatively impact other optimizations the JIT can make, and would have implications for async, Tasks, and other potentially multi-threaded scenarios.

On the other hand, checking whether a value is subnormal, branching, and doing the relevant copysign for "correct" rounding to zero is cheap in comparison. This approach has no context-saving or cross-threading considerations and can be isolated to only the end of an operation chain where it's needed.


As such, I'd recommend that devs who want this functionality write their own extension method or wrapper type that does effectively the following:

```csharp
public static float FlushToZero(this float value) => float.IsSubnormal(value) ? float.CopySign(0.0f, value) : value;
```

This helps keep the operation pay-for-play while also keeping it localized to the scenarios most important to you.
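As one possible shape for the "wrapper type" option, here is a minimal sketch of a hypothetical one-pole filter (`OnePoleFilter`, computing `y[n] = x[n] + feedback * y[n-1]`) that applies the same `IsSubnormal`/`CopySign` flush to its internal state once per processed block. It assumes block-level granularity is acceptable: samples inside a block can still be subnormal, but a subnormal state is never carried into the next block.

```csharp
using System;

// Hypothetical wrapper: a one-pole feedback filter that flushes its state once per block.
public sealed class OnePoleFilter
{
    private readonly float _feedback;
    private float _state;

    public OnePoleFilter(float feedback) => _feedback = feedback;

    public float State => _state;

    // Filters the buffer in place: y[n] = x[n] + feedback * y[n-1].
    public void Process(Span<float> buffer)
    {
        float state = _state;
        for (int i = 0; i < buffer.Length; i++)
        {
            state = buffer[i] + _feedback * state;
            buffer[i] = state;
        }

        // Flush only at the end of the operation chain: a subnormal state becomes
        // a zero of the same sign, so the next block starts from a normal value or zero.
        _state = float.IsSubnormal(state) ? float.CopySign(0.0f, state) : state;
    }
}
```

The check runs once per block instead of once per sample, which keeps the cost proportional to the number of blocks rather than to the amount of audio processed.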