Faster and simpler floating-point truncation

ajdawson commented 8 years ago

This PR adds a new method for applying significand truncation to floating-point numbers. It computes the truncated value as:

x_t = 2^m x - (2^m - 1) x

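where x is the value to truncate, n is the number of bits retained in the significand, m = 52 - n is the number of bits to truncate (52 being the number of explicit significand bits in a double-precision value), and x_t is the truncated value. In exact arithmetic the right-hand side equals x, so the truncation comes entirely from rounding. Multiplying by the power of two 2^m is exact (barring overflow), whereas the product (2^m - 1) x is rounded at a magnitude about 2^m times that of x. Its rounding error is therefore on the order of the low m bits of x, and subtracting it from the exact 2^m x leaves x rounded to roughly n significand bits.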
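As a minimal sketch of the formula above, assuming IEEE 754 double precision and that every operation is rounded separately, the Python function below applies the scheme directly. The name `truncate_arith` is hypothetical and is not part of rpe's API; rpe itself may implement the scheme differently.

```python
def truncate_arith(x: float, n: int) -> float:
    """Round x to n explicit significand bits via x_t = 2^m x - (2^m - 1) x.

    Hypothetical helper for illustration; assumes IEEE 754 doubles and 0 <= n <= 52.
    """
    m = 52 - n
    scale = 2.0 ** m          # exact power of two
    hi = scale * x            # exact scaling (barring overflow)
    lo = (scale - 1.0) * x    # rounded product; its rounding error carries the low bits of x
    return hi - lo            # x rounded to (roughly) n significand bits


if __name__ == "__main__":
    x = 1.0 + 2.0 ** -11 + 2.0 ** -30   # just above the halfway point at 10 bits
    print(truncate_arith(x, 10))        # 1.0009765625, i.e. 1 + 2**-10
```

The scheme only works if the two products are rounded individually. A compiler permitted to reassociate floating-point expressions, or to contract them into a fused multiply-add (for example under fast-math style options), could evaluate the expression as exactly x, so no truncation would occur. Python evaluates each operation separately, so the sketch is not affected.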

This scheme is somewhat faster than the bitwise scheme (about 2/3 of the runtime in a simple benchmark), but it is susceptible to overflow when extremely large values are truncated to a small number of significand bits, because the intermediate value 2^m x must remain finite.
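The overflow risk can be made concrete: the intermediate 2^m x must stay finite, so the scheme is only safe when |x| is at most the largest finite double divided by 2^m, a bound that shrinks as fewer significand bits are kept. The sketch below, with hypothetical helpers `truncate_arith` and `overflow_safe`, checks that bound and shows that an overflowing input produces NaN (inf - inf) rather than a truncated value.

```python
import math
import sys


def truncate_arith(x: float, n: int) -> float:
    """Hypothetical helper: x_t = 2^m x - (2^m - 1) x with m = 52 - n."""
    m = 52 - n
    scale = 2.0 ** m
    return scale * x - (scale - 1.0) * x


def overflow_safe(x: float, n: int) -> bool:
    """Hypothetical check: True if the intermediate 2^m * x stays finite."""
    m = 52 - n
    # Dividing the largest double by a power of two is exact, so the bound is exact.
    return abs(x) <= math.ldexp(sys.float_info.max, -m)


if __name__ == "__main__":
    big = 1e300
    print(overflow_safe(big, 10))               # False: 2**42 * 1e300 overflows
    print(math.isnan(truncate_arith(big, 10)))  # True: inf - inf gives nan
    print(overflow_safe(big, 50))               # True: only a factor of 4 is applied
```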

The scheme is currently opt-in, both because of the (small) risk of overflow and because it uses a slightly different rounding mode (round to nearest, ties to even).
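For comparison, the sketch below pairs the arithmetic scheme with a bitwise round-to-nearest, ties-to-even truncation written for this illustration. Both function names are hypothetical, and the bitwise version is not rpe's existing implementation, whose rounding rule may differ. On exact ties the two agree. In this sketch they can disagree just above a power of two when many bits are kept: the product (2^m - 1) x then falls into the binade below 2^m x, where the spacing is halved, so the arithmetic version can retain one extra bit.

```python
import struct


def truncate_arith(x: float, n: int) -> float:
    """Hypothetical helper: x_t = 2^m x - (2^m - 1) x with m = 52 - n."""
    m = 52 - n
    scale = 2.0 ** m
    return scale * x - (scale - 1.0) * x


def truncate_bits_rne(x: float, n: int) -> float:
    """Hypothetical bitwise reference: round x to n explicit significand bits, ties to even.

    Assumes a finite double and 0 <= n <= 52. Not rpe's implementation.
    """
    m = 52 - n
    if m == 0:
        return x
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    lsb = (bits >> m) & 1                # lowest retained bit, used to break ties toward even
    bits += (1 << (m - 1)) - 1 + lsb     # just under half an ulp, plus one if the kept part is odd
    bits &= ~((1 << m) - 1)              # clear the m discarded bits (carries may bump the exponent)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


if __name__ == "__main__":
    # Exact ties at n = 10: both round to the even neighbour.
    for x in (1.0 + 2.0 ** -11, 1.0 + 3 * 2.0 ** -11):
        print(truncate_arith(x, 10) == truncate_bits_rne(x, 10))  # True
    # Just above a power of two with n = 40: the arithmetic sketch keeps 1 + 2**-41,
    # while the bitwise version rounds the tie to 1.0.
    x = 1.0 + 2.0 ** -41
    print(truncate_arith(x, 40), truncate_bits_rne(x, 40))
```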

ajdawson commented 8 years ago

Need to check if this works properly with the make source build command.

Edit: the result was a new commit that disables cpp warnings.

ajdawson commented 8 years ago

This is not functioning correctly. On hold for now.

aopp-pred / rpe, pull request #3: Faster and simpler floating-point truncation