Open RossBencina opened 3 years ago
Thank you for the insight @RossBencina ! OK I won't use PA_USE_C99_LRINTF
(long int)x; will truncate towards zero. So it is not symmetric about zero. What behavior do we really want? Do we want round to nearest? Or floor towards negative infinity?
It probably does not matter for PaInt32.
This feels like an old discussion.
@philburk the truncation operation is perfectly symmetric. The problem you mention is that the integer representation is not symmetric. In any case, none of that is relevant to this issue.
This ticket is about an "optimisation" that has crept into the code that is only enabled by the PA_USE_C99_LRINTF
#ifdef. The problem is that lrintf(scaled-0.5f);
(which is supposed to be an optimised truncation) DOES NOT truncate towards zero, as I showed above. It has issues due to IEEE 754 odd/even rounding.
I can only guess that the "optimisation" was intended to go faster than the non-optimised truncation.
PA_USE_C99_LRINTF
is defined nowhere. Not in any build file. So this buggy code is not usually called. I think we should just remove the lrint
code. If someone wants to submit a fixed version they can do it later. (It needs to truncate correctly, and it would need to be demonstrably faster than the usual truncate code).
In case there is any question that the current PA_USE_C99_LRINTF
is bad, I have written a program that tests all normalized floats between -1 and 1, and converted them using both the truncate and the buggy lrint
code. 830472190 values are incorrectly converted using the lrint
code.
#include <iostream>
#include <fenv.h> /* fesetround, FE_* */
#include <math.h> /* rint */
#include <cstdint>
//#define GOOD_APPROACH
using std::uint32_t;
using std::int32_t;
long int mytruncate(float x)
{
return lrintf(x-0.5f);
}
// IEEE 754
// sign: 1 bit
// exponent: 8 bits signed [-126, 127]
// mantissa: 23 bits (implicit leading 1)
float make_float(uint32_t sign, int32_t exponent, uint32_t mantissa)
{
uint32_t uexponent = static_cast<uint32_t>(exponent + 127) & 0xFF;
uint32_t iresult = (sign << 31) | (uexponent << 23) | mantissa;
char *src = (char*)&iresult;
float result;
char *dest = (char*)&result;
// assume that float and uint32_t have the same endianness. note this is not guaranteed by IEEE 754
dest[0] = src[0];
dest[1] = src[1];
dest[2] = src[2];
dest[3] = src[3];
return result;
}
int main()
{
#ifdef GOOD_APPROACH
fesetround(FE_TOWARDZERO);
#else
fesetround(FE_TONEAREST);
#endif
std::cout << make_float(0, 1, 0) << std::endl; // 2.0
std::cout << make_float(0, 0, 0) << std::endl; // 1.0
std::cout << make_float(0, -1, 0) << std::endl; // 0.5
std::cout << make_float(1, 1, 0) << std::endl; // -2.0
std::cout << make_float(1, 0, 0) << std::endl; // -1.0
std::cout << make_float(1, -1, 0) << std::endl; // -0.5
uint32_t badCount = 0;
// try all normalized numbers from 0 up to +/- 0.99999999 (i.e. just less than 1)
for (int32_t e = -126; e < 0; ++e) {
for (uint32_t m=0; m < 0x01000000; ++m) { // 24 bit mantissae
for (uint32_t sign=0; sign < 2; ++sign) {
float f = make_float(sign, e, m);
float scaled = f * 0x7FFFFFFF;
#ifdef GOOD_APPROACH
uint32_t munged = lrintf(scaled);
#else
uint32_t munged = lrintf(scaled-0.5f);
#endif
uint32_t truncated = (uint32_t) scaled;
if (munged != truncated) {
//std::cout << f << std::endl;
++badCount;
}
}
}
}
std::cout << "bad count: " << badCount << std::endl;
}
Output:
2
1
0.5
-2
-1
-0.5
bad count: 830472190
the truncation operation is perfectly symmetric.
Right. I meant that ((int)x) is not a straight line as it crosses zero.
I am not a fan of lrintf(). We can get rid of it.
There are a number of interesting conversion functions in the Android code here:
To be clear: I have no problem with lrintf
but the way it is being used here is to emulate truncation, and it is buggy. I would be happy to see a correct use of lrintf
if it proved faster than truncation (truncation can be slow). To reproduce the existing truncation behavior the code would need to
I will prepare a patch to remove the existing code.
Here is the original commit:
https://github.com/svn2github/PortAudio/commit/5134080bb625fa4cbcff9542dde206a6d6389d60
added code from David Viens to use C99's lrintf for float->int conversion when PA_USE_C99_LRINTF is defined git-svn-id: https://subversion.assembla.com/svn/portaudio@491 0f58301d-fd10-0410-b4af-bbb618454e57 master rossbencina committed on Mar 27, 2003
@RossBencina, I have commented in #403 and in favor of removing lrintf().
Related #100
Interesting and extensive analysis by Dmitry here: https://github.com/PortAudio/portaudio/pull/403#issuecomment-768449902
Hi Ross! I propose to remove lrint
(my comment #403 gives some grounding for this change) and rely on compiler optimization for the float to int C-style cast.
In this case PA conversion routines have to enforce rounding mode FE_TOWARDZERO
to avoid unexpected result if user code sets it to something else. Because rounding mode will be set outside the conversion loop it will be quite cheap on any platform.
Example:
static void Float32_To_Int16(
void *destinationBuffer, signed int destinationStride,
void *sourceBuffer, signed int sourceStride,
unsigned int count, struct PaUtilTriangularDitherGenerator *ditherGenerator )
{
float *src = (float*)sourceBuffer;
PaInt16 *dest = (PaInt16*)destinationBuffer;
int prevRound = fegetround();
(void)ditherGenerator; /* unused parameter */
if (fesetround(FE_TOWARDZERO) != 0)
prevRound = -1;
while( count-- )
{
short samp = (short) (*src * (32767.0f));
*dest = samp;
src += sourceStride;
dest += destinationStride;
}
if (prevRound != -1)
fesetround(prevRound);
}
It can be further improved by wrapping fegetround/fesetround
into own static inline
functions that simplifies implementation of each converter:
static inline int setTruncRoundingMode()
{
int prev = fegetround();
if (fesetround(FE_TOWARDZERO) != 0)
prev = -1;
return prev;
}
static inline void resetRoundingMode(int prev)
{
if (prev != -1)
fesetround(prev);
}
static void Float32_To_Int16(
void *destinationBuffer, signed int destinationStride,
void *sourceBuffer, signed int sourceStride,
unsigned int count, struct PaUtilTriangularDitherGenerator *ditherGenerator )
{
float *src = (float*)sourceBuffer;
PaInt16 *dest = (PaInt16*)destinationBuffer;
int prevRoundingMode = setTruncRoundingMode();
(void)ditherGenerator; /* unused parameter */
while( count-- )
{
short samp = (short) (*src * (32767.0f));
*dest = samp;
src += sourceStride;
dest += destinationStride;
}
resetRoundingMode(prevRoundingMode);
}
This implementation will work fine and consistently on any CPU/platform.
We can go even further then and optimize usage of fegetround/fesetround
depending on CPU, for example if CPU provides instruction to truncate float to int with rounding toward zero then calls to fegetround/fesetround
can be avoided:
#if !(defined(HAVE_SSE) || (__ARM_ARCH >= 8) || defined(__ARM_ARCH_8__) || defined(__ARM_ARCH_8A__))
#define PA_USE_FESETROUND
#endif
static inline int setTruncRoundingMode()
{
#ifdef PA_USE_FESETROUND
int prev = fegetround();
if (fesetround(FE_TOWARDZERO) != 0)
prev = -1;
return prev;
#else
return 0;
#endif
}
static inline void resetRoundingMode(int prev)
{
#ifdef PA_USE_FESETROUND
if (prev != -1)
fesetround(prev);
#endif
}
static inline int truncFloatToInt(float v)
{
#if defined(HAVE_SSE)
return _mm_cvttss_si32(_mm_load_ss(&v));
#elif (__ARM_ARCH >= 8) || defined(__ARM_ARCH_8__) || defined(__ARM_ARCH_8A__)
int ret;
__asm__ ("fcvtzs %x0, %s1" : "=r"(ret) : "w"(v));
return ret;
#else
return (int)v;
#endif
}
static void Float32_To_Int16(
void *destinationBuffer, signed int destinationStride,
void *sourceBuffer, signed int sourceStride,
unsigned int count, struct PaUtilTriangularDitherGenerator *ditherGenerator )
{
float *src = (float*)sourceBuffer;
PaInt16 *dest = (PaInt16*)destinationBuffer;
int prevRoundingMode = setTruncRoundingMode();
(void)ditherGenerator; /* unused parameter */
while( count-- )
{
short samp = (short)truncFloatToInt (*src * (32767.0f));
*dest = samp;
src += sourceStride;
dest += destinationStride;
}
resetRoundingMode(prevRoundingMode);
}
I could update #403 with this implementation (last 3-rd version), let me know about it.
We've removed the bad lrintf code from pa_converters.c in #403, but the conversation continues about how to fix the conversion algorithm. There is useful discussion in this ticket and in the #403 PR that we should keep active for now.
When
PA_USE_C99_LRINTF
is defined,pa_converters.c
useslrintf
for truncating floats to integers. For example here is one case:The code in context is here: https://github.com/PortAudio/portaudio/blob/1a5c410f7f09042d2d7ce1509ec3cf0b6091c7f8/src/common/pa_converters.c#L348
I am guessing that the code is intended to implement a fast truncation.
Problem 1: Presumably the code assumes that the rounding mode is set to round to nearest, however this file never checks or sets the rounding mode. Is it set elsewhere? that is not documented. I would expect there to be clear usage of
fesetround
and/orfegetround
.Problem 2: The code doesn't work the way the author expected. For example if
scaled == 3.0
thenlrintf(scaled-0.5f) == 2
. This is because of the way odd/even round to nearest works in IEEE floating point.The following test program:
outputs (non-matching entries marked with *):
Here is the above test code running in a sandbox: https://wandbox.org/permlink/ZIZOsQjfIgv3DKAb
If continued use of
lrintf
is desired, the fix that I would recommend is to set the rounding mode to truncate prior to the loop, and reset it to the old rounding mode afterwards. Then simply use*dest = lrintf(scaled)
in the loop.