pa_converters.c lrintf usage is [was] suspect: does not reproduce the same values as non-lrintf code

RossBencina commented 3 years ago

When PA_USE_C99_LRINTF is defined, pa_converters.c uses lrintf for truncating floats to integers. For example here is one case:

#ifdef PA_USE_C99_LRINTF
        float scaled = *src * 0x7FFFFFFF;
        *dest = lrintf(scaled-0.5f);
#else
        double scaled = *src * 0x7FFFFFFF;
        *dest = (PaInt32) scaled;        
#endif

The code in context is here: https://github.com/PortAudio/portaudio/blob/1a5c410f7f09042d2d7ce1509ec3cf0b6091c7f8/src/common/pa_converters.c#L348

I am guessing that the code is intended to implement a fast truncation.

Problem 1: Presumably the code assumes that the rounding mode is set to round to nearest, however this file never checks or sets the rounding mode. Is it set elsewhere? that is not documented. I would expect there to be clear usage of fesetround and/or fegetround.

Problem 2: The code doesn't work the way the author expected. For example if scaled == 3.0 then lrintf(scaled-0.5f) == 2. This is because of the way odd/even round to nearest works in IEEE floating point.

The following test program:

#include <iostream>
#include <fenv.h>       /* fesetround, FE_* */
#include <math.h>       /* rint */

long int myc99truncate(float x)
{
    return lrintf(x-0.5f);
}

long int mytruncate(float x)
{
    return (long int)x;
}

int main()
{
    fesetround(FE_TONEAREST);

    std::cout << 0.0f << "   truncates to " << mytruncate(0.0f) << " or " << myc99truncate(0.0f) << std::endl;
    std::cout << 0.5f << " truncates to " << mytruncate(0.5f) << " or " << myc99truncate(0.5f) << std::endl;
    std::cout << 1.0f << "   truncates to " << mytruncate(1.0f) << " or " << myc99truncate(1.0f) << " *" << std::endl;
    std::cout << 1.5f << " truncates to " << mytruncate(1.5f) << " or " << myc99truncate(1.5f) << std::endl;
    std::cout << 2.0f << "   truncates to " << mytruncate(2.0f) << " or " << myc99truncate(2.0f) << std::endl;
    std::cout << 2.5f << " truncates to " << mytruncate(2.5f) << " or " << myc99truncate(2.5f) << std::endl;
    std::cout << 3.0f << "   truncates to " << mytruncate(3.0f) << " or " << myc99truncate(3.0f) << " *" << std::endl;
    std::cout << 3.5f << " truncates to " << mytruncate(3.5f) << " or " << myc99truncate(3.5f) << std::endl;
    std::cout << 4.0f << "   truncates to " << mytruncate(4.0f) << " or " << myc99truncate(4.0f) << std::endl;
    std::cout << 4.5f << " truncates to " << mytruncate(4.5f) << " or " << myc99truncate(4.5f) << std::endl;
}

outputs (non-matching entries marked with *):

0   truncates to 0 or 0
0.5 truncates to 0 or 0
1   truncates to 1 or 0 *
1.5 truncates to 1 or 1
2   truncates to 2 or 2
2.5 truncates to 2 or 2
3   truncates to 3 or 2 *
3.5 truncates to 3 or 3
4   truncates to 4 or 4
4.5 truncates to 4 or 4

Here is the above test code running in a sandbox: https://wandbox.org/permlink/ZIZOsQjfIgv3DKAb

If continued use of lrintf is desired, the fix that I would recommend is to set the rounding mode to truncate prior to the loop, and reset it to the old rounding mode afterwards. Then simply use *dest = lrintf(scaled) in the loop.

jmelas commented 3 years ago

Thank you for the insight @RossBencina ! OK I won't use PA_USE_C99_LRINTF

philburk commented 3 years ago

(long int)x; will truncate towards zero. So it is not symmetric about zero. What behavior do we really want? Do we want round to nearest? Or floor towards negative infinity?

It probably does not matter for PaInt32.

This feels like an old discussion.

RossBencina commented 3 years ago

@philburk the truncation operation is perfectly symmetric. The problem you mention is that the integer representation is not symmetric. In any case, none of that is relevant to this issue.

This ticket is about an "optimisation" that has crept into the code that is only enabled by the PA_USE_C99_LRINTF #ifdef. The problem is that lrintf(scaled-0.5f); (which is supposed to be an optimised truncation) DOES NOT truncate towards zero, as I showed above. It has issues due to IEEE 754 odd/even rounding.

I can only guess that the "optimisation" was intended to go faster than the non-optimised truncation.

PA_USE_C99_LRINTF is defined nowhere. Not in any build file. So this buggy code is not usually called. I think we should just remove the lrint code. If someone wants to submit a fixed version they can do it later. (It needs to truncate correctly, and it would need to be demonstrably faster than the usual truncate code).

RossBencina commented 3 years ago

In case there is any question that the current PA_USE_C99_LRINTF is bad, I have written a program that tests all normalized floats between -1 and 1, and converted them using both the truncate and the buggy lrint code. 830472190 values are incorrectly converted using the lrint code.

#include <iostream>
#include <fenv.h>       /* fesetround, FE_* */
#include <math.h>       /* rint */
#include <cstdint>

//#define GOOD_APPROACH

using std::uint32_t;
using std::int32_t;

long int mytruncate(float x)
{
    return lrintf(x-0.5f);
}

// IEEE 754
// sign: 1 bit
// exponent: 8 bits signed [-126, 127]
// mantissa: 23 bits (implicit leading 1)
float make_float(uint32_t sign, int32_t exponent, uint32_t mantissa)
{
    uint32_t uexponent = static_cast<uint32_t>(exponent + 127) & 0xFF;
    uint32_t iresult = (sign << 31) | (uexponent << 23) | mantissa;
    char *src = (char*)&iresult;

    float result;
    char *dest = (char*)&result;

    // assume that float and uint32_t have the same endianness. note this is not guaranteed by IEEE 754
    dest[0] = src[0];
    dest[1] = src[1];
    dest[2] = src[2];
    dest[3] = src[3];

    return result;
}

int main()
{
#ifdef GOOD_APPROACH
    fesetround(FE_TOWARDZERO);
#else
    fesetround(FE_TONEAREST);   
#endif

    std::cout << make_float(0, 1, 0) << std::endl; // 2.0
    std::cout << make_float(0, 0, 0) << std::endl; // 1.0
    std::cout << make_float(0, -1, 0) << std::endl; // 0.5

    std::cout << make_float(1, 1, 0) << std::endl; // -2.0
    std::cout << make_float(1, 0, 0) << std::endl; // -1.0
    std::cout << make_float(1, -1, 0) << std::endl; // -0.5

    uint32_t badCount = 0;

    // try all normalized numbers from 0 up to +/- 0.99999999 (i.e. just less than 1)
    for (int32_t e = -126; e < 0; ++e) { 
        for (uint32_t m=0; m < 0x01000000; ++m) { // 24 bit mantissae 
            for (uint32_t sign=0; sign < 2; ++sign) {
                float f = make_float(sign, e, m);
                float scaled = f * 0x7FFFFFFF;

#ifdef GOOD_APPROACH
    uint32_t munged = lrintf(scaled);
#else
    uint32_t munged = lrintf(scaled-0.5f); 
#endif

                uint32_t truncated = (uint32_t) scaled;

                if (munged != truncated) {
                    //std::cout << f << std::endl;
                    ++badCount;
                }
            }            
        }
    }

    std::cout << "bad count: " << badCount << std::endl;
}

Output:

2
1
0.5
-2
-1
-0.5
bad count: 830472190

Sandbox: https://wandbox.org/permlink/oUHF6t0jCzaO6ebO

philburk commented 3 years ago

the truncation operation is perfectly symmetric.

Right. I meant that ((int)x) is not a straight line as it crosses zero.

I am not a fan of lrintf(). We can get rid of it.

There are a number of interesting conversion functions in the Android code here:

https://cs.android.com/android/platform/superproject/+/master:system/media/audio_utils/include/audio_utils/primitives.h;l=961;drc=master

RossBencina commented 3 years ago

To be clear: I have no problem with lrintf but the way it is being used here is to emulate truncation, and it is buggy. I would be happy to see a correct use of lrintf if it proved faster than truncation (truncation can be slow). To reproduce the existing truncation behavior the code would need to

save and restore the rounding mode outside the loop
set the rounding mode correctly prior to the loop (e.g. to round to zero, although we could consider round to nearest)
use lrintf without out any +/-0.5 weirdness.

I will prepare a patch to remove the existing code.

RossBencina commented 3 years ago

Here is the original commit:

https://github.com/svn2github/PortAudio/commit/5134080bb625fa4cbcff9542dde206a6d6389d60

added code from David Viens to use C99's lrintf for float->int conversion when PA_USE_C99_LRINTF is defined git-svn-id: https://subversion.assembla.com/svn/portaudio@491 0f58301d-fd10-0410-b4af-bbb618454e57 master rossbencina committed on Mar 27, 2003

dmitrykos commented 3 years ago

@RossBencina, I have commented in #403 and in favor of removing lrintf().

RossBencina commented 2 years ago

Related #100

RossBencina commented 1 year ago

Interesting and extensive analysis by Dmitry here: https://github.com/PortAudio/portaudio/pull/403#issuecomment-768449902

dmitrykos commented 1 year ago

Hi Ross! I propose to remove lrint (my comment #403 gives some grounding for this change) and rely on compiler optimization for the float to int C-style cast.

In this case PA conversion routines have to enforce rounding mode FE_TOWARDZERO to avoid unexpected result if user code sets it to something else. Because rounding mode will be set outside the conversion loop it will be quite cheap on any platform.

Example:

static void Float32_To_Int16(
    void *destinationBuffer, signed int destinationStride,
    void *sourceBuffer, signed int sourceStride,
    unsigned int count, struct PaUtilTriangularDitherGenerator *ditherGenerator )
{
    float *src = (float*)sourceBuffer;
    PaInt16 *dest =  (PaInt16*)destinationBuffer;
    int prevRound = fegetround();
    (void)ditherGenerator; /* unused parameter */

    if (fesetround(FE_TOWARDZERO) != 0)
        prevRound = -1;

    while( count-- )
    {
        short samp = (short) (*src * (32767.0f));
        *dest = samp;

        src += sourceStride;
        dest += destinationStride;
    }

    if (prevRound != -1)
        fesetround(prevRound);
}

It can be further improved by wrapping fegetround/fesetround into own static inline functions that simplifies implementation of each converter:

static inline int setTruncRoundingMode()
{
    int prev = fegetround();    
    if (fesetround(FE_TOWARDZERO) != 0)
        prev = -1;

    return prev;
}

static inline void resetRoundingMode(int prev)
{
    if (prev != -1)
        fesetround(prev);
}

static void Float32_To_Int16(
    void *destinationBuffer, signed int destinationStride,
    void *sourceBuffer, signed int sourceStride,
    unsigned int count, struct PaUtilTriangularDitherGenerator *ditherGenerator )
{
    float *src = (float*)sourceBuffer;
    PaInt16 *dest =  (PaInt16*)destinationBuffer;
    int prevRoundingMode = setTruncRoundingMode();
    (void)ditherGenerator; /* unused parameter */

    while( count-- )
    {
        short samp = (short) (*src * (32767.0f));
        *dest = samp;

        src += sourceStride;
        dest += destinationStride;
    }

    resetRoundingMode(prevRoundingMode);
}

This implementation will work fine and consistently on any CPU/platform.

We can go even further then and optimize usage of fegetround/fesetround depending on CPU, for example if CPU provides instruction to truncate float to int with rounding toward zero then calls to fegetround/fesetround can be avoided:

#if !(defined(HAVE_SSE) || (__ARM_ARCH >= 8) || defined(__ARM_ARCH_8__) || defined(__ARM_ARCH_8A__))
    #define PA_USE_FESETROUND
#endif

static inline int setTruncRoundingMode()
{
#ifdef PA_USE_FESETROUND
    int prev = fegetround();

    if (fesetround(FE_TOWARDZERO) != 0)
        prev = -1;

    return prev;
#else
    return 0;
#endif
}

static inline void resetRoundingMode(int prev)
{
#ifdef PA_USE_FESETROUND
    if (prev != -1)
        fesetround(prev);
#endif
}

static inline int truncFloatToInt(float v)
{
#if defined(HAVE_SSE)
    return _mm_cvttss_si32(_mm_load_ss(&v));
#elif (__ARM_ARCH >= 8) || defined(__ARM_ARCH_8__) || defined(__ARM_ARCH_8A__)
    int ret;
    __asm__ ("fcvtzs %x0, %s1" : "=r"(ret) : "w"(v));
    return ret;
#else
    return (int)v;
#endif
}

static void Float32_To_Int16(
    void *destinationBuffer, signed int destinationStride,
    void *sourceBuffer, signed int sourceStride,
    unsigned int count, struct PaUtilTriangularDitherGenerator *ditherGenerator )
{
    float *src = (float*)sourceBuffer;
    PaInt16 *dest =  (PaInt16*)destinationBuffer;
    int prevRoundingMode = setTruncRoundingMode();
    (void)ditherGenerator; /* unused parameter */

    while( count-- )
    {
        short samp = (short)truncFloatToInt (*src * (32767.0f));
        *dest = samp;

        src += sourceStride;
        dest += destinationStride;
    }

    resetRoundingMode(prevRoundingMode);
}

I could update #403 with this implementation (last 3-rd version), let me know about it.

RossBencina commented 1 year ago

We've removed the bad lrintf code from pa_converters.c in #403, but the conversation continues about how to fix the conversion algorithm. There is useful discussion in this ticket and in the #403 PR that we should keep active for now.

PortAudio / portaudio

pa_converters.c lrintf usage is [was] suspect: does not reproduce the same values as non-lrintf code #390