avaneev / r8brain-free-src

High-quality pro audio resampler / sample rate conversion C++ library. Very fast, for both audio resampling and time-series interpolation.
MIT License

gcd algorithm #27

Closed mvduin closed 9 months ago

mvduin commented 9 months ago

I saw findGCD mentioned in the changelog so I looked at it out of curiosity. Is there a reason why it's tediously using repeated subtraction instead of using Euclid's algorithm?

e.g. something like:

#include <cmath>
#include <utility> // std::swap

// Computes greatest common divisor of two floating-point numbers.
// Returns zero if both arguments are zero.
template< typename Float >
Float fgcd( Float a, Float b )
{
    // bail out if one of the arguments is a NaN or ±infinity
    if( ! std::isfinite( a ) || ! std::isfinite( b ) )
        return (Float)NAN;

    a = std::fabs( a );
    b = std::fabs( b );

    if( a < b )
        std::swap( a, b );

    while( b ) {
        a = std::fabs( std::remainder( a, b ) );
        std::swap( a, b );
    }

    return a;
}

The number of loop iterations cannot exceed the number of mantissa bits, i.e. 53 for double (although the worst example I've found used 42 iterations). Note that any two finite floating-point numbers have a greatest common divisor unless they're both zero (in which case this returns zero by standard mathematical convention).
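
For illustration, a quick smoke test of these properties (hypothetical driver code; it assumes the fgcd template above is in scope, and the expected values in the comments follow from the definition and from examples discussed later in this thread):

#include <cstdio>

// Spot checks: integer inputs, non-integer inputs, and the
// all-zero convention.
int main()
{
    printf( "%g\n", fgcd( 44100.0, 48000.0 )); // 300
    printf( "%g\n", fgcd( 4.5, 6.0 ));         // 1.5, non-integer inputs
    printf( "%g\n", fgcd( 0.0, 0.0 ));         // 0, by convention
}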

avaneev commented 9 months ago

Well, your variant uses division, which is very slow to compute. Besides that, an algorithm that utilizes division is likely to be much more erratic than an algorithm with subtractions.

avaneev commented 9 months ago

I've just done a test - Euclid's variant is 2 times slower at computing mutual GCDs over a set of standard sample rates: { 8000, 11025, 14000, 16000, 18000, 22050, 24000, 32000, 44100, 48000, 60000, 88200, 96000, 176400, 192000, 352800, 384000, 705600, 768000 }
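
A minimal sketch of such a timing comparison; the subtraction-based subGCD below is a hypothetical stand-in for the library's findGCD (the actual r8brain-free-src implementation may differ), and the timing harness is an assumption for illustration:

#include <chrono>
#include <cstdio>

// Subtraction-based GCD, a hypothetical stand-in for findGCD.
static double subGCD( double a, double b )
{
    while( a != b )
    {
        if( a > b )
            a -= b;
        else
            b -= a;
    }

    return a;
}

int main()
{
    static const double rates[] = { 8000, 11025, 14000, 16000, 18000,
        22050, 24000, 32000, 44100, 48000, 60000, 88200, 96000, 176400,
        192000, 352800, 384000, 705600, 768000 };
    const int n = (int) ( sizeof( rates ) / sizeof( rates[ 0 ]));
    const int reps = 10000; // Repeat to get a measurable duration.
    double sum = 0.0; // Accumulated so the work is not optimized away.

    const auto t1 = std::chrono::steady_clock::now();

    for( int r = 0; r < reps; r++ )
        for( int i = 0; i < n; i++ )
            for( int j = 0; j < n; j++ )
                sum += subGCD( rates[ i ], rates[ j ]);

    const auto t2 = std::chrono::steady_clock::now();

    printf( "checksum %.0f, %lld ns total\n", sum, (long long)
        std::chrono::duration_cast< std::chrono::nanoseconds >(
        t2 - t1 ).count() );
}

Substituting the remainder-based fgcd for subGCD in the inner loop gives the Euclid timing over the same set of pairs.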

avaneev commented 9 months ago

I think in the general case (any a and b) Euclid's variant is faster, but your code does not handle arithmetic errors (if non-integer sample rates are used), which is why my variant uses an iteration count. There is no need to handle the general case in r8brain-free-src, because DstSampleRate / GCD is limited to 1500.

mvduin commented 9 months ago

There are no "arithmetic errors": my code produces correct results for any finite arguments, integer or not, e.g. fgcd(4.5, 6.0) == 1.5. So does your code, by the way; in fact, if you were to remove the iteration limit, I'm pretty sure it would produce the same result as my code for any finite arguments, except that its performance would be unacceptable for non-nice inputs.

Note that my version could probably be optimized; I didn't really pay attention to optimization since it seemed to only be used during initialization, hence I assumed it wasn't really performance-critical, and supporting general values seemed more important. It sounds like my assumption on that was incorrect.

Hmm, I'm somewhat surprised it's that much slower... even for the standard rates you listed, your code needs up to 144 iterations for some combinations, while mine needs only up to 6. I didn't think FP division would be so slow on modern cores that it makes up for such a big ratio in iteration count; maybe std::remainder just isn't very efficient. The algorithm would still work fine if it used fabs( a - n * b ) instead, where n is an integer approximation of a / b that meets some mild conditions, so the question would be what's the fastest way to obtain a sufficiently good value of n. (Your code effectively just uses n = 1.)
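
To put numbers on the iteration counts mentioned above, a small counting harness can be used (hypothetical driver code; subIters mirrors a subtraction-based loop, euclidIters mirrors the remainder-based fgcd from the opening comment):

#include <cmath>
#include <cstdio>
#include <utility>

// Count loop iterations of the subtraction-based approach.
static int subIters( double a, double b )
{
    int n = 0;

    while( a != b )
    {
        if( a > b )
            a -= b;
        else
            b -= a;

        n++;
    }

    return n;
}

// Count loop iterations of the remainder-based (Euclid) approach.
static int euclidIters( double a, double b )
{
    int n = 0;

    if( a < b )
        std::swap( a, b );

    while( b ) {
        a = std::fabs( std::remainder( a, b ));
        std::swap( a, b );
        n++;
    }

    return n;
}

int main()
{
    static const double rates[] = { 8000, 11025, 14000, 16000, 18000,
        22050, 24000, 32000, 44100, 48000, 60000, 88200, 96000, 176400,
        192000, 352800, 384000, 705600, 768000 };
    const int n = (int) ( sizeof( rates ) / sizeof( rates[ 0 ]));
    int maxSub = 0;
    int maxEuc = 0;

    for( int i = 0; i < n; i++ )
        for( int j = 0; j < n; j++ )
        {
            const int s = subIters( rates[ i ], rates[ j ]);
            const int e = euclidIters( rates[ i ], rates[ j ]);
            if( s > maxSub ) maxSub = s;
            if( e > maxEuc ) maxEuc = e;
        }

    printf( "max iterations: subtraction %d, Euclid %d\n",
        maxSub, maxEuc );
}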

I noticed that 1500 limit, yeah, so that means my code could give up after 12 iterations or so (or 11? I'd need to analyze).

avaneev commented 9 months ago

Euclid's algorithm can't be optimized; it's already optimal, except that division is expensive. Division has always been expensive on all past and modern CPUs, and will probably remain expensive in the future.

avaneev commented 9 months ago

I was worried your code would deadlock on arguments like a=exp(1.0), b=asin(1.0), but it does not. However, it returns 0, which is an error and should be expected and handled. Note that I'm using double type arguments, not float.

mvduin commented 9 months ago

I was worried your code would deadlock on arguments like a=exp(1.0), b=asin(1.0), but it does not. However, it returns 0, which is an error

Uhh, no? It will only return 0 if both arguments are 0. For your particular values of a and b it returns 0x1p-51, i.e. 2^-51 (where ^ denotes exponentiation, not XOR).

If it's unclear what's going on, the key thing to remember is that while (finite) floating-point numbers are often thought of as "approximate real numbers", their exact values are simply rational numbers whose denominators are powers of two, and these actually always have a greatest common divisor (unless all arguments are zero).

For example, your particular values of a and b, after rounding the exact mathematical values to the nearest double precision floats, have the following exact representations using hex float syntax:

printf( "%a\n", exp(1.0) );  // 0x1.5bf0a8b145769p+1   ... or equivalently 0x15bf0a8b145769p-51
printf( "%a\n", asin(1.0) ); // 0x1.921fb54442d18p+0   ... or equivalently 0x1921fb54442d18p-52

It's now straightforward to see that 0x1p-51 is a common divisor, and indeed one can verify that it is in fact their GCD, e.g. by asking WolframAlpha.
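
Numerically, this can be spot-checked by dividing out the candidate GCD; division by a power of two is exact, so (assuming a libm whose exp and asin match the hex values printed above) both quotients print as exact integers:

#include <cmath>
#include <cstdio>

int main()
{
    const double g = 0x1p-51;

    // If g divides both values, these print as whole numbers.
    printf( "%.1f\n", exp( 1.0 ) / g );  // 6121026514868073.0
    printf( "%.1f\n", asin( 1.0 ) / g ); // 3537118876014220.0
}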

avaneev commented 9 months ago

As I've noted before, division is arithmetically unstable and depends on compiler optimizations and on the CPU implementation. My compiler on a Ryzen 3700X returns 0 for these arguments, after 31 iterations. Two irrational numbers do not have a common divisor other than 1.

avaneev commented 9 months ago

I was using fmod(a, b), however. Which, again, means the fmod() implementation is different from std::remainder(), which seemingly just replaces an out-of-bounds result with the double type's epsilon.

mvduin commented 9 months ago

I'm very puzzled how you're getting 0 as a result; can you tell me how to reproduce this? I can't reproduce your result no matter what I try: I'm getting correct results from both gcc and clang at any optimization level, and with every variation of the algorithm I've tried: https://godbolt.org/z/T9zG5edbr (note however that fmod (variant 1) is slower than the other variants, and I'm not sure variant 2 will give correct results if the second multiply is inexact)

One final try to clear up misconceptions about floating-point gcd here:

I've already explained that floating-point numbers are never irrational. While the exact mathematical value of exp(1) is irrational, its nearest double-precision floating-point approximation is a rational number, 6121026514868073/2251799813685248. Any set of rational numbers (except if all of them are zero) has a greatest common divisor, and importantly, if the inputs all belong to the same floating-point type, then the gcd is exactly representable in that same type, and it is what Euclid's algorithm (any flavor thereof) computes.

FP division is not "unstable", whatever you mean by that. All floating-point operations have precise semantics that do not depend on the CPU at all (ignoring CPU bugs). Nor do they depend on compiler optimizations, except for flags explicitly documented as unsafe, such as the appropriately named -funsafe-math-optimizations, which is also enabled by -ffast-math.

The algorithm is also very insensitive to the actual result of the division; in fact, the remainder/fmod operation can be replaced by a - n * b, where the choice of n only affects performance and not the result of the algorithm, as long as n is an integer chosen to ensure a - n * b is smaller in magnitude than a.

I don't even see anything that could be broken by unsafe math optimizations other than the NaN/infinity check at the top of the function, since -funsafe-math-optimizations gives the compiler permission to just assume it will never encounter NaN or ±infinity.

avaneev commented 9 months ago

Well, 0x1p-51 is zero for all practical purposes, if you didn't understand what I wrote. There's no sense in pursuing reproduction of my result - I get it, but for you to get it, you would have to use all the tools that I use. Here's my exact code, which I'm compiling with GCC 13.2 (x64devkit) on Windows 10, on a Ryzen 3700X. Don't be puzzled - it does return 0 on my machine. Why? You decide.

#include <math.h>
#include <stdio.h>

double fgcd( double a, double b )
{
    a = fabs( a );
    b = fabs( b );

    if( a < b )
    {
        double t = a;
        a = b;
        b = t;
    }

    while( b ) {
        a = fabs( fmod( a, b ) );
        double t = a;
        a = b;
        b = t;
    }

    return a;
}

int main()
{
    printf( "%e\n", fgcd( exp(1.0),asin(1.0)));
}
avaneev commented 9 months ago

I was actually mistaken - the result I get is 4.440892e-016, which I erroneously displayed as 0.000000. But it's different from yours anyway.

avaneev commented 9 months ago

I wasn't using -ffast-math, BTW.

avaneev commented 9 months ago

Ah, it's the same result: 4.440892e-016 ~ 0x1p-51, like yours. Except anything below 1 is unusable in r8brain-free-src, for obvious reasons. So your code is correct, except that the result is unusable, since practically it's just zero if you account for an epsilon equality tolerance. What I do not understand is why you resist the fact that two irrational numbers, of whatever precision, have only one common divisor - 1. In this respect your code returns an invalid result. We are talking about floating-point numbers, not integers like you mention.

mvduin commented 9 months ago

I haven't looked at the code in detail; why exactly is anything below 1 unusable? What if someone has some non-audio data set that is genuinely sampled at 0.5 Hz and they want to interpolate it to 0.75 Hz? I don't really see why that would be a problem: that's just a straightforward 2-to-3 upsampling, and my fgcd would indeed return 0.25 as the GCD, and if you removed the "GCD < 1" check from getWholeStepping() it would correctly determine ResInStep=2 and ResOutStep=3.


I appreciate that GCD for rational numbers may be unfamiliar, but it is a well-defined concept in mathematics. There are many ways to define it, some easier to understand than others, for example:

- the largest number g such that x/g and y/g are both integers;
- for fractions in lowest terms, gcd( a/b, c/d ) = gcd( a, c ) / lcm( b, d );
- the unique non-negative g such that the set { m*x + n*y : m, n integers } consists exactly of the integer multiples of g.

The last two versions are a bit more abstract and mathy but are well-defined when both arguments are zero, and return zero in that case.


All of the fgcd implementations we've examined here (including your original one, if we ignore the iteration limit) produce exact results, i.e. they simply treat their inputs as exact rational numbers and return the rational gcd defined above; none of them has an "epsilon equality tolerance".

Maybe it would be useful to be able to have that, i.e. find g such that x/g and y/g are within epsilon-tolerance of being coprime integers, but you'd need to be really careful how you'd use such a result since it will probably introduce approximation errors all over the place and result in slow drift between the input and output sample grid (albeit very slow when using doubles).

Regardless, only supporting exact gcd is the safe/conservative choice in your use case: e.g. if the user asks to resample from "0.2 Hz" to "0.3 Hz" (neither of which is an exact float), then it would get a tiny gcd (0x1p-54), hence ResOutStep would end up being INT_MAX, which is definitely greater than 1500, so it would bail out.
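
A quick check of this bail-out path (hypothetical driver code, assuming the fgcd template from the opening comment is in scope; the exact values in the comments assume IEEE-754 doubles):

#include <cstdio>

int main()
{
    const double g = fgcd( 0.2, 0.3 );

    // The quotients are astronomically large integers, so any
    // reasonable cap on the whole-stepping ratio rejects this case.
    printf( "%a\n", g );         // 0x1p-54
    printf( "%.0f\n", 0.2 / g ); // 3602879701896397
    printf( "%.0f\n", 0.3 / g ); // 5404319552844595
}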

avaneev commented 9 months ago

I meant that if the sample rates are standard integers, any value below 1 is unusable. But of course, this limitation is not imposed, as r8brain-free-src can take any sample rate specifications. I doubt anyone really uses sample rates < 1, as there are no physical sampling systems that sample at such rates.

0x1p-51 is basically 1, but shifted right to the least significant position. If, for example, the double type had a 1000-bit mantissa, and I supplied 1000-bit irrational numbers, I would get something like 0x1p-1000 with your algorithm. I still insist this result is an indication that the two supplied numbers have no GCD except 1. It's an error state.

mvduin commented 9 months ago

I meant that if the sample rates are standard integers, any value below 1 is unusable.

If the arguments are positive integers then you will obviously never get a value below 1 since the gcd of two integers is an integer.

But of course, this limitation is not imposed, as r8brain-free-src can take any sample rate specifications. I doubt anyone really uses sample rates < 1, as there are no physical sampling systems that sample at such rates.

Not for audio, no, but people may find use for it in other fields; there's certainly no reason to impose it as an arbitrary restriction. In the end, all that matters for resampling is the ratio between the input and output sample rates, not their magnitude. There's no reason for getWholeStepping() to care about how big or small the gcd is; it suffices that SSampleRate / GCD and DSampleRate / GCD are integers of sufficiently modest size.

I'll reiterate one last time: if x and y are finite floating point numbers, not both zero, then x / fgcd(x,y) and y / fgcd(x,y) are always coprime integers, regardless of how large, small, nice, or ugly x and y are (or their gcd is).


Your last paragraph is gibberish to me. You can't supply irrational numbers, since all floating-point numbers are rational, and a function can only operate on the actual floating-point values it receives as arguments; it does not and cannot care that these floating-point numbers were obtained as approximations of irrational numbers. If fgcd returns 0x1p-51, then that's because this genuinely IS the gcd of its two arguments.

The reason that using SSampleRate = exp(1.0) and DSampleRate = asin(1.0) shouldn't use whole stepping isn't that their gcd is 0x1p-51, but that the integers SSampleRate/GCD and DSampleRate/GCD are way too large, which is a completely different check (one that getWholeStepping already performs).

an indication that two supplied numbers have no GCD except 1

????? Non-integers can never have an integer gcd value; more specifically, the GCD of two numbers equals 1 if and only if they are coprime integers.

mvduin commented 9 months ago

Anyway, if you continue to be confused about how fgcd (or rational gcd in general) works, I suggest just re-reading some of my earlier comments; I don't know what I could add to further clarify it.

But please just trust me on this (as someone with a background in mathematics): the only "error" cases fgcd has are when the arguments are NaN or infinity, for which it has an explicit check. In all other cases it gives the mathematically correct answer, with the properties listed in an earlier comment.

avaneev commented 9 months ago

I do basically understand how GCD works, but I'm touching different grounds, not the technicalities of computing GCD. Irrational numbers cannot have a GCD - as I've tried to demonstrate with the huge-mantissa example - and so it should be set to 1 as a solution. A GCD of 0x1p-51 is a stopping criterion, not a true solution. It's a "forest behind the trees" example, if you understand.

avaneev commented 9 months ago

So, I assume even your algorithm needs an iteration counter - except that practically it's not 52, but 30 for doubles, with an epsilon check; you should check that out.

mvduin commented 9 months ago

sighs very deeply

I've already repeated many times that fgcd is correct as-is, unless you want an approximate fgcd, but that is not achieved by testing the iteration count nor the gcd value itself.

The only reason to add an iteration limit would be as an r8brain-specific optimization, because a high iteration count implies that both x/fgcd(x,y) and y/fgcd(x,y) are large, and therefore the ResOutStep > 1500 check would make it bail out anyway. This limit can be much lower, somewhere around log2( 1500 ) ≈ 11. The exact value would require a bit of careful analysis, which I can't be bothered to do right now, but setting the iteration limit to 14 should be safe, I think.
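
A minimal sketch of such a capped variant, using the untested limit of 14 from above (fgcdCapped and its 0-as-failure convention are illustrative choices, not library code):

#include <cmath>
#include <utility>

// Remainder-based GCD with an r8brain-style early exit: if the loop
// runs long, the quotients SrcRate / GCD and DstRate / GCD are far
// beyond the library's 1500 cap anyway, so 0 is returned as a
// "no usable GCD" signal (also for NaN/infinity inputs). The limit
// of 14 follows the rough log2( 1500 ) estimate above and has not
// been analyzed precisely.
double fgcdCapped( double a, double b )
{
    if( !std::isfinite( a ) || !std::isfinite( b ))
        return 0.0;

    a = std::fabs( a );
    b = std::fabs( b );

    if( a < b )
        std::swap( a, b );

    for( int i = 0; i < 14; i++ )
    {
        if( b == 0.0 )
            return a;

        a = std::fabs( std::remainder( a, b ));
        std::swap( a, b );
    }

    return 0.0; // Caller treats this as "do not use whole stepping".
}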

avaneev commented 9 months ago

Back to the original question: I do not see a reason to use your variant, as it offers no benefit. And at a larger scale, floating-point implementations in CPUs are not universally standardized, so I'm wary of your variant. You may call it paranoia, but I do not want problems arising from the pursuit of perfection.

avaneev commented 9 months ago

Thanks for the discussion anyway. I can't remember whether I ever came across your GCD search variant - it was a long time ago that I researched this topic - but knowing about it is useful.