AdamWhiteHat / BigDecimal

An arbitrary-precision decimal (base 10) floating-point number class. Over 2.5 million downloads on NuGet!
MIT License
53 stars 15 forks

Discussion: Division can be very slow by default due to extremely high precision #54

Closed: Joy-less closed this issue 4 weeks ago

Joy-less commented 3 months ago

By default, division is very slow when the result is recurring.

This takes about half a second, which is extremely slow:

BigDecimal Result = (BigDecimal)5 / 3;

The reason is that the default precision is 5000 digits. When it is changed to 100 digits, even 10,000 divisions are almost instant:

BigDecimal.Precision = 100;
for (int i = 0; i < 10_000; i++) {
    BigDecimal Result = (BigDecimal)5 / 3;
}

According to a performance benchmark, this is the line in the division function causing the slowdown:

else if (GetSignifigantDigits(bigInteger) >= Precision) {

While this slow behaviour is technically justified, it can be a footgun when you don't know the result is recurring. I suggest decreasing the default precision, as 5000 digits is excessive.

AdamWhiteHat commented 2 months ago

I agree. I have been thinking about changing this for a while now, as 5000 digits of precision is bonkers for a default. It's likely more than anyone needs, and it causes performance issues. I'm always having to change it when using the library.

I guess I was reluctant to change it because it would be a breaking change: users of the library might have written their code in a way that relies upon the current behavior, and changing it might break their code.

In this case, however, it's hard to imagine people relying on this, as most people probably change it to something more reasonable.

I wish I had more insight into how people are using my library: which methods are used most, and what precision people are using. That way I would know what a more sensible default would be, and I could focus performance improvements on the most-used methods.

I wish I had this insight, but I wouldn't want to instrument my library to gather stats; that would be a huge violation of trust and privacy, and not something most people would expect to find in a math library.

What I should probably do instead is create a survey in Google Forms or something.

Anyways, I will probably change this, and it will require a major version bump. But I want to do it after I have fixed the other issues you have raised, and I have some other improvements that I was about to release which will probably go out first too. Also, if I get to making the survey right away, like by tonight, I might just wait until I get some results from it before I make this change.

I'll reply to this issue if and when I make that survey so you can submit a response. But, for now, your vote for a reasonable default value for the precision parameter would be 100?

Joy-less commented 2 months ago

I'll reply to this issue if and when I make that survey so you can submit a response. But, for now, your vote for a reasonable default value for the precision parameter would be 100?

Understood. Personally, I find the arbitrary size of BigDecimal more useful than the arbitrary precision. The precision of double is 15 digits, and I think that's reasonable. Maybe 20 to be safe. What would be more useful is if you could change the precision per number (rather than through a static property), although I understand that may be difficult to implement.
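As a rough sketch, per-number precision might look something like this (entirely hypothetical API; no such member exists in the library today, which only has the static BigDecimal.Precision property):

```csharp
// Hypothetical sketch only: today the library exposes a single static
// BigDecimal.Precision that applies to every operation.
BigDecimal a = ((BigDecimal)1).WithPrecision(20); // hypothetical per-instance precision
BigDecimal b = ((BigDecimal)3).WithPrecision(20);

// Division would then cap the result using the operands' own precision
// instead of the global static setting.
BigDecimal third = a / b; // ~20 digits of 0.333..., regardless of BigDecimal.Precision
```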

Joy-less commented 2 months ago

After some inspection, I think the best approach is to reduce the default precision and supply a custom precision as an argument to the Divide function. It looks like Pow already has a Precision argument.

Here's the Divide function:

/// <summary>Divides two BigDecimal values.</summary>
public static BigDecimal Divide(BigDecimal dividend, BigDecimal divisor)
{
    if (divisor == Zero)
    {
        throw new DivideByZeroException(nameof(divisor));
    }

    var exponentChange = dividend.Exponent - divisor.Exponent;
    var counter = 0;
    var mantissa = BigInteger.DivRem(dividend.Mantissa, divisor.Mantissa, out var remainder);
    bool firstLoop = true;
    BigInteger lastRemainder = 0;
    while (remainder != 0)
    {
        if (firstLoop)
        {
            firstLoop = false;
        }
        else if (remainder == lastRemainder)
        {
            if (GetSignifigantDigits(mantissa) >= divisor.SignifigantDigits)
            {
                break;
            }
        }
        else if (GetSignifigantDigits(mantissa) >= Precision) // <-- Make this an argument
        {
            break;
        }

        while (BigInteger.Abs(remainder) < BigInteger.Abs(divisor.Mantissa))
        {
            remainder *= 10;
            mantissa *= 10;
            counter++;
        }

        lastRemainder = remainder;
        mantissa += BigInteger.DivRem(remainder, divisor.Mantissa, out remainder);
    }

    return new BigDecimal(mantissa, exponentChange - counter);
}
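A sketch of what making it an argument could look like (hypothetical overload; the existing two-argument method would keep its current behavior by forwarding the static default):

```csharp
// Hypothetical overload sketch: the body would be the same long-division
// loop as above, unchanged except that the marked comparison reads the
// parameter instead of the static field:
//     else if (GetSignifigantDigits(mantissa) >= precision)
public static BigDecimal Divide(BigDecimal dividend, BigDecimal divisor, int precision)
{
    // ... same body as above, with Precision replaced by precision ...
}

// Existing callers are unaffected; the old signature forwards the static default.
public static BigDecimal Divide(BigDecimal dividend, BigDecimal divisor)
    => Divide(dividend, divisor, Precision);
```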

I think the default precision should be around 30-40 to be slightly better than decimal. Any more is probably unnecessary.

Decimal places of π for reference:

9 (float): 3.141592653
17 (double): 3.14159265358979323
29 (decimal): 3.14159265358979323846264338327
50: 3.14159265358979323846264338327950288419716939937510
100: 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679

Protiguous commented 1 month ago

What would be more useful is if you could change the precision per-number (rather than a static property) although I understand that may be difficult to implement.

I've already tested making this change, where the precision is kept per BigDecimal. It is not difficult to implement.

adamatpinbn commented 1 month ago

change the precision per-number (rather than a static property)

Well, that sure is an interesting idea; I'm not sure what to think about it. I'll have to ponder it for a bit and see if I can find a compelling reason to do it. I suspect it might be a bit unruly and difficult to reason about if you have too many different precisions instantiated at once. I could see it being a source of bugs, or of what appear to be bugs but are really just user error. I don't think it would follow the principle of 'do the expected thing'.

It does raise some questions to which I'm not sure what the correct answers should be. Questions along the lines of: if you multiply a BigDecimal that has a precision of 10 by a BigDecimal that has a precision of 15, what precision should the result have?

Joy-less commented 1 month ago

It does raise some questions to which I'm not sure what the correct answers should be. Questions along the lines of: if you multiply a BigDecimal that has a precision of 10 by a BigDecimal that has a precision of 15, what precision should the result have?

Multiplying two BigDecimals isn't a problem, since the result always terminates (e.g. 0.1 * 0.0001 = 0.00001). The problem only occurs for recurring division (e.g. 1.0 / 3 = 0.3333333333333...). This is why I suggested the precision should be an argument to the Divide function, so you can do either of the following:
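Something along these lines (the three-argument overload is hypothetical at this point):

```csharp
// 1. Rely on the static default, as today:
BigDecimal a = BigDecimal.Divide(1, 3); // capped at BigDecimal.Precision digits

// 2. Hypothetical overload: cap just this one division,
//    without touching the global BigDecimal.Precision setting:
BigDecimal b = BigDecimal.Divide(1, 3, precision: 50); // ~50 digits of 0.333...
```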

Feel free to suggest better ideas. This could also be done for the Multiply function.

AdamWhiteHat commented 1 month ago

Whoops, I keep accidentally commenting while logged into my work GitHub account. Anywho, I like the idea of providing overloads for the arithmetic functions that let you specify a precision. That way, it's really obvious how to get the behavior you want just by using the library, with no need to read the documentation.

By the way, I just recently realized that the use of BigDecimal.Precision and BigDecimal.AlwaysTruncate isn't documented in the Readme.md or the discussions. So I will probably rectify that tonight by updating the readme. Since adding the argument to Divide is simple enough, I will probably get that done too.
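As a quick example of those two settings (semantics as discussed in this thread; the updated readme will be the authoritative reference):

```csharp
// Cap non-terminating results (e.g. recurring division) at 50 significant digits.
BigDecimal.Precision = 50;

// When true, results of all operations are truncated to Precision digits,
// not just non-terminating division results. (Semantics as understood from
// this thread; see the readme for the exact definition.)
BigDecimal.AlwaysTruncate = false;

BigDecimal third = (BigDecimal)1 / 3; // now returns roughly 50 digits of 0.333...
```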

I'll release an updated NuGet package and mention it on this thread when I do.

I also have some uncommitted changes, changing the default precision to 200, but I will just shelve that for the moment. I realized I needed to update the documentation first.

200 might still be a bit too much, based on what I'm seeing. I searched GitHub for committed code referencing BigDecimal.Precision. While I could only find a few references that set the variable, their use seems to further support the suggestion that the default precision be "around 50":

nissl-lab/npoi - 17
PixelatedLagg/PMath - 50
Bdeering1/Mandelbrot - 20
JosGielen/Coding_Train_CSharp/CC141 Mandel_Pi - 50
lrepaty/BigDecimalCore - 10
Peidyen/ArbitraryPrecisionProject - 4
JosGielen/Coding_Train_CSharp/Varia/Chudnovsky - 2001

I'd consider the last two to be outliers, especially the last one, which on its own skews the arithmetic average from 29.4 up to 358.

Protiguous commented 1 month ago

I like the idea of providing overloads for the arithmetic functions allowing you to specify a precision.

I have also tested adding this optional parameter, and it works just fine.

AdamWhiteHat commented 1 month ago

Okay, I added the Divide method overload. I didn't bother to build and publish a NuGet package just for this change, because it would have been immediately replaced by version 3000, wherein I changed the default precision to 100.

I updated the readme to mention what BigDecimal.Precision and BigDecimal.AlwaysTruncate do, and added a notice that version 3000 is a breaking change from prior versions.

This is a breaking change because it changes the default behavior, and people could have written code that relies upon that particular behavior. I chose to bump the version number all the way to 3000 primarily because I wanted to break the association it had with the date/year. The version number used to be the date, but that is not in line with semantic versioning, which is important to follow. After I switched to semantic versioning, the major version number still seemingly reflected the year, possibly leading to confusion. I try not to make breaking changes that bump the major version number, so I took advantage of this breaking change to advance the version number to one sufficiently far from the year that people should no longer make that association. We should be good for approximately 976 years or so. If we can avoid making breaking changes for that long, and/or if people are still using this library after that much time, I'd consider it a major win.

Anyways, you can get the latest NuGet package, version 3000, here; it should include all your requested changes.

Now I'm off to update the Wiki documentation, and add some API documentation (generated from the XML comments) to the Wiki.