Closed: squ1dd13 closed this pull request 1 year ago
Thanks for all the feedback, I'm glad you're interested :D

Since there are several points to address regarding `SqrtAccuracy`, I'll just reply here rather than to the individual comments. The main rationale behind allowing reuse of `SqrtAccuracy` objects was that, since the `multiplier` is heap-allocated, there might be cases where performing several `sqrt_*` calls could avoid heap-allocating a new `multiplier` on each call. After having another look, it seems that since `multiplier` gets cloned anyway, this (premature) optimisation is unnecessary. I could benchmark it, but I suspect it's worth very little compared to other optimisations that might be possible within the main algorithm. If nothing else, removing it would simplify the API quite a lot.

I've made a bunch of minor tweaks (mainly internal), but I haven't done anything major to the API for now because I want to hear what you think about the above.

TL;DR: I'm all for getting rid of `SqrtAccuracy` entirely.
For fractions I think it's probably essential to force the user to recognise that they are only working with an approximation, hence the requirement for them to supply some sort of value specifying the accuracy they want the result to have. However, would it make sense for `GenericDecimal` to also provide a `sqrt` function, but perhaps one which uses the precision type to automatically pick an accuracy level?

This would be somewhat at odds with the current documentation, which states

> Calculations do not use precision

although the other part of this,

> but comparison does

is actually exactly what the implementation of `sqrt` is doing already. We use a truncated comparison between the square of our approximation of the root and the original value we were given to determine when to stop iterating:
```rust
// Stop and yield the current guess if it's close enough to the true value.
if squared_and_truncated == truncated_target {
    break SqrtApprox::Rational(current_approx);
}
```
Edit: I should probably clarify that I've never been 100% sure how to use `GenericDecimal` (vs. `GenericFraction`), so what I said above may be complete gibberish :P
@squ1dd13, I can't reply in threads to the posts in the general comment section, so I'll quote you in this reply to be more specific about things.
> Since there are several points to address regarding SqrtAccuracy, I'll just reply here rather than to the individual comments. The main rationale behind allowing reuse of SqrtAccuracy objects was that since the multiplier is heap-allocated, there might be some cases where performing several sqrt_* calls would be able to avoid heap-allocating a new multiplier on each call. After having another look it seems like since multiplier gets cloned anyway, this (premature) optimisation is unnecessary.

That makes a lot of sense. I agree that it sounds reasonable to pass a reference to a pre-allocated instance and not destroy it, so it can be reused. Also, I guess it could make sense to have a default lazy-static pre-allocated value as a fallback.
> but I suspect it's probably worth very little compared to other optimisations that might be possible within the main algorithm. If nothing else, this would simplify the API quite a lot.

Perhaps, though all possible optimisations have value regardless of their relative size. If they obscure the API too much, then we may need to think about adding abstractions to simplify that (good thing Rust supports zero-cost abstractions really well :)
> TL;DR I'm all for getting rid of SqrtAccuracy entirely.

I feel like that could just be a ref to `GenericInteger` in there.
> However, would it make sense for GenericDecimal to also provide a sqrt function, but perhaps one which uses the precision type to automatically pick an accuracy level?

`Decimal` is implied to be a thin wrapper around `Fraction`, which contains extra info about how we want the fraction to be represented in a decimal format. E.g.

```
(1/3, precision=3) == 0.333
(1/3, precision=4) == 0.3333
```

The comparison takes precision into account to follow the "string" way of comparing numbers (e.g. 1/3 == 1/3, but 0.333 != 0.3333, while both of those might be represented as the same fraction internally).
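To make that "string" comparison concrete, here is a tiny illustrative sketch (the `format_fraction` helper and the use of `f64` are mine, not the crate's API): the same fraction formatted at two precisions yields unequal decimal strings.

```rust
// Illustrative only: format num/den at a given decimal precision, mimicking
// the "same fraction, different precision => unequal decimals" behaviour.
fn format_fraction(num: u64, den: u64, precision: usize) -> String {
    format!("{:.*}", precision, num as f64 / den as f64)
}

fn main() {
    let a = format_fraction(1, 3, 3); // "0.333"
    let b = format_fraction(1, 3, 4); // "0.3333"
    // Internally both are 1/3, but compared "as decimals" they differ.
    assert_ne!(a, b);
    println!("{a} != {b}");
}
```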
Given that `sqrt` itself is not producing a formatted representation of the decimal, but rather a calculated output, `precision` may not always be the best value for the `sqrt` accuracy. However, it is hard to tell without a real example. I would say it's up to you whether you feel that is a good default or not.
> all possible optimisations have value regardless of their relative size

Fair point. In this regard it absolutely makes sense to keep it in.
I see the reasoning behind allowing any `GenericInteger` to be used rather than only specific `SqrtAccuracy` values, but I worry that this could lead to misunderstandings of what the value actually represents. It would be understandable for someone unfamiliar with the API to assume that they were being asked for the number of decimal places they want the result to be accurate to, which is emphatically not what `SqrtAccuracy::multiplier` is. The multiplier is 10^(number of decimal places).

Consider the following example, where `accuracy` now takes any `GenericInteger`:
```rust
let my_number: DynaFraction<u32> = DynaFraction::from_str("123456/654321").unwrap();

// 0.1886780341758861476247896674567987272302
println!("{my_number:.40}");

let root_a = my_number.sqrt_with_accuracy(40);

// The user expects this to be equal to `my_number` to 40 d.p.
let root_a_squared = &root_a * &root_a;

// 0.1886780341758861476247896674567988508748
// That's only 33 d.p.!
println!("{root_a_squared:.40}");

// Here's what the user should have done:
let root_b = my_number.sqrt_with_accuracy(BigUint::from(10_u8).pow(40));
// They could also have done `my_number.sqrt(40)` with the current API.
let root_b_squared = &root_b * &root_b;

// 0.1886780341758861476247896674567987272302
// 40 d.p., as expected.
println!("{root_b_squared:.40}");
```
(In the above, `root_b_squared` is actually equal to `my_number` to 67 decimal places, but only 40 are guaranteed by the `SqrtAccuracy` passed.)
Of course, it would be possible to change the API such that `foo.sqrt_with_accuracy(n)` produces a result whose square* is equal to `foo` to `n` decimal places, but that would just mean computing `BigUint::from(10_u8).pow(n)` inside `sqrt_with_accuracy`, which is what `SqrtAccuracy::new` is already doing, but outside the method so that the same value can be reused for multiple calls. This nuance is not ideal, but it's an unfortunate side effect of how we (currently) test values to see if our approximation is close enough.
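The "compute the power once, outside the method, so it can be reused" idea can be sketched as follows. This is a hypothetical helper, not the crate's code, and it uses `u128` instead of `BigUint` purely for illustration (so it would overflow beyond roughly 10^38):

```rust
// Hypothetical: turn "n decimal places" into the multiplier 10^n once,
// so repeated sqrt calls can share the same pre-computed value.
fn multiplier_for_decimal_places(n: u32) -> u128 {
    10u128.pow(n) // the real code uses BigUint, so arbitrary n is fine there
}

fn main() {
    let multiplier = multiplier_for_decimal_places(3);
    assert_eq!(multiplier, 1_000);
    // 0 decimal places means comparing only the integer parts.
    assert_eq!(multiplier_for_decimal_places(0), 1);
    println!("reusable multiplier computed once: {multiplier}");
}
```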
I don't like this strange behaviour – it would be much better if the accuracy referred to the actual square root value rather than to the square of the square root. That is, `x.sqrt(n)` should give an approximation of the square root of `x` that is accurate to `n` decimal places. Currently that same expression gives an approximation whose square* is accurate to `n` decimal places, which is confusing and weird. With some significant maths help from a friend I think it's possible to fix this, and I'll give it a go later.
It's probably worth mentioning that there are possible uses for non-power-of-10 `multiplier`s. In general, a `multiplier` of `a^n` means that the square of the result will be correct to at least `n` digits after the `a`-ary separator (i.e. the base-`a` equivalent of the decimal point) in the base-`a` representation of the result. For example, a multiplier of `2^40` means that if we square the approximated root and print it as a binary string, it will match the binary string representation of the original value for 40 digits after the `.`:
```rust
let x: DynaFraction<u32> = DynaFraction::from_str("123456/654321").unwrap();

// 0.0011000001001101001101000010001001000010
print_in_binary_to_40(&x);

let s = x.sqrt_with_accuracy(&SqrtAccuracy {
    multiplier: BigUint::from(2_u8).pow(40),
});
let sq = &s * &s;

// 0.0011000001001101001101000010001001000010
// Matches to >= 40 digits beyond the binary point.
print_in_binary_to_40(&sq);
```
Although I'm sure most people will only care about decimal, the fact that there is some meaning to using other bases for `accuracy` means there probably should be finer control given over the `multiplier` used.
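The base-`a` interpretation can be verified with small integers. This is a sketch only (a hypothetical `truncate` helper over `u128`, not the crate's `BigUint`-based code): truncating `1/3` with multiplier `2^10` keeps exactly ten binary digits after the binary point.

```rust
// Sketch: with multiplier a^n, truncation keeps n base-a digits after the
// radix point of num/den.
fn truncate(num: u128, den: u128, multiplier: u128) -> u128 {
    num * multiplier / den
}

fn main() {
    let m = 2u128.pow(10); // keep 10 binary digits after the binary point
    // 1/3 = 0.0101010101... in binary; its first 10 fractional bits are 0101010101.
    let t = truncate(1, 3, m);
    assert_eq!(format!("{t:010b}"), "0101010101");
    println!("binary truncation of 1/3 to 10 bits: 0.{t:010b}");
}
```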
I somehow managed to miss your previous mention of using `lazy_static` to preallocate certain accuracy levels – this sounds like a great idea.
At the moment I'm not sure about the idea of making `accuracy` an `Option` and choosing a default value when `None` is provided. I think implementing `Default` for `SqrtAccuracy` definitely makes sense, and then the user would be able to just do `foo.sqrt_with_accuracy(&SqrtAccuracy::default())` (or possibly without the reference, because I think I'll change `accuracy` to be `impl Borrow<SqrtAccuracy>` so that it can be used with either). This forces the user to interact with the `SqrtAccuracy` type, which should hopefully make clear that this is not a lossless operation, and that they must agree to some level of compromise between performance and accuracy. If we hide this behind `foo.sqrt_with_accuracy(None)`, the user doesn't have to acknowledge that compromise. Additionally, I feel (and this is pure speculation on my part) that most users of this crate will be concerned enough about accuracy that they would be able to make a reasonable decision about the level of accuracy they want, and we should perhaps be encouraging them to make this decision so that they can get the most out of the `sqrt_*` API for their specific needs.
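The `impl Borrow` idea can be sketched with simplified stand-in types (nothing below is the crate's real API; `Accuracy`, `Frac`, and the `f64`-based body are placeholders): the same method then accepts either an owned accuracy value or a reference to one being reused across calls.

```rust
use std::borrow::Borrow;

// Simplified stand-ins for the real types; signatures are illustrative only.
#[derive(Clone)]
struct Accuracy {
    multiplier: u128,
}

struct Frac(f64);

impl Frac {
    // `impl Borrow<Accuracy>` lets callers pass `Accuracy` or `&Accuracy`.
    fn sqrt_with_accuracy(&self, accuracy: impl Borrow<Accuracy>) -> f64 {
        let _multiplier = accuracy.borrow().multiplier; // would drive truncation
        self.0.sqrt()
    }
}

fn main() {
    let acc = Accuracy { multiplier: 10u128.pow(20) };
    let x = Frac(2.0);
    let by_ref = x.sqrt_with_accuracy(&acc); // borrowed: `acc` stays reusable
    let by_val = x.sqrt_with_accuracy(acc.clone()); // owned also works
    assert_eq!(by_ref, by_val);
    println!("both call styles accepted");
}
```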
Edit: I've made significant changes to the accuracy stuff; see the commit immediately after this message.
Thanks for your clear explanation of the decimal API – it makes a lot more sense to me now. I think it's probably best not to use the precision to pick a default accuracy, so I guess the decimal approximations API will most likely be (near-)identical to the fraction one.
Hi @squ1dd13,

Thank you for the detailed explanation. Yes, I agree that having a `SqrtAccuracy` type makes sense. I also agree that we can have `::default` instead of `None`. My thinking with the `Option` was to use a `lazy_static` value in the `None` case instead of allocating a new value for each call. Perhaps we could achieve something similar within the `SqrtAccuracy` internal logic. Not sure if that's worth pursuing though; I wouldn't mind accepting the patch without that optimisation.

I also feel that it might be important to capture the details from your post above in the documentation for the `SqrtAccuracy` structure and the `sqrt` function.
> My thinking about the Option was to use a lazy_static value in case of None instead of allocating a new value for each call. Perhaps, we could achieve something similar within the SqrtAccuracy internal logic.

Is this not covered by the behaviour of the `Accuracy` type? When `lazy_static` is enabled, the user can make use of the `Dp20`, `Dp100` and `Dp500` variants, which do not store a `BigUint` and instead use pre-allocated `lazy_static` `BigUint` values:
```rust
/// Returns a reference to the multiplier used by `self` to chop off irrelevant digits.
pub fn multiplier(&self) -> &BigUint {
    #[cfg(feature = "lazy_static")]
    {
        lazy_static! {
            static ref DP20_MUL: BigUint = BigUint::from(10_u8).pow(20_u32);
            static ref DP100_MUL: BigUint = BigUint::from(10_u8).pow(100_u32);
            static ref DP500_MUL: BigUint = BigUint::from(10_u8).pow(500_u32);
        }

        return match self {
            Accuracy::Dp20 => &DP20_MUL,
            Accuracy::Dp100 => &DP100_MUL,
            Accuracy::Dp500 => &DP500_MUL,
            Accuracy::Custom { multiplier } => multiplier,
        };
    }

    // -snip-

    let Accuracy::Custom { multiplier } = self else {
        // -snip-
    };

    multiplier
}
```
The current `Default` implementation for `Accuracy` is just `Dp100`, so if the user passes `Accuracy::default()`, there will be no allocation. If `lazy_static` is disabled, only the `Accuracy::Custom` variant is available[^1]. (This is better than leaving the `DpX` variants enabled and adapting `multiplier` to allocate a value for them on the fly, because `::Custom.multiplier()` doesn't need to allocate, but the `DpX` variants don't store a `BigUint` inline, so without `lazy_static` they would always have to allocate a new one when it is needed.)
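A minimal std-only model of this variant scheme may help illustrate why the presets never allocate per call. It uses `OnceLock` rather than `lazy_static` and `u128` rather than `BigUint`, purely to keep the sketch self-contained; it is not the crate's implementation.

```rust
use std::sync::OnceLock;

// Model: preset variants borrow a shared, lazily initialised static;
// Custom stores (and returns a reference to) its own value.
enum Accuracy {
    Dp20,
    Custom { multiplier: u128 },
}

impl Accuracy {
    fn multiplier(&self) -> &u128 {
        static DP20_MUL: OnceLock<u128> = OnceLock::new();
        match self {
            // Initialised once, then every call borrows the same value.
            Accuracy::Dp20 => DP20_MUL.get_or_init(|| 10u128.pow(20)),
            Accuracy::Custom { multiplier } => multiplier,
        }
    }
}

fn main() {
    assert_eq!(*Accuracy::Dp20.multiplier(), 10u128.pow(20));
    let custom = Accuracy::Custom { multiplier: 1_000 };
    assert_eq!(*custom.multiplier(), 1_000);
    println!("preset and custom multipliers resolve without per-call allocation");
}
```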
[^1]: This means `Accuracy::default` needs `lazy_static` at the moment. It might be more ergonomic to just change `default` so that it allocates a new `Accuracy::Custom` when `lazy_static` is off, but I'm not sure how comfortable I feel with hiding a heap allocation behind a `default` call.
> I also feel that might be important to capture the details from your above post in the documentation for the SqrtAccuracy structure and the sqrt function.

Absolutely. I'll write up full documentation for each of the `sqrt_*` methods and include this. As `SqrtAccuracy` has become the more general `Accuracy`, I think the specific way in which the `Accuracy` value passed is used should be up to the individual approximations implemented in the module, and not the documentation of the `Accuracy` type itself. (At the moment the only approximation is `sqrt`, but Newton's method could be used for fractional `pow` amongst other things. The reason I've only implemented `sqrt` so far is that it's ubiquitous enough that it makes sense to implement it by itself, to take advantage of the specific optimisations that it allows – that is, `halve_in_place_pos_rational` and any other specialised stuff.)
> Is this not covered by the behaviour of the Accuracy type? When lazy_static is enabled, the user can make use of the Dp20, Dp100 and Dp500 variants,

Yes, that looks good. I was writing that comment before reviewing the latest changes to the PR, sorry for the confusion.
> Accuracy::default needs lazy_static at the moment

Totally happy with that. There are no strong arguments for not having `lazy_static` as a dependency. We also have it required by the `with-bigint` feature anyway, and bigint is a hard requirement for the approx module. Thus, `lazy_static` is an indirect requirement of approx anyway.
> There are no strong arguments for not having lazy_static as a dependency. We also have it now required by the with-bigint feature anyway. And bigint is a hard requirement for the approx module. Thus, lazy_static is an indirect requirement of approx anyway.
I keep forgetting this :P I'll leave the explicit requirement for `lazy_static` on those variants in case we end up changing things to make `with-bigint` optional for `approx` in the future.
I think the PR is nearly done. I'll write some more tests so that we have all the special cases covered, and then write up the documentation for all of the methods. Please do say if there's anything that the docs leave unclear – I'm used to the inner workings of the module now so there's a chance I'll miss out bits that I forget might not be obvious.
Not directly related to this PR as such, but still potentially important: there are a couple of issues I noticed with `GenericFraction<T>` allowing signed `T`. I think the best summary of this is given by the implementation of `sqrt_with_accuracy_raw` as of yesterday, but in short: stuff like `GenericFraction::<T>::signum`, `::is_sign_positive`, `::is_sign_negative`, etc. cannot be trusted for signed `T`.

(On looking a second time, it would appear that this is not an issue for fractions made using `new_generic` etc., because there's code that explicitly removes the signs from the num/denom. Since `GenericFraction` can be created directly from a `Ratio`, this is still an issue, but it's not as bad as I thought.)
That's everything I can think of for now. I've made `sqrt` its own module under `approx` so that it has module-level documentation where all of the concepts (the whole accuracy thing, `raw`, `with_accuracy`, `abs`) can be discussed in one place. This makes documenting the individual methods much easier.

I plan to open at least one more PR with other approximations, so it would make sense for those to then be their own modules within `approx`. `Accuracy` is a child of `approx`, so it would be available to all of the approximation modules.
> Not directly related to this PR as such but still potentially important is that there are a couple of issues I noticed with GenericFraction allowing signed T. I think the best summary of this is given by the implementation of sqrt_with_accuracy_raw as of yesterday, but in short: stuff like GenericFraction::<T>::signum, ::is_sign_positive, ::is_sign_negative, etc. cannot be trusted for signed T.

This is true: negative numbers within the `Ratio` data (either num or denom) would lead to destructive consequences in many scenarios. This is partially mitigated because the `GenericFraction` implementation keeps the sign out as a separate attribute: `enum ... Rational(Sign, Ratio<T>)`. Because of that, keeping signs within the ratio becomes redundant and unnecessary. However, in some edge cases we can't statically validate that the code doesn't put negative numbers in when `T` is signed.
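A simplified model of that sign-outside-the-ratio design (the types and the constructor below are illustrative stand-ins, not the crate's actual code): the sign lives in its own field, and a `new`-style constructor normalises signed inputs on the way in, so the num/den stay non-negative through that path.

```rust
// Sketch: sign kept separately from an always-non-negative ratio.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Sign {
    Plus,
    Minus,
}

struct Fraction {
    sign: Sign,
    num: u64, // non-negative by construction
    den: u64,
}

impl Fraction {
    // Normalising constructor: the invariant can't be broken through this path.
    fn new(num: i64, den: i64) -> Self {
        let sign = if (num < 0) != (den < 0) { Sign::Minus } else { Sign::Plus };
        Fraction {
            sign,
            num: num.unsigned_abs(),
            den: den.unsigned_abs(),
        }
    }

    // With the invariant held, the sign query is trivially trustworthy.
    fn is_sign_negative(&self) -> bool {
        self.sign == Sign::Minus
    }
}

fn main() {
    assert!(Fraction::new(-3, 4).is_sign_negative());
    assert!(!Fraction::new(-3, -4).is_sign_negative()); // signs cancel
    println!("sign stored separately from the ratio");
}
```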
That's unfortunate, but removing support for signed `T` feels like overkill; it would only tackle hypothetical problems where developers control the object construction manually (hence the `new` + `new_neg` and `new_raw` + `new_raw_signed` methods allowing sign control).
Perhaps, we could improve some APIs there to make it transparently safer with negative numbers.
- unsigned `T` is considered to be the main scenario, and this is not a problem
- signed `T` is only supported for flexibility, but is also safe if users control the sign on construction (choosing the correct `new`/`new_neg` method)

That running time had me confused for a minute :D I forgot to say at any point that the algorithm is only remotely "fast" when compiler optimisations are turned on, so I've been testing with `--release`. Indeed, it takes around 220s to run `cargo test fraction::approx --features="with-approx"` on my machine; compare this to ~12s to run the same with `--release` enabled[^1].
[^1]: Out of interest, I also tried testing with `RUSTFLAGS="-C target-cpu=native"`, unrealistic as that may be. It reduces the time further, down to just under 9s, which I think is just incredible. We're calculating five different square roots to 10, 100, 1k, 10k and 100k decimal places, using numbers represented as `Vec`s, and it's capable of executing in that time. `rustc` is a fantastic thing...
This difference is absolutely not ideal, because it's likely to catch users out. RFC 2412 should help with this, because we would be able to force speed optimisation specifically for the `sqrt` module when the user doesn't supply any other optimisation parameters. For now, I guess the only solution for this aspect of the problem is to add a note to the documentation.
In terms of the testing time, would we be able to test with optimisations enabled? All we'd need is this:

```toml
[profile.test]
opt-level = 3
```

Alternatively, we could test using `--release`, but that seems unnecessary and also has the issue of disabling debug assertions etc.
For now I'll add a note to the documentation and change the build config to optimise tests, but I'm open to reversing those changes and discussing other options.
Oops, forgot to check that. I've left the benches in, but they'll now only run when `with-approx` is enabled.
Released with 0.14.0
Great, thanks!
I recently found myself needing `sqrt` while using this crate, and after reading #60 I decided to have a go at implementing it myself. The implementation is quite long because it's the result of quite a lot of benchmarking to try and get decent performance, so I've added a new `approx` module for it. The API isn't very polished yet, but the actual algorithms work fairly well and are able to produce very accurate rational approximations for square roots in relatively little time. I thought I'd just get an extra opinion on it before I spend time perfecting the API, which is why I've only submitted a draft PR.

Let me know if you think this is worth finishing off. Cheers :)