Rinzii / ccmath

A C++17 Compile Time <cmath> Library

Implement Trigonometric Module #10

Open Rinzii opened 8 months ago

Rinzii commented 8 months ago

The trigonometric module currently has the following functions that need implementation or work done.

TODO:

Implement:

Document:

NOTES:

None currently.

Yikai-Liao commented 4 months ago

I've been looking at calculating trigonometric functions using Chebyshev polynomials recently, and I can write the general implementation of the trigonometric section for ccmath if you wish.

This is really the dominant way of calculating trigonometric functions that I've seen, but with guaranteed accuracy it seems to run at well under half the speed of cmath.

Some libraries may only use Chebyshev polynomials of degree 5 in the interest of performance (faster than cmath), but the error will be of the order of 1e-5.

For specific accuracy, you can refer to this article.

Well, one feasible optimization is to use approximations of different degrees for different data types.

Yikai-Liao commented 4 months ago

We can use lolremez to compute Chebyshev polynomial parameters for different functions.

Yikai-Liao commented 4 months ago

I did further testing and found that if only [0, pi/4] is approximated, sufficient accuracy can be achieved with fewer degrees. Functions on other intervals can then be obtained with some simple transformations.


Rinzii commented 4 months ago

Hello! Firstly I want to thank you for your interest in contributing to ccmath!

I've been looking at calculating trigonometric functions using Chebyshev polynomials recently, and I can write the general implementation of the trigonometric section for ccmath if you wish.

This is really the dominant way of calculating trigonometric functions that I've seen, but with guaranteed accuracy it seems to run at well under half the speed of cmath.

Some libraries may only use Chebyshev polynomials of degree 5 in the interest of performance (faster than cmath), but the error will be of the order of 1e-5.

For specific accuracy, you can refer to this article.

Well, one feasible optimization is to use approximations of different degrees for different data types.

Yeah, generally the approach I've seen most is to generate a multi-degree minimax polynomial approximation with tools like Sollya, with the actual evaluation commonly done via something like Horner's scheme. Still, these error values seem roughly in line with an acceptable degree. Generally my primary goal is accuracy before speed (within reason, of course; we should still aim to be close to the standard's speed where possible), so ideally you'd want to focus on being precise.

As for different optimization paths, I'd say using specific LUTs and approximations based on data type would be best, as generally speaking it should give us the best speed and accuracy. This is also what I've done with some of the other exponentiation functions.

Now I do have a few questions:

Rinzii commented 4 months ago

We can use lolremez to compute Chebyshev polynomial parameters for different functions.

Fascinating tool! I've mostly been using Sollya for generating many of my LUTs, but I'll look into this tool further! ^^

Rinzii commented 4 months ago

I did further testing and found that if only [0, pi/4] is approximated, sufficient accuracy can be achieved with fewer degrees. Functions on other intervals can then be obtained with some simple transformations.

Fantastic! I think that would be a good edge case to check for, so we can fall back to a faster approximation algorithm where possible.

Yikai-Liao commented 4 months ago

I'm sorry, but I haven't actually had much exposure to the maths library implementation before, so I don't really understand what you mean by "handle all rounding modes" and "fenv exceptions".

I recently started looking in this direction because I found the gcem library's runtime performance to be quite low.

I agree with you that we should keep the precision as high as possible, and I'm inclined to go with increasing the degree of the polynomial until the precision can't continue to increase within the constraints of the data type.

One thing I've observed, though, seems to be due to the limited precision of pi: despite achieving high precision on [0, 2pi], my implementation of sin and cos still suffers a drop in precision for large inputs (e.g. when the inputs reach the magnitude of 1e5, the error rises to 1e-12).

Here is my current simple implementation:

#include <cmath>  // std::floor

constexpr double pi = 3.1415926535897932384626433832795028841972;
constexpr double quarter_pi = pi / 4;
constexpr double half_pi = pi / 2;
constexpr double double_pi = pi * 2;

double sin_core(const double x1) {
    // [0, pi/4]
    // sin(x) = x * (1 + x^2 * f(x^2))
    // clang-format off
    auto f = [](double x) { return ((((1.588538871176777e-10*x-2.5050560125830959e-8)*x+2.7557312421716393e-6)*x-1.9841269826137034e-4)*x+8.3333333333178271e-3)*x-1.6666666666666614e-1;};
    auto x2 = x1 * x1;
    return x1 * (1 + x2 * f(x2));
    // clang-format on
}

double cos_core(const double x1) {
    // [0, pi/4]
    // cos(x) = 1 + x^2 * f(x^2)
    // clang-format off
    auto f = [](double x) { return (((((-1.134539022348612e-11*x+2.0875418530567383e-9)*x-2.7557311795652509e-7)*x+2.4801587278862693e-5)*x-1.3888888888851575e-3)*x+4.1666666666666375e-2)*x-4.9999999999999999e-1;};
    auto x2 = x1 * x1;
    return 1 + x2 * f(x2);
    // clang-format on
}

double sin_inner(double x) {
    // [0, pi]
    if (x <= quarter_pi) return sin_core(x);
    if (x <= half_pi) return cos_core(half_pi - x);
    x = pi - x;
    if (x <= quarter_pi) return sin_core(x);
    return cos_core(half_pi - x);
}

double cos_inner(double x) {
    // [0, pi]
    if (x <= quarter_pi) return cos_core(x);
    if (x <= half_pi) return sin_core(half_pi - x);
    x = pi - x;
    if (x <= quarter_pi) return -cos_core(x);
    return -sin_core(half_pi - x);
}

double sin(double x) {
    // all real numbers
    // try using fmod in ccmath
    x -= std::floor(x / double_pi) * double_pi;
    if (x <= pi) return sin_inner(x);
    return -sin_inner(x - pi);
}

double cos(double x) {
    // all real numbers
    // try using fmod in ccmath
    x -= std::floor(x / double_pi) * double_pi;
    if (x <= pi) return cos_inner(x);
    return -cos_inner(x - pi);
}

And here is the instructions I used to generate the params:

./lolremez --degree 5 --range "1e-50:pi*pi/15" "(sin(sqrt(x))-sqrt(x))/(x*sqrt(x))" "sin(sqrt(x))/(x*sqrt(x))"

./lolremez --degree 6 --range "1e-50:pi*pi/15" "(cos(sqrt(x))-1)/x" "cos(sqrt(x))/x"

Although I don't know much about some of the issues you're talking about, I hope this information I've provided is helpful.

Rinzii commented 4 months ago

One thing I've observed though is that it seems to be because of the limited precision of pi, despite being able to achieve a high degree of precision in [0, 2pi], my implementation of sin and cos still suffers from a drop in precision when dealing with large numbers (e.g. when the inputs reach the magnitude of 1e5, the error rises to 1e-12)

You absolutely will have to use a higher-precision data type than the base type. So float will require double, and double will likely require double-double or float128 to properly offset the loss of precision you're observing. For float the issue is not as bad, but for double things get much trickier.

When it comes to sin and cos, they are pretty tricky functions. Likely what will be required is a mixture of a few different steps. Firstly, you will likely have to perform a range reduction to bring the argument into the correct interval so that we evaluate our results correctly and efficiently.

You can find more info about this within this book:

J.-M. Muller, Elementary Functions, DOI 10.1007/978-1-4899-7983-4_11, Chapter 11 - Range Reduction, Pg 199.

Other steps that will likely be required are high-degree polynomial approximations that are within the optimal interval for the specified degree. This will likely require some tweaking and experimentation to get the approximation correct, but it shouldn't take too long.

I'm sorry, but I haven't actually had much exposure to the maths library implementation before, so I don't really understand what you mean by "handle all rounding modes" and "fenv exceptions".

That's absolutely fine! I've learned quite a bit from working on this library and many of the different considerations these functions have and working on any of these functions will also teach you a lot!

For considerations of rounding modes: the default rounding mode is round-to-nearest. This is the classic form of rounding where, in the simplest sense, 1.4 rounds down to 1.0 and 1.5 rounds up to 2.0. There are multiple other rounding modes, and even though we don't explicitly require that all of them be handled by the library, ideally I would like all rounding modes handled if possible, to ensure we are as correct an implementation of the standard as possible. Of course, if this is too much, I will be fine with just round-to-nearest. For more information on rounding modes you can read up on them here:

https://www.gnu.org/software/libc/manual/html_node/Rounding.html
https://en.cppreference.com/w/cpp/numeric/fenv/FE_round

Also here is an example of us correctly handling all rounding modes inside of the library: https://github.com/Rinzii/ccmath/blob/dev/include/ccmath/internal/generic/functions/power/sqrt_gen.hpp

As for fenv: for specific edge cases during runtime we may have to raise an exception to the floating-point environment. Internally we have helper functions for handling this, so all you really have to worry about is calling them at the correct time. Here is more info on this:

https://en.cppreference.com/w/cpp/numeric/fenv

Also, std::sin defines all of the situations in which we would have to raise these exceptions here:

https://en.cppreference.com/w/cpp/numeric/math/sin

Here is my current simple implementation:

This is a good starting point! Now, one thing I would consider is moving some of these values into bit representations to speed up evaluation and allow more precise control over your evaluations. Inside the internal/support/fp headers we have helper classes specifically for making bit manipulation easier, such as our FPBits support class. ^^

Yikai-Liao commented 4 months ago

Well, maybe we could copy the implementation in musl (MIT-licensed) directly. Their implementation is very clean and well tested.

Rinzii commented 4 months ago

Well, maybe we could copy the implementation in musl (MIT-licensed) directly. Their implementation is very clean and well tested.

That is certainly an option, though looking over the implementations inside musl I can see that adapting them to work in a constexpr context would not be a trivial task, as it would require a number of changes to their approach for our version to work at compile time. It wouldn't be impossible, just a lot of busy work. Another item worth pointing out is that musl's implementation only works for the default rounding mode; this is generally fine, though not ideal. Still, I'm not against the idea of reworking their implementation to work with ccmath, along with providing proper credit.