wfpokorny commented 7 years ago

http://bugs.povray.org/task/272 (As posted by Simon (infoised))

Details:

While familiarizing myself with the code, I found some small changes in the solve_cubic function that lead to a significant speedup.

In my experience, "pow" is by far the slowest function in math.h and replacing it with simpler functions usually makes a tremendous impact on the speed (it's an order of magnitude slower than sqrt/exp/cbrt/log).

solve_cubic has a "pow" function that can be replaced by cbrt (cubic root), which is standard in ISO-C99 and should be available on all systems. Separate benchmarks of solve_cubic function show this change almost doubles the speed and does not lower the accuracy. As solve_cubic is part of the solution of quartic equation, this improves the speed for many primitives. Testing with a scene containing many torus intersection tests (attached below) I still observed almost 10% speedup (Intel, 4 threads, 2 hyperthreaded cores, antialiasing on, 600×600: from 91 to 84 seconds). And this is for a torus, where a lot of time is spent in the solve_quartic and cubic solver is only called once! Similar speedup should be expected for prism, ovus, sor and blob.

I do believe the cubic solver can be done without trigonometry, but that would mean changing the algorithm, introducing new bugs and requiring a lot of testing. However, the trigonometric evaluation can still be simplified (3% speedup in full torus benchmark).

These changes don't affect the algorithm at all, they are mathematically identical to the existing code, so the changes can be applied immediately. I also included other changes just as suggestions. Every change is commented and marked with [SC 2.2013].

This sadly does not speedup the sturm solver, which uses bisection and regula-falsi and looks very optimized already.

The test scene I used has a lot of torus intersections from various directions (shadow rays, main rays, transmitted rays).

polysolv.cpp (34.5 KiB) (Later version in zip file) test_torus.pov (0.7 KiB)

See zip: FS272.zip

Comments:

Comment by Christoph Lipka (clipka) - Saturday, 23 February 2013, 04:54 GMT+5

Thanks for your investigation.

Unfortunately, while cbrt is indeed C99 standard, it is not C++03 standard, and is indeed unavailable on some C++ compilers (such as MS Visual C++ 2010, which is one of our main target compilers).

Comment by Simon (infoised) - Saturday, 23 February 2013, 06:10 GMT+5

I understand.

It may be worth checking for its existence in ./configure (there are standard macros for that) and have a fallback like

ifndef HAVE_CBRT_SET

inline DBL cbrt(DBL x){ return pow(x,1/3.0); }

endif

I used this exact same trick in another project, with a configure line AC_CHECK_FUNC([cbrt],AC_DEFINE([HAVE_CBRT_SET],[1],[c99 cbrt function is available.]))

The trigonometric changes work in any case (but the speedup is smaller).

Comment by Christoph Lipka (clipka) - Saturday, 23 February 2013, 12:44 GMT+5

I'm putting this on my to-do list for version 3.7.1, as we've already frozen 3.7.0 except for bugfixes. The good news is that this opens up the possibility to do a more radical change of the algorithm, so if you feel like trying to get rid of the trig functions your help would be greatly appreciated.

Comment by Simon (infoised) - Sunday, 24 February 2013, 03:06 GMT+5

Well technically, trigs are unavoidable if you want a closed-form solution. For double precision, I thought some iterative formulas (like 2 steps of Newton method from a reasonable initial approximation) would be quicker, but I tested some options and they are at approximately the same speed (and they are also less readable, which we don't want in this code). As we only need 3 trig functions (acos, cos and sin), it's almost as good as you can go. The main slowdown really comes from pow() in the case of a single real root.

However, I found that we can split the speed of sturm algorithm in half. The algorithm itself is ok, but the initial bracket (0,MAX_DISTANCE) is horrible - you need around 20 bisections (and regula-falsi suffers as well) to get from MAX_DISTANCE to normal distances of order 1 to 100. There are simple formulas that set the upper bound on roots, and the simplest one that I tried immediately improved the speed. On my test scene, it went from 233s to 119s.

I will keep investigating the bounding algorithm to see if I can further improve the performance, and test if it breaks anything (it shouldn't, but just in case). Ideally, the sturm solver should be quick enough to completely replace the quartic formula, we just need to be smart.

I know that this part of povray code is very old and does not rely on external libraries. As polynomial solvers are available in standard open libraries (such as boost), we could consider using them in the far future. polysolv.cpp (35.2 KiB)

Comment by Christoph Lipka (clipka) - Sunday, 24 February 2013, 03:41 GMT+5

However, I found that we can split the speed of sturm algorithm in half. The algorithm itself is ok, but the initial bracket (0,MAX_DISTANCE) is horrible - you need around 20 bisections (and regula-falsi suffers as well) to get from MAX_DISTANCE to normal distances of order 1 to 100. There are simple formulas that set the upper bound on roots, and the simplest one that I tried immediately improved the speed. On my test scene, it went from 233s to 119s.

That's good news indeed.

I will keep investigating the bounding algorithm to see if I can further improve the performance, and test if it breaks anything (it shouldn't, but just in case). Ideally, the sturm solver should be quick enough to completely replace the quartic formula, we just need to be smart.

Do you have numbers for a comparison of quartic solver vs. your improvement to the sturm solver?

I know that this part of povray code is very old and does not rely on external libraries. As polynomial solvers are available in standard open libraries (such as boost), we could consider using them in the far future.

Probably, yes. Using a well-established external library is prone to get us code that is better tested, better optimized, and better analyzed with respect to precision issues, and we're already using part of boost anyway.

Comment by Simon (infoised) - Sunday, 24 February 2013, 09:33 GMT+5

Numbers:

In torus scene: old sturm: 233s new sturm: 116s quartic: 83s

1000000 randomly generated polynomials of fourth order with coefficients between -50 and 50 (with symmetric initial bracket, so it finds all of them, not just the positive - for better comparison): old sturm: 3.205s new sturm: 0.830s quartic: 0.339s

This test is not entirely realistic: in random polynomials, coefficients are more unpredictable and it is common to have no roots at all (average of 0.4 real roots per polynomial). In reality, you get less cases without real roots.

c-lipka commented 7 years ago

I know that this part of povray code is very old and does not rely on external libraries. As polynomial solvers are available in standard open libraries (such as boost), we could consider using them in the far future.

I've had a closer look at boost now, and can't find anything close to the polynomial solver we'd need, i.e. an algorithm to find all non-complex roots of a given polynomial. It seems that all boost provides is a couple of algorithms to find arbitrary individual root of a generic function. Am I missing something?

The GNU Scientific Library (GSL) does provide a proper polynomial solver, but it is designed to find all complex roots, which might come at a performance penalty.

c-lipka commented 7 years ago

Since I don't see anywhere near the quoted speed increase from any of the proposed changes in the Windows version (using VS2015 as the compiler), I lean towards postponing the changes to a later version.

wfpokorny commented 7 years ago

As an amateur here I've done some digging and mostly agree with what you found with respect to the solvers. Up until 2016 there had been some theoretical work by Michael Sagraloff, et al, that sounded promising.

See: http://dblp.uni-trier.de/pers/hd/s/Sagraloff:Michael and if you look at the couple 2016 paper abstracts (I've read two earlier papers, but only the abstract for these two) with "and now for real" in the title. The work is perhaps no longer just theoretical... He works at Max-Planck-Institute for Informatics & I'm unsure whether this means the ideas are something we might be able to use.

As for the lessor performance gain, had the thought this morning a recent change was made in the polynomial solver first due static code analysis and then a later correction to that change. Might this change make a difference in the efficacy of Simon's changes...

wfpokorny commented 6 years ago

Solver updates mostly related to accuracy in this branch:

https://github.com/wfpokorny/povray/tree/fix/polynomialsolverAccuracy

now include a commit with two upper bound, Cauchy based estimation techniques. Very nice performance gains in initial testing. Further discussion in the newsgroups thread:

http://news.povray.org/povray.binaries.images/thread/%3C5ae324d7%241%40news.povray.org%3E/

POV-Ray / povray

Port of FS272 - Minor change, significant speedup in cubic polynomial solver #236

ifndef HAVE_CBRT_SET

endif