optimization with -O3 or -Ofast - does anyone have experience?

nick87720z commented 6 years ago

I just noticed that libqalculate, which I built with debug flags to learn how some things work, is still in that state. As a radical experiment, I'm now trying to build it with the -Ofast (sic) flag.

I have already seen recommendations to use -O3, where possible, for some pro-audio software, especially effect plugins that do heavy calculations. I guessed that the same might be true for any other heavy-duty math tool. Obviously libqalculate is not the only thing to rebuild in this case, since it works on top of gmp/mpfr.

So, does anyone have experience trying to super-optimize all of that?

I'm working on a qalculate plugin for gnuplot, which lets gnuplot drive qalculate (basically, there are functions to load function expressions into qalculate and evaluate them many times with different arguments, which works well with the plot command).

nick87720z commented 6 years ago

Since getting formula pre-parsing into the gnuplot plugin, I ran a few benchmarks, rebuilding libqalculate, gmp and mpfr with -Ofast. On my system (Intel B950 CPU) the gnuplot test program took almost the same time as with -O2 (though that is still 2 times faster than -O0).

In gnuplot I tried samples=20000, plotting 4 formulas at once: x^2/3, 1/x, x/4-3, sin(x rad)

Time with only libqalculate rebuilt with different flags:
-O0: 4.6..4.8
-O2: 2.35..2.45
-Ofast: 2.29..2.4

Rebuilding gmp, mpfr and then libqalculate again with -Ofast put the time in the range ≈2.31..2.417.
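To put the ranges above in perspective, the following is a small arithmetic sketch, not part of the plugin or of libqalculate. The names `Range`, `midpoint`, `speedup`, `relative_gain` and `relative_spread` are hypothetical helpers, and using the midpoint of each reported range as a summary value is an assumption made only for illustration.

```cpp
// A timing range (min..max) as reported in the comment above;
// the unit is presumably seconds, which does not matter for the ratios.
struct Range {
    double lo;
    double hi;
};

// Midpoint of a range, used here as a rough single-number summary (an assumption).
double midpoint(const Range& r) { return (r.lo + r.hi) / 2.0; }

// How many times faster `fast` is than `slow`, comparing midpoints.
double speedup(const Range& slow, const Range& fast) {
    return midpoint(slow) / midpoint(fast);
}

// Fraction of time saved by `candidate` relative to `base`, comparing midpoints.
double relative_gain(const Range& base, const Range& candidate) {
    return 1.0 - midpoint(candidate) / midpoint(base);
}

// Width of a range relative to its midpoint, i.e. the run-to-run spread.
double relative_spread(const Range& r) {
    return (r.hi - r.lo) / midpoint(r);
}

// Figures copied from the comment above.
const Range kO0{4.6, 4.8};
const Range kO2{2.35, 2.45};
const Range kOfast{2.29, 2.4};
```

With these figures, `speedup(kO0, kO2)` comes out near 1.96, while `relative_gain(kO2, kOfast)` is about 2.3%, which is less than the roughly 4.2% given by `relative_spread(kO2)`. In other words, the reported -Ofast gain is smaller than the spread of the -O2 runs themselves.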

These formulas are used for the test included with the plugin; the benchmark can be run with just: $ time make test

hanna-kn commented 6 years ago

I tested the same expressions but with MathStructure::generateVector(), from 1 to 100 with 20 000 samples, and -O2. It took between 0.41 and 0.48 seconds (0.05..0.12 s when only creating the Calculator object and loading definitions). Much of the time difference compared to your test is probably due to the slower CPU (I'm using an i7-8700).

qalc "plot([x^2/3;1/x;x/4-3;sin(x)]; 1; 10; 20000)" takes approximately one second.

Using Number directly takes between 0.11 and 0.17 s.
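For comparison, the same four formulas can be sampled with plain `double` arithmetic. The sketch below does not use libqalculate at all. `Samples` and `sample_formulas` are hypothetical names, and the evenly spaced grid that includes both endpoints is an assumption, not a statement about how generateVector() or plot() choose their sample points.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One vector per formula, plus the sample positions themselves.
struct Samples {
    std::vector<double> x;
    std::vector<double> square_third;  // x^2/3
    std::vector<double> reciprocal;    // 1/x
    std::vector<double> linear;        // x/4-3
    std::vector<double> sine;          // sin(x), x in radians
};

// Evaluates the four formulas at `count` evenly spaced points in [min, max].
// Including both endpoints is an assumption made for this sketch.
Samples sample_formulas(double min, double max, std::size_t count) {
    Samples s;
    if (count == 0) return s;
    const double step =
        count > 1 ? (max - min) / static_cast<double>(count - 1) : 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double x = min + step * static_cast<double>(i);
        s.x.push_back(x);
        s.square_third.push_back(x * x / 3.0);
        s.reciprocal.push_back(1.0 / x);
        s.linear.push_back(x / 4.0 - 3.0);
        s.sine.push_back(std::sin(x));
    }
    return s;
}
```

Calling `sample_formulas(1.0, 10.0, 20000)` mirrors the range and sample count of the qalc command above. Timing such a loop would give a rough lower bound to compare the library figures against, since it skips parsing, units and arbitrary precision entirely.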

The sin() formula is 35% faster when EvaluationOptions::approximation is set to APPROXIMATION_APPROXIMATE (recommended in your case).

I have now made the trigonometric functions faster by making the angle unit handling more efficient, primarily with APPROXIMATION_TRY_EXACT.

hanna-kn commented 6 years ago

Pre-evaluating the symbolic expressions saves another 15% (potentially much more for complex expressions). But I would then advise setting EvaluationOptions::expand to zero to avoid long polynomials.

I have almost managed to halve the time required for my test.

I do not recommend using -Ofast (which turns on -ffast-math).
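The comment above does not give a reason, so the following is offered only as general background. With GCC, -ffast-math implies -funsafe-math-optimizations, which allows the compiler to reassociate floating-point operations. Code that depends on IEEE rounding can then change behavior. A standard illustration is Kahan (compensated) summation, sketched below with the hypothetical helper names `naive_sum`, `kahan_sum` and `sample_input`. This is not code from libqalculate, gmp or mpfr.

```cpp
#include <vector>

// Plain left-to-right summation.
double naive_sum(const std::vector<double>& values) {
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum;
}

// Kahan (compensated) summation: `comp` carries the low-order bits that are
// lost when a small term is added to a large running sum.
double kahan_sum(const std::vector<double>& values) {
    double sum = 0.0;
    double comp = 0.0;
    for (double v : values) {
        const double y = v - comp;
        const double t = sum + y;
        // Algebraically this is zero; in IEEE arithmetic it is the rounding
        // error of sum + y. Reassociation permits treating it as zero.
        comp = (t - sum) - y;
        sum = t;
    }
    return sum;
}

// One large term followed by a million terms that are each too small
// to change the running sum on their own.
std::vector<double> sample_input() {
    std::vector<double> values(1000001, 1e-16);
    values[0] = 1.0;
    return values;
}
```

Compiled without -ffast-math, `naive_sum(sample_input())` stays at exactly 1.0 because each 1e-16 term is rounded away, while `kahan_sum(sample_input())` recovers the accumulated 1e-10. Under -ffast-math the compiler is allowed to simplify the compensation term away, in which case the two functions may return the same value.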

nick87720z commented 6 years ago

I already tried something like that: not eval() on the symbolic expression, which is almost ready after generation anyway, but factorize() followed by simplify(). Now the full test sequence takes 1.95..2.1, in very rare cases up to 2.4 (before that it was 2.29..2.35)... almost 15% :)

As for the different approximation modes, I did not get any noticeable difference. I even tried sample counts higher than 20000, where it took nearly 10 s, and still saw no difference. Even if there is one, it is smaller than the total spread between results of different runs made with the same build and sample count. I also tried changing CALCULATOR's precision, which defaults to 8, to 64, 256, 512 (this is probably where I got the 10 s results; I'm not sure, I can't remember), and even 2 and 1 (!!!). When it was close to 1, I expected the output to be rounded to integers, but that did not happen (it seems precision affects only print() output).
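The point about a difference being smaller than the spread between runs can be checked mechanically. Below is a minimal timing sketch using only the C++ standard library. `Spread`, `measure` and `distinguishable` are hypothetical names, and treating two configurations as distinguishable only when their min..max ranges do not overlap is a deliberately crude criterion chosen for illustration, not a proper statistical test.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

// Fastest and slowest wall-clock time over several runs, in seconds.
struct Spread {
    double min_s;
    double max_s;
};

// Runs `work` `runs` times and records the fastest and slowest run.
Spread measure(const std::function<void()>& work, int runs) {
    using clock = std::chrono::steady_clock;
    Spread s{0.0, 0.0};
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        work();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        if (i == 0) {
            s.min_s = s.max_s = elapsed.count();
        } else {
            s.min_s = std::min(s.min_s, elapsed.count());
            s.max_s = std::max(s.max_s, elapsed.count());
        }
    }
    return s;
}

// Crude criterion (an assumption): two configurations differ measurably
// only if their min..max ranges do not overlap.
bool distinguishable(const Spread& a, const Spread& b) {
    return a.max_s < b.min_s || b.max_s < a.min_s;
}
```

Applied to figures reported earlier in the thread, `distinguishable(Spread{2.35, 2.45}, Spread{2.29, 2.4})` is false for -O2 against -Ofast, while `distinguishable(Spread{4.6, 4.8}, Spread{2.35, 2.45})` is true for -O0 against -O2.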

hanna-kn commented 6 years ago

Using factorize() on an expression that has not been evaluated is not recommended, although both functions will by default do some evaluation first. simplify() might reverse what factorize() has done.

Internal precision is considerably higher than the selected precision, primarily to handle precision loss during calculation when interval arithmetic is disabled (which it probably ought to be in your case).

APPROXIMATION_TRY_EXACT will primarily cause slow-downs with more complex expressions and/or very large or small rational numbers (e.g. "x^200000" will be around 100 times slower for x=1..10). In some cases every inexact calculation will be done twice. If floating point numbers are used directly (using Number::setFloat() or using the MathStructure initializer taking a double) the difference will probably be smaller.

nick87720z commented 6 years ago

Hm. I have never used interval arithmetic in qalc enough to notice a difference (I just experimented with result display forms in qalculate-gtk).

I used the factorize+simplify sequence in order to find better optimization approaches. I noticed the deprecated structuring mode HYBRID, whose description says to use SIMPLIFY instead. Does that mean that SIMPLIFY by itself involves factorization as well, to better search for more optimal forms? Or is this useless in the case of qalculate?

Update: It looks like a lone eval() does its job better than the factorize+simplify() pair. After replacing them with a single eval(), the test time is always 1.95<t<2 :)

Update: After I enabled interval arithmetic, the timing became almost the same as when I used factorize+simplify. After useIntervalArithmetics() the result became as before, with the unoptimized symbolic expression.

One problem I can't beat: the math struct created by UserFunction->calculate doesn't have the angle unit added in trigonometric functions (I did not forget to pass the correct eo to this call).

hanna-kn commented 6 years ago

> One trouble, which i can't beat: math struct, created by UserFunction->calculate, doesn't have angle unit added in trigonometric functions (i did not forget to give correct eo to this call).

The reason was that UserFunction intentionally uses the default parse options, and not from the supplied evaluation options, to parse the expression.

For user-defined (not defined in the distributed functions.xml) UserFunction objects, this is probably a bit confusing and a possible solution might be to parse and print the expression before setting the UserFunction expression.

In your case I am however not convinced that it is not better to parse the expression directly, using Calculator::parse() (or why not Calculator::calculate() to do both the parsing and initial evaluation at the same time), instead of using UserFunction.

I have fixed the angle unit issue in a recent commit.

nick87720z commented 6 years ago

Hm, indeed. The documentation says that UserFunction placeholders in a formula are simple backslashed letters. However, if the letters following x, y, z leave a gap in the usual letter sequence (i.e., xzb instead of xyzab), those letters are not substituted from vargs in the calculate() call; some internal default is used instead (zero?). Those that are recognized, however, appear double-quoted in the result's print() output. By using double quotes when initializing the UserFunction or calling setFormula, it is possible to use any letter sequence, probably even word names.

I almost have multiple-argument support in the plugin... if only gnuplot allowed parsing variadic arguments, or at least arrays as function parameters... There was a discussion about gnuplot 5.1.0, which was yet to come and was supposed to support arrays, but the current implementation in 5.2.2 can't use them as arguments :/

nick87720z commented 6 years ago

Nice, and good to see a performance boost too (the test now gets down to 1.7).

Qalculate / libqalculate, issue #64