Benchmark against TA-lib inaccurate?

s9v commented 4 years ago

Is it possible that TA-lib performs worse in the benchmark (benchmark.c) only because it's linked dynamically?

I don't know how much overhead is caused by dynamic linking, but I tried this:

1. Replaced ti_mom inside indicators/mom.c with TA_MOM from ta-lib/src/ta_func/ta_MOM.c.
2. Changed the signature of TA_MOM to match ti_mom's.
3. Rebuilt and re-ran the benchmark.

Here's the benchmark output before the changes:

Benchmark          mom             26ms  2261mfps  // <-- original TI mom
Benchmark          MOM-talib       31ms  1896mfps  // (dynamically linked TA mom)

Here's the output after I replaced TI code with TA-lib code:

Benchmark          mom             26ms  2261mfps  // <-- "statically linked" TA-lib mom
Benchmark          MOM-talib       32ms  1837mfps  // (dynamically linked TA mom)

The results are suspiciously similar, I know. But I repeated the above several times (introducing compilation errors to make sure I'm actually running the new code) and there seems to be no performance difference between TI and TA-lib implementations of MOM.

codeplea commented 4 years ago

That's interesting.

Could you repeat your test using a more complicated function, such as linear regression (TI:linreg, TA-Lib:LINEARREG)?

For the record, the benchmark results posted at https://tulipindicators.org/benchmark use static linking for both TI and TA-Lib. Dynamic linking is not used. Even so, dynamic linking should not cause any significant performance penalty after the code is loaded into memory.

I think the term you're thinking of is "Dynamic dispatch", not dynamic linking. In that case, your change doesn't make TA-Lib use "static dispatch", rather it makes it use TI's "dynamic dispatch".

I specifically designed TI's dynamic dispatch interface with speed and simplicity in mind, so it's no surprise that it outperforms TA-Lib's dynamic dispatch.

codeplea commented 4 years ago

Just to dive in a little deeper...

TI's mom code is:

    for (i = period; i < size; ++i) {
        *output++ = input[i] - input[i-period];
    }

TA-Lib's mom code is:

    while( inIdx <= endIdx )
       outReal[outIdx++] = inReal[inIdx++] - inReal[trailingIdx++];

So TI uses a pointer and one counter, whereas TA-Lib uses three counters.

We can easily change TI to use the same methods as TA-Lib, in which case its code would look like this:

    int outIdx = 0, inIdx = period, trailIdx = 0;
    while (inIdx < size) {
        output[outIdx++] = input[inIdx++] - input[trailIdx++];
    }

Making that change seems to make TI about 15% slower than the original TI code. However, the TA-Lib benchmark is about 50% slower. So I would conclude that TI's mom implementation is slightly faster, and TI's interface is much faster, when compared to TA-Lib.

Here are my numbers:

TI mom original: 2261mfps
TI mom modified: 2027mfps
TA-Lib mom: 1367mfps

In any case, I don't find these micro benchmarks on such simple functions to be very interesting. TA-Lib and TI are both very fast for mom. That's because mom is so simple.

I would like to see your thoughts after comparing a more complicated indicator, e.g. the linear regression indicator.

TulipCharts / tulipindicators

Benchmark against TA-lib inaccurate? #91