BYVoid / TA-Lib

Technical Analysis Library (Java Maven mirror)
http://ta-lib.org/
Other

Why do my results seem to be inaccurate? #2

Closed ig-dev closed 4 years ago

ig-dev commented 6 years ago

Edit: This was previously a false warning to the community not to use this library. I apologise for the false alarm, which was caused by my own embarrassing misunderstanding: it turns out that I had made wrong assumptions about how the library processes input and returns values.

As long as you understand that

You should be fine :slightly_smiling_face:

ig-dev commented 6 years ago

Let me give an example to show that this library cannot even calculate an Exponential Moving Average, one of the most basic averages.

import com.tictactec.ta.lib.Core;
import com.tictactec.ta.lib.MInteger;

import java.util.Arrays;

public class Example {
    private static final int LOOKBACK = 3;
    private static final double EXTRA_VALUE = 10;
    private static Core core = new Core();
    private static MInteger outBeginIndex = new MInteger();
    private static MInteger outLength = new MInteger();
    private static double[] values;
    private static double[] result;

    public static void main(String[] args) {
        values = new double[] { 1, 2, 3, 4, 5 };
        result = new double[values.length];
        core.ema(0, values.length - 1, values, LOOKBACK, outBeginIndex, outLength, result);
        System.out.println(Arrays.toString(result));
        values = new double[] { EXTRA_VALUE, 1, 2, 3, 4, 5 };
        result = new double[values.length];
        core.ema(0, values.length - 1, values, LOOKBACK, outBeginIndex, outLength, result);
        System.out.println(Arrays.toString(result));
    }
}

This prints:

[2.0, 3.0, 4.0, 0.0, 0.0]
[4.333333333333333, 3.6666666666666665, 3.833333333333333, 4.416666666666666, 0.0, 0.0]

In the first result we take the 3-period EMA over 5, 4, 3, 2, 1 and get [2.0, 3.0, 4.0, 0.0, 0.0], or rather [0.0, 0.0, 4.0, 3.0, 2.0] reversed, because TA-Lib iterates from the end to the beginning. The first two values are zero because we cannot make a 3-period average from only 2 values. Then the 4 is the result of something to the effect of (5+4+3) / 3, etc. So far, so good.

Now we take the EMA again, but this time we prepend a 10 to the list. We get [4.333333333333333, 3.6666666666666665, 3.833333333333333, 4.416666666666666, 0.0, 0.0]. The first two values are still zero, but all the other values suddenly change. Since the first entry in the list represents the newest entry, only the very last (new) value should have changed.
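For reference, both printed arrays can be reproduced by hand if they are read left to right with index 0 as the oldest input. This assumes the EMA is seeded with the SMA of the first n inputs and uses the smoothing factor k = 2/(n+1); both conventions are inferred from the printed numbers here, not quoted from the library source. Below, t is the 0-based input index and n = 3:

```math
\begin{aligned}
&k = \frac{2}{n+1} = \frac{2}{3+1} = 0.5, \qquad \mathrm{EMA}_t = k\,x_t + (1-k)\,\mathrm{EMA}_{t-1} \\[6pt]
&x = (1, 2, 3, 4, 5): \\
&\quad \mathrm{EMA}_2 = \tfrac{1+2+3}{3} = 2, \quad \mathrm{EMA}_3 = 0.5 \cdot 4 + 0.5 \cdot 2 = 3, \quad \mathrm{EMA}_4 = 0.5 \cdot 5 + 0.5 \cdot 3 = 4 \\[6pt]
&x = (10, 1, 2, 3, 4, 5): \\
&\quad \mathrm{EMA}_2 = \tfrac{10+1+2}{3} = \tfrac{13}{3} \approx 4.3333, \quad \mathrm{EMA}_3 = 0.5 \cdot 3 + 0.5 \cdot \tfrac{13}{3} = \tfrac{11}{3} \approx 3.6667 \\
&\quad \mathrm{EMA}_4 = 0.5 \cdot 4 + 0.5 \cdot \tfrac{11}{3} = \tfrac{23}{6} \approx 3.8333, \quad \mathrm{EMA}_5 = 0.5 \cdot 5 + 0.5 \cdot \tfrac{23}{6} = \tfrac{53}{12} \approx 4.4167
\end{aligned}
```

Under this reading the trailing 0.0 entries are unused output slots, and the prepended 10 changes the seed, which every later EMA value inherits.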

Now, if you think that TA-Lib might iterate the other way, simply replace ema with sma. If we do this, we get the following result:

[2.0, 3.0, 4.0, 0.0, 0.0]
[4.333333333333333, 2.0, 3.0, 4.0, 0.0, 0.0]

Only one new value, 4.333333333333333, appears, and the rest stays the same, as we expected. This is not the behaviour we see for ema. Either the EMA or the SMA is computed wrong, and no matter how you twist it, the library cannot compute both a simple SMA and an EMA correctly. And all the other indicators that depend on these averages, such as Bollinger Bands, MACD and others, are also wrong. So are the stochastic indicators, and the list goes on and on.

This is quite significant because a lot of Java-based projects here use this library. It's the first go-to library for many projects and trading interfaces. I'm surprised nobody else has noticed this or opened an issue.
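The difference between the SMA and EMA results above comes down to how much history each output depends on. The sketch below uses plain-Java re-implementations (hypothetical, not TA-Lib's code) that follow the layout visible in the printed output, with valid results packed from index 0. The EMA assumes an SMA seed and k = 2 / (period + 1). Each SMA output depends only on its own window, so prepending a value just shifts the old outputs one slot to the right. Each EMA output depends on the previous EMA, so a new oldest value changes the seed and every output after it.

```java
import java.util.Arrays;

/** Plain-Java illustration (not TA-Lib code); assumes x.length >= period. */
public class SmaVsEma {
    // Each SMA output depends only on its own window of `period` inputs.
    // out[i] belongs to input index i + period - 1; later slots stay 0.0.
    static double[] sma(double[] x, int period) {
        double[] out = new double[x.length];
        for (int i = period - 1; i < x.length; i++) {
            double sum = 0;
            for (int j = i - period + 1; j <= i; j++) sum += x[j];
            out[i - period + 1] = sum / period;
        }
        return out;
    }

    // Each EMA output depends on the previous EMA, so the seed
    // (SMA of the first `period` inputs) propagates to every later value.
    static double[] ema(double[] x, int period) {
        double[] out = new double[x.length];
        double k = 2.0 / (period + 1);
        double prev = 0;
        for (int j = 0; j < period; j++) prev += x[j];
        prev /= period;
        out[0] = prev;
        for (int i = period; i < x.length; i++) {
            prev = k * x[i] + (1 - k) * prev;
            out[i - period + 1] = prev;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] base = {1, 2, 3, 4, 5};
        double[] prepended = {10, 1, 2, 3, 4, 5};
        System.out.println("SMA: " + Arrays.toString(sma(base, 3)) + " vs " + Arrays.toString(sma(prepended, 3)));
        System.out.println("EMA: " + Arrays.toString(ema(base, 3)) + " vs " + Arrays.toString(ema(prepended, 3)));
    }
}
```

Neither behaviour is a bug in itself: the SMA forgets anything older than its window, while the EMA never fully forgets its starting point.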

sebbie commented 5 years ago

I can't replicate this issue in Python. Is it possible that your Java wrapper mangles the data?

import talib
import numpy

prices = numpy.array([5.0, 4.0, 3.0, 2.0, 1.0])
print(talib.EMA(prices, timeperiod=3)) # [nan nan  4.  3.  2.]

prices = numpy.array([5.0, 4.0, 3.0, 2.0, 1.0, 10.0])
print(talib.EMA(prices, timeperiod=3)) # [nan nan  4.  3.  2.  6.]

prices = numpy.array([5.0, 4.0, 3.0, 2.0, 1.0, 10.0, 5.0])
print(talib.EMA(prices, timeperiod=3)) # [nan nan 4.  3.  2.  6.  5.5]

yul commented 4 years ago

@sebbie In Python it does not reverse the data; I guess this is the source of confusion. But if you add something at the beginning, the end results still change, even though that value lies beyond the given time period. I have no idea what timeperiod actually does, because it does not prevent values before that period from entering the calculation. This is basically explained here: https://ta-lib.org/d_api/ta_setunstableperiod.html

My problem is that I need to calculate values on streaming data: I first calculate the indicator on the existing data, and when new data arrives, I calculate it on the last timeperiod values only. This produces a step, because the values change too much:

[Screenshot 2020-07-24 at 13:29:26: chart showing the step at the last candle]

(Values up to the last candle were calculated on the whole data set, and the last candle only on the last timeperiod values.) To prevent this problem I use more data, 5 * timeperiod, but then my results may differ from other software, such as tradingview.com, that actually uses the period to keep old data from affecting the calculation, which makes it difficult to check my results against theirs. I don't consider this a problem with this lib, though. But since EMA is recursive, I suppose it would be best if there were a way to provide an initial value, namely the last previously calculated EMA, as the base for calculating the following values. For example, if I calculate EMA(30), I could take the stored EMA value at the 31st price from the end and use it to calculate the EMA over the last 30 prices.
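A minimal sketch of that idea in plain Java. StreamingEma, update and batchEma are hypothetical names, not a TA-Lib API, and the sketch assumes the SMA seed and k = 2 / (period + 1) that the EMA outputs earlier in this thread are consistent with. The approach is to persist the last EMA value, then fold in each new price with the EMA recursion. Continuing from the stored value gives the same result as a full recomputation. Recomputing over only the last timeperiod prices instead starts again from a fresh SMA seed, which would explain the step. The prices are the first 15 values of the StockCharts series quoted later in this thread.

```java
import java.util.Arrays;

/** Hypothetical streaming EMA: seeded once, then updated incrementally. */
public class StreamingEma {
    private final double k;
    private double last; // last EMA value: the only state worth persisting

    /** Resume from a previously stored EMA value. */
    StreamingEma(int period, double storedEma) {
        this.k = 2.0 / (period + 1);
        this.last = storedEma;
    }

    /** Fold in one new price and return the updated EMA. */
    double update(double price) {
        last = k * price + (1 - k) * last;
        return last;
    }

    /** Batch EMA over the whole series (SMA-seeded); returns the final value. */
    static double batchEma(double[] x, int period) {
        double k = 2.0 / (period + 1);
        double ema = 0;
        for (int i = 0; i < period; i++) ema += x[i];
        ema /= period;
        for (int i = period; i < x.length; i++) ema = k * x[i] + (1 - k) * ema;
        return ema;
    }

    public static void main(String[] args) {
        int period = 10;
        double[] history = {22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29, 22.15, 22.39};
        double[] newPrices = {22.38, 22.61, 23.36};

        // Seed the stream with the EMA of the stored history, then add new bars one at a time.
        StreamingEma stream = new StreamingEma(period, batchEma(history, period));
        double streamed = Double.NaN;
        for (double p : newPrices) streamed = stream.update(p);

        // Recompute from scratch over history + new bars for comparison.
        double[] all = Arrays.copyOf(history, history.length + newPrices.length);
        System.arraycopy(newPrices, 0, all, history.length, newPrices.length);
        System.out.println(streamed + " vs " + batchEma(all, period));
    }
}
```

The SMA seed is needed only once, at the very start of the history; after that, storing `last` between runs is enough.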

shevkoplyas commented 4 years ago

Ignatius,

Your "short warning to the community" is clearly wrong and should be deleted.

Ignatius: - Because TA-Lib iterates from the end to the beginning.

Dmitry: - Nope. TA-Lib does not reverse your data, nor does it reverse its results. You were confused by the two zeros at the end of the result array, so you turned it all upside down. Just read the source, "com/tictactec/ta/lib/Core.java": it fits in half a screen and will give you an idea of what is happening under the hood.

Ignatius: - Since the first entry in the list represents the newest entry.

Dmitry: - Wrong assumption, hence wrong results. The 1st entry in your "values" array is the oldest value, so putting a new value there definitely shifts all the results. Try two things:
1) Change your EXTRA_VALUE from 10 up to 10000 and run again.
2) Now try appending your EXTRA_VALUE not to the start of your "values" array but to the end, and run again.
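Both experiments can be sketched without TA-Lib, using a plain-Java EMA that follows the conventions the outputs in this thread are consistent with (SMA seed, k = 2 / (period + 1), valid results packed from index 0). This re-implementation is an illustrative assumption, not the library's code.

```java
import java.util.Arrays;

/** Plain-Java illustration (not TA-Lib code) of appending vs prepending a value. */
public class AppendVsPrepend {
    // SMA-seeded EMA; out[i] belongs to input index i + period - 1. Assumes x.length >= period.
    static double[] ema(double[] x, int period) {
        double[] out = new double[x.length];
        double k = 2.0 / (period + 1);
        double prev = 0;
        for (int j = 0; j < period; j++) prev += x[j];
        prev /= period;
        out[0] = prev;
        for (int i = period; i < x.length; i++) {
            prev = k * x[i] + (1 - k) * prev;
            out[i - period + 1] = prev;
        }
        return out;
    }

    // Returns the elements of a followed by the elements of b.
    static double[] concat(double[] a, double[] b) {
        double[] r = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, r, a.length, b.length);
        return r;
    }

    public static void main(String[] args) {
        final double extraValue = 10; // experiment 1: try 10000 as well
        double[] values = {1, 2, 3, 4, 5};
        System.out.println("original : " + Arrays.toString(ema(values, 3)));
        System.out.println("appended : " + Arrays.toString(ema(concat(values, new double[] {extraValue}), 3)));
        System.out.println("prepended: " + Arrays.toString(ema(concat(new double[] {extraValue}, values), 3)));
    }
}
```

With the value appended, the first three outputs stay 2.0, 3.0, 4.0 and one new output (7.0) appears. With the value prepended, every output changes, and a larger extraValue simply makes that shift more obvious.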

Also pay attention to the other return values from the ema() call. For example, do you know why it gives you back "outBeginIndex" along with "outLength"? Are those there just for fun?
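In TA-Lib's Java API, outBeginIndex.value is the input index of the first valid output and outLength.value is the number of valid outputs. They are written from result[0] onwards, so result[i] belongs to values[outBeginIndex.value + i], and everything from result[outLength.value] on is an unused slot. The helper below is a hypothetical illustration (describe is not part of TA-Lib) that takes those two numbers as plain ints, so it runs without the library. The begIdx = 2 and nbElement = 3 in main are what a period-3 EMA over five inputs reports, assuming the default unstable period of 0.

```java
/** Hypothetical helper: pairs each valid result with the input it belongs to. */
public class AlignOutput {
    static String describe(double[] input, double[] result, int begIdx, int nbElement) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nbElement; i++) {
            int inputIdx = begIdx + i; // result[i] belongs to this input index
            sb.append(String.format("input[%d]=%s -> %s%n", inputIdx, input[inputIdx], result[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Input and output of the first example in this thread (EMA, period 3).
        double[] values = {1, 2, 3, 4, 5};
        double[] result = {2.0, 3.0, 4.0, 0.0, 0.0};
        int begIdx = 2;    // outBeginIndex.value
        int nbElement = 3; // outLength.value
        System.out.print(describe(values, result, begIdx, nbElement));
    }
}
```

Read this way, the 4.0 is the EMA at the newest input (5), and the two trailing zeros never correspond to any input.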

I didn't go as far as testing TA-Lib with AI, and I never built "automatised tests using reflection" around it, but I bet ema() works just fine! Let's take some test-series data and try. Here's your slightly modified example:

import com.tictactec.ta.lib.Core;
import com.tictactec.ta.lib.MInteger;

import java.util.Arrays;

public class Example {
    private static final int LOOKBACK = 10;
    private static final double EXTRA_VALUE = 100000;
    private static Core core = new Core();
    private static MInteger outBeginIndex = new MInteger();
    private static MInteger outLength = new MInteger();
    private static double[] values;
    private static double[] expected_ema_result;
    private static double[] expected_sma_result;
    private static double[] result_ema;
    private static double[] result_sma;

    /**
     * Round half away from zero ('commercial' rounding)
     * Uses correction to offset floating-point inaccuracies.
     * Works symmetrically for positive and negative numbers.
     * src: https://stackoverflow.com/a/48764733/7022062
     */
    public static double round(double num, int digits) {
        // epsilon correction
        double n = Double.longBitsToDouble(Double.doubleToLongBits(num) + 1);
        double p = Math.pow(10, digits);
        return Math.round(n * p) / p;
    }

    public static void main(String[] args) {
        // test series for moving averages, source:
        //     article: https://school.stockcharts.com/doku.php?id=technical_indicators:moving_averages
        // spreadsheet: https://school.stockcharts.com/lib/exe/fetch.php?media=technical_indicators:moving_averages:cs-movavg.xls
        values = new double[] {22.27, 22.19, 22.08, 22.17, 22.18, 22.13, 22.23, 22.43, 22.24, 22.29, 22.15, 22.39, 22.38, 22.61, 23.36, 24.05, 23.75, 23.83, 23.95, 23.63, 23.82, 23.87, 23.65, 23.19, 23.10, 23.33, 22.68, 23.10, 22.40, 22.17};
        expected_ema_result = new double[] {22.22, 22.21, 22.24, 22.27, 22.33, 22.52, 22.80, 22.97, 23.13, 23.28, 23.34, 23.43, 23.51, 23.54, 23.47, 23.40, 23.39, 23.26, 23.23, 23.08, 22.92};
        expected_sma_result = new double[] {22.22, 22.21, 22.23, 22.26, 22.3, 22.42, 22.61, 22.77, 22.91, 23.08, 23.21, 23.38, 23.53, 23.65, 23.71, 23.69, 23.61, 23.51, 23.43, 23.28, 23.13};

        result_ema = new double[values.length];
        result_sma = new double[values.length];

        core.ema(0, values.length - 1, values, LOOKBACK, outBeginIndex, outLength, result_ema);
        core.sma(0, values.length - 1, values, LOOKBACK, outBeginIndex, outLength, result_sma);

        // System.out.println(Arrays.toString(values));

        // round to 2 decimals, the precision of the spreadsheet values
        for (int i = 0; i < result_ema.length; i++) {
            result_ema[i] = round(result_ema[i], 2);
            result_sma[i] = round(result_sma[i], 2);
        }

        System.out.println("\n---- result EMA and expected EMA ----");
        System.out.println(Arrays.toString(result_ema));
        System.out.println(Arrays.toString(expected_ema_result));

        System.out.println("\n---- result SMA and expected SMA ----");
        System.out.println(Arrays.toString(result_sma));
        System.out.println(Arrays.toString(expected_sma_result));
    }
}

And its output:

---- result EMA and expected EMA ----
[22.22, 22.21, 22.24, 22.27, 22.33, 22.52, 22.8, 22.97, 23.13, 23.28, 23.34, 23.43, 23.51, 23.53, 23.47, 23.4, 23.39, 23.26, 23.23, 23.08, 22.92, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[22.22, 22.21, 22.24, 22.27, 22.33, 22.52, 22.8, 22.97, 23.13, 23.28, 23.34, 23.43, 23.51, 23.54, 23.47, 23.4, 23.39, 23.26, 23.23, 23.08, 22.92]

---- result SMA and expected SMA ----
[22.22, 22.21, 22.23, 22.26, 22.3, 22.42, 22.61, 22.77, 22.91, 23.08, 23.21, 23.38, 23.53, 23.65, 23.71, 23.68, 23.61, 23.51, 23.43, 23.28, 23.13, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[22.22, 22.21, 22.23, 22.26, 22.3, 22.42, 22.61, 22.77, 22.91, 23.08, 23.21, 23.38, 23.53, 23.65, 23.71, 23.69, 23.61, 23.51, 23.43, 23.28, 23.13]
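Apart from the trailing zeros in unused slots, only two entries differ, each by one unit in the last digit: 23.53 vs 23.54 (EMA) and 23.68 vs 23.69 (SMA). That looks like a rounding difference rather than a different formula. Below is a sketch of a check that compares only the outLength valid slots and allows a small tolerance instead of expecting exact equality after rounding. The name matches and the 0.015 tolerance are illustrative choices, and the arrays are the rounded EMA values printed above.

```java
import java.util.Arrays;

public class CompareWithTolerance {
    // Rounded EMA output printed above: 21 valid slots, padded with unused zeros to length 30.
    static final double[] EMA_RESULT = Arrays.copyOf(new double[] {
        22.22, 22.21, 22.24, 22.27, 22.33, 22.52, 22.8, 22.97, 23.13, 23.28, 23.34,
        23.43, 23.51, 23.53, 23.47, 23.4, 23.39, 23.26, 23.23, 23.08, 22.92}, 30);
    // Expected EMA values from the StockCharts spreadsheet, as listed above.
    static final double[] EMA_EXPECTED = {
        22.22, 22.21, 22.24, 22.27, 22.33, 22.52, 22.8, 22.97, 23.13, 23.28, 23.34,
        23.43, 23.51, 23.54, 23.47, 23.4, 23.39, 23.26, 23.23, 23.08, 22.92};

    /** True if the first nbElement results all lie within tol of the expected values. */
    static boolean matches(double[] result, int nbElement, double[] expected, double tol) {
        if (nbElement != expected.length) return false;
        for (int i = 0; i < nbElement; i++) {
            if (Math.abs(result[i] - expected[i]) > tol) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int nbElement = 21; // outLength.value for 30 inputs and a period of 10
        double[] valid = Arrays.copyOf(EMA_RESULT, nbElement);
        System.out.println("exact match:      " + Arrays.equals(valid, EMA_EXPECTED));
        System.out.println("within tolerance: " + matches(EMA_RESULT, nbElement, EMA_EXPECTED, 0.015));
    }
}
```

This prints false for the exact comparison and true for the tolerant one, because the only disagreement is the 23.53 vs 23.54 entry.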
ig-dev commented 4 years ago

Thank you, Dmitry, for the example. It does indeed seem that TA-Lib produces the correct results if you use it correctly, so I will close this issue.

I note, however, that I was not the only one who was misled by this library's output. The source of the confusion becomes clear when comparing the Python output with the Java output. The EMA(3) of 5.0, 4.0, 3.0, 2.0, 1.0 in Python is

[nan nan  4.  3.  2.]

But in Java it is

[4.0, 3.0, 2.0, 0.0, 0.0]

The natural assumption is that the zeros at the end of the array represent the undefined values of the EMA (the nan of the Python output). This, in turn, can lead to the wrong conclusion that the result is meant to be read back to front, or that the input is read back to front.
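One way to make the Java output look like the Python output is to copy the valid slots back to the input positions they belong to and fill the rest with NaN. A minimal sketch, assuming the usual TA-Lib layout where result[i] belongs to input index outBeginIndex.value + i; toInputAligned is a hypothetical helper name, not a library method.

```java
import java.util.Arrays;

public class NanAlign {
    /** Returns an array the size of the input, with NaN wherever no result exists. */
    static double[] toInputAligned(double[] result, int begIdx, int nbElement, int inputLength) {
        double[] aligned = new double[inputLength];
        Arrays.fill(aligned, Double.NaN);
        System.arraycopy(result, 0, aligned, begIdx, nbElement);
        return aligned;
    }

    public static void main(String[] args) {
        // Java EMA(3) output for 5.0, 4.0, 3.0, 2.0, 1.0 as quoted above;
        // begIdx = 2 and nbElement = 3 stand in for outBeginIndex.value and outLength.value.
        double[] javaResult = {4.0, 3.0, 2.0, 0.0, 0.0};
        System.out.println(Arrays.toString(toInputAligned(javaResult, 2, 3, 5))); // [NaN, NaN, 4.0, 3.0, 2.0]
    }
}
```

With the real library, the two middle arguments would come from the MInteger objects passed to ema(), and the result then lines up index for index with both the input array and the Python output.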

And yep, in retrospect it seems fairly stupid of me not to have discovered this before opening this issue. Sometimes you go down a rabbit hole while debugging, and the right thought just does not occur to you anymore...