DaveSkender / Stock.Indicators

Stock Indicators for .NET is a C# NuGet package that transforms raw equity, commodity, forex, or cryptocurrency financial market price quotes into technical indicators and trading insights. You'll need this essential data in the investment tools that you're building for algorithmic trading, technical analysis, machine learning, or visual charting.
https://dotnet.StockIndicators.dev
Apache License 2.0
976 stars 246 forks source link

MACD - slow calculation #111

Closed ikorman closed 4 years ago

ikorman commented 4 years ago

Hi

I'm playing and testing this great library and so far I'm very pleased with it. I did noticed one thing dough - MACD calculation is rather slow. Here is an example:

Are those expected times?

Ivan

DaveSkender commented 4 years ago

That does not seem right. Since MACD uses 4 full series EMA calculations, I'd expect it to be approximately 4× slower than EMA. If this is one contiguous history series of Quotes, be sure that you are not running it on every date in that series; you'd only need to run it one (see #88).

ikorman commented 4 years ago

I load historical data and then calculate various indicators for further processing, here is sample code I'm using for it:

        // RSI
        sw = Stopwatch.StartNew();
        var rsi = Indicator.GetRsi(quotes, 14);
        sw.Stop();
        Console.WriteLine($"RSI: {sw.ElapsedMilliseconds}");
        this.Add("RSI", rsi.Cast<ResultBase>().ToList());

        // MACD
        sw = Stopwatch.StartNew();
        var macd = Indicator.GetMacd(quotes, 12, 26, 9);
        sw.Stop();
        Console.WriteLine($"MACD: {sw.ElapsedMilliseconds}");
        var macdRes = macd.Cast<ResultBase>().ToList();
        this.Add("MACD", macdRes.Cast<ResultBase>().ToList());
        var lastmacd = macdRes.Last();

        // Bolinger bands
        sw = Stopwatch.StartNew();
        var bb = Indicator.GetBollingerBands(quotes);
        sw.Stop();
        Console.WriteLine($"BB: {sw.ElapsedMilliseconds}");
        var bbRes = bb.Cast<ResultBase>().ToList();
        this.Add("BB", bbRes);
        var lastBBRes = bbRes.Last();

        // Ichimoku
        sw = Stopwatch.StartNew();
        var ichimoku = Indicator.GetIchimoku(quotes);
        sw.Stop();
        Console.WriteLine($"Ichimoku: {sw.ElapsedMilliseconds}");
        var ichimokuRes = ichimoku.Cast<ResultBase>().ToList();
        this.Add("Ichimoku", ichimokuRes);
        var lastIchimokuRes = ichimokuRes.Last();

And here are the times from it (in ms). Source is always the same and it is processed only once.

Heikin ashi: 55 RSI: 61 MACD: 10910 BB: 12435 Ichimoku: 44909

Thx Ivan

DaveSkender commented 4 years ago

Ichimoku has the most complex calculations, so I do expect that to be the slowest; however, 44 seconds still seems very slow. I'd expect it to be not more than 4× slower than RSI. I'll be doing some performance benchmarking soon (AB#727), so I'll be able to get a good read on what to expect for each indicator.

For now, you can get a sense of what to expect from the unit test results in the build. Unit tests are only using a history with 500 quotes, so you'll have to extrapolate a bit for larger data sets.

DaveSkender commented 4 years ago

I'll also take another look at performance tuning on some of these more complex indicators. I may explore some parallelism or other async methods since some of these internal computations do not need to be serialized. If you have any thoughts or want to help, let me know.

ikorman commented 4 years ago

I have take a look on MACD code and most time is burned where you search for slow & fast EMA with Where(...).FirstOrDefault(). I believe that better approach would be to go from fact that array of history and ema's are the same length with same index base, hence the code could look something like this:

    public static IEnumerable<MacdResult> GetMacd(IEnumerable<Quote> history, int fastPeriod = 12, int slowPeriod = 26, int signalPeriod = 9)
    {

        // clean quotes
        Cleaners.PrepareHistory(history);

        // check parameters
        ValidateMacd(history, fastPeriod, slowPeriod, signalPeriod);

        // initialize
        var emaFast = GetEma(history, fastPeriod).ToList();
        var emaSlow = GetEma(history, slowPeriod).ToList();

        List<BasicData> emaDiff = new List<BasicData>();
        List<MacdResult> results = new List<MacdResult>();

        var a = history.ToList();
        for (int c = 0; c < a.Count; c++)
        //foreach (Quote h in history)
        {
            var h = a[c];
            //EmaResult df = emaFast.Where(x => x.Date == h.Date).FirstOrDefault();
            //EmaResult ds = emaSlow.Where(x => x.Date == h.Date).FirstOrDefault();
            EmaResult df = emaFast[c];
            EmaResult ds = emaSlow[c];

... ...

In this way you could avoid costly search and instead you can do direct lookup via index. Does it make sense?

Ivan

DaveSkender commented 4 years ago

As an old database guy, it always scares me to use an ordinal position in a collection over a unique key value. I suspect, under the hood, LINQ may be doing something like that anyway, so it might not help much. With that said, it could potentially be faster and if that can be done reliably, it might be worth exploring.

If anything, I should probably key off of Index instead of Date as I believe integers resolve faster.

I think I'll implement a performance benchmarking test rig first so I can do a more reliable comparative analysis of changes. Feel free to tinker with this in your own Fork until I get a chance to look at it closer.

ikorman commented 4 years ago

If you are sticking to List which guarantee item order and keeping same length of data array (e.g. EMA of History[10] will always yield EMA[10] results), it should be safe and fast. I did tried to reworked GetMacd in following way (at least first part of calculations)

    public static IEnumerable<MacdResult> GetMacdFast(IEnumerable<Quote> history, int fastPeriod = 12, int slowPeriod = 26, int signalPeriod = 9)
    {

        // clean quotes
        Cleaners.PrepareHistory(history);

        // check parameters
        ValidateMacd(history, fastPeriod, slowPeriod, signalPeriod);

        // initialize
        List<EmaResult> emaFast = GetEma(history, fastPeriod).ToList();
        List<EmaResult> emaSlow = GetEma(history, slowPeriod).ToList();

        List<BasicData> emaDiff = new List<BasicData>();
        List<MacdResult> results = new List<MacdResult>();

        var historyList = history.ToList();

        for (int idx = 0; idx < historyList.Count; idx++)
        {
            var h = historyList[idx];
            EmaResult df = emaFast[idx];
            EmaResult ds = emaSlow[idx];

            MacdResult result = new MacdResult
            {
                Index = (int)h.Index,
                Date = h.Date
            };

            if (df?.Ema != null && ds?.Ema != null)
            {

                decimal macd = (decimal)df.Ema - (decimal)ds.Ema;
                result.Macd = macd;

                // temp data for interim EMA of macd
                BasicData diff = new BasicData
                {
                    Date = h.Date,
                    Index = h.Index - slowPeriod,
                    Value = macd
                };

                emaDiff.Add(diff);
            }

            results.Add(result);
        }

        IEnumerable<EmaResult> emaSignal = CalcEma(emaDiff, signalPeriod);

        // add signal, histogram to result
        foreach (MacdResult r in results)
        {
            EmaResult ds = emaSignal.Where(x => x.Date == r.Date).FirstOrDefault();

            if (ds?.Ema == null)
            {
                continue;
            }

            r.Signal = ds.Ema;
            r.Histogram = r.Macd - r.Signal;
        }

        return results;
    }

Results on sample set were:

MACD: 23,572 sec MACD fast: 7,892 sec

Did check and compare outputs and could not find difference in results. I tried to solve Where() search at the end but as it has different length of data, it was bit too much for me at this stage. Paralleling things with parallel and similar would help but only to certain extent. For example, I was playing with MACD calculation from some other library (https://github.com/anilca/NetTrader.Indicator/blob/master/NetTrader.Indicator/MACD.cs) and it was running under second (there were some other issues that made me look forward, like data accuracy in general)

I like the approach of pine scripting on TradingView where all data is set of arrays with same length, which makes calculations quite easy to follow.

BR Ivan

DaveSkender commented 4 years ago

You're right, an all array approach is definitely faster, but it's a tradeoff with usability overall. I used to use the array-driven ta-lib for a long time, but had to do a lot of extra coding to manipulate the data to make it usable. TA-Lib has not been updated for 10+ years and was holding me back from moving to newer tech, so I had to move on. There were not many pure modern .NET technical indicator packages, so I started building this one to be a little friendlier for trading systems and charting use cases.

You've got me thinking about arrays now, so I'll definitely be looking to test incorporating that in some form internally, but will still try to keep the public utility the same. Thanks for helping!

ikorman commented 4 years ago

No problem. Btw I managed to resolve last Where() in GetMacd and now whole thing executes in only 100ms. From class/function calling point of view nothing changed (I just called it GetMacdFast for sake of testing), so those type of changes should not influence their usability if somebody already use them in their code, it will be just quicker

    public static IEnumerable<MacdResult> GetMacdFast(IEnumerable<Quote> history, int fastPeriod = 12, int slowPeriod = 26, int signalPeriod = 9)
    {

        // clean quotes
        Cleaners.PrepareHistory(history);

        // check parameters
        ValidateMacd(history, fastPeriod, slowPeriod, signalPeriod);

        // initialize
        List<EmaResult> emaFast = GetEma(history, fastPeriod).ToList();
        List<EmaResult> emaSlow = GetEma(history, slowPeriod).ToList();

        List<BasicData> emaDiff = new List<BasicData>();
        List<MacdResult> results = new List<MacdResult>();

        var historyList = history.ToList();

        int skip = 0;  // used to count how many null entries we will have to insert into emaDiff array, to make it the same length
        for (int idx = 0; idx < historyList.Count; idx++)
        {
            var h = historyList[idx];
            EmaResult df = emaFast[idx];
            EmaResult ds = emaSlow[idx];

            MacdResult result = new MacdResult
            {
                Index = (int)h.Index,
                Date = h.Date
            };

            if (df?.Ema != null && ds?.Ema != null)
            {

                decimal macd = (decimal)df.Ema - (decimal)ds.Ema;
                result.Macd = macd;

                // temp data for interim EMA of macd
                BasicData diff = new BasicData
                {
                    Date = h.Date,
                    Index = h.Index - slowPeriod,
                    Value = macd
                };

                emaDiff.Add(diff);
            }
            else
                skip++;

            results.Add(result);
        }

        IEnumerable<EmaResult> emaSignal = CalcEma(emaDiff, signalPeriod);
        var emaSignalList = emaSignal.ToList();

        // make emaSignal array the same lenght as macdResults, padding array beginning with nulls
        for (int c = 0; c < skip; c++)
            emaSignalList.Insert(0, null);

        // add signal, histogram to result
        var macdResultList = results.ToList();
        for (int c = 0; c < macdResultList.Count; c++)
        {
            var r = results[c];
            EmaResult ds = emaSignalList[c];

            if (ds?.Ema == null)
            {
                continue;
            }

            r.Signal = ds.Ema;
            r.Histogram = r.Macd - r.Signal;
        }

        return results;
    }

BR Ivan

ikorman commented 4 years ago

Forget to mention, how I landed here and why the speed is important for me: I am evaluating possibility to develop automated bot for scalping on cryptos and at the moment I'm running different strategies through historical data of 3 months (1m, 3m and 5m interval). Therefore each time I change something in the strategy, I have to reload the file and calculate all the indicators as preparation for simulation on historical data - and that preparation took couple of minutes due to above topic, so I took some time to look into it :)

Like your work on the library, amount of indicators and how it's easy to implement it :)

Btw. how do you confirm correctness of the data, what is reference for the comparison?

DaveSkender commented 4 years ago

Very nice! I'll reference this when making changes and will likely take this approach to speed things up.

I see a lot of interesting use cases from users that are doing a lot of different things, ranging from massive bulk operations, machine learning, Forex, Crypto, etc. Personally, I'm mostly using it for a more traditional trading strategy system and for charting. It's been really cool to see what everyone is doing ... from all over the world.

Regarding correctness testing, there are a few things I'm doing:

  1. Try to achieve high % code coverage with unit tests
  2. Manually calculate the indicators using a spreadsheet with static historical quotes, then typically take a TDD approach (example: see spreadsheet in PR #100)
  3. Compare output to reputable commercial charting sites, like StockCharts.com
DaveSkender commented 4 years ago

Out of curiosity, what framework are you targeting? So far I'm not seeing the same slowness for MACD on the benchmarks. I know .NET Core tends to work better than the other frameworks that we include in the package; was wondering if you're using an older framework.

ikorman commented 4 years ago

Good point. As my app is GUI based, I'm targeting .net framework 4.7.2. At first I have used your precompiled library from NuGet (there I had 10 s) and then I used source code that I compiled together with my app (there it was 23 sec), so there is some difference depending on the target.

Regarding you benchmarks, I'm not sure if simple scaling will work for MACD as it's number of calculations rises exponentially with history size (e.g. for hist size 10 it has to loop 10 times through 10 EMA list, while for 100 it has to work will 100 EMA list size - 100 vs 10,000 in worst theoretical case).

DaveSkender commented 4 years ago

this change was implemented in v0.10.10

github-actions[bot] commented 3 years ago

This Issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new Issue for related bugs.