Change backtesting strategy

noudcorten commented 3 years ago

Our current backtesting implementation takes a long time to process all the ticks. This is because the backtesting engine was designed as if it were 'live trading'. I think we can switch to actual backtesting, since all data is already available at the start of the backtest. To do so, the engine needs to populate the indicators and the buy and sell signals only once (using all data).

To do:

Change backtesting strategy from 'live trading' to actual backtesting
- Populate the indicators only once
- Populate buy signals only once
- Populate sell signals only once (incl. ROI and stoploss)
Evaluate and improve the way that the 'Trade' class is currently used

It's hard for me to estimate how much time this is going to take to implement, but what do you guys think of the idea?

pvdklei commented 3 years ago

I agree. Currently, if you use talib in the strategy, indicators are recalculated for every tick each time you call talib.SMA(dataframe) or some other indicator function. Since populate_indicators() is called for every tick during backtesting, you do a lot of duplicate work (I checked this by timing it): you only need the indicators for the new tick, but you recalculate them for all ticks. In the current implementation this adds up to O(t^2), where t is the total number of ticks in the backtest. Doing it just once would be O(t).
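To make the O(t^2) versus O(t) difference concrete, here is a minimal sketch in plain Python. It does not use talib or the engine's actual code: `sma`, `backtest_live_style` and `backtest_once` are hypothetical names, and "work" simply counts how many ticks get fed to the indicator function.

```python
def sma(values, window):
    """Trailing simple moving average; None until `window` values exist."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the tick that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def backtest_live_style(prices, window):
    """Recompute the whole indicator on every tick (the 'live trading' style)."""
    work = 0
    last_values = []
    for t in range(1, len(prices) + 1):
        history = prices[:t]            # everything known at tick t
        indicator = sma(history, window)
        work += len(history)            # ticks processed in this call
        last_values.append(indicator[-1])
    return last_values, work


def backtest_once(prices, window):
    """Compute the indicator a single time over all data."""
    return sma(prices, window), len(prices)


prices = [float(p) for p in range(1, 201)]  # 200 made-up ticks
live_values, live_work = backtest_live_style(prices, 21)
once_values, once_work = backtest_once(prices, 21)

print(live_values == once_values)  # same indicator values either way
print(live_work, once_work)        # 20100 ticks processed vs 200
```

For 200 ticks the per-tick approach processes 1 + 2 + ... + 200 = 20100 ticks in total, while the single pass processes 200, and both produce the same values.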

A problem is that you could (by not using talib) also implement a strategy in which the indicators are not recalculated for every past tick on each populate_indicators() call. You could cache the result and only calculate indicators for the new tick, which is more efficient when the strategy is trading in real time. In that case you cannot call populate_indicators() just once, as it will only populate the indicators for the last tick.

We would have to add separate methods to the Strategy base class: one that calculates the indicators for all data, and one that calculates them only for the last tick.
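A rough sketch of what such a split could look like. This is not the engine's real Strategy class: the method names `populate_indicators_all` and `populate_indicators_last` are made up for illustration, and a plain list of closing prices stands in for the dataframe.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Hypothetical base class; the method names are illustrative only."""

    @abstractmethod
    def populate_indicators_all(self, closes):
        """Backtesting path: return one indicator value per tick."""

    @abstractmethod
    def populate_indicators_last(self, closes):
        """Live path: return the indicator value for the newest tick only."""


class SmaStrategy(Strategy):
    def __init__(self, window):
        self.window = window

    def populate_indicators_all(self, closes):
        out = []
        for i in range(len(closes)):
            if i + 1 < self.window:
                out.append(None)  # not enough history yet
            else:
                window_slice = closes[i + 1 - self.window : i + 1]
                out.append(sum(window_slice) / self.window)
        return out

    def populate_indicators_last(self, closes):
        if len(closes) < self.window:
            return None
        return sum(closes[-self.window :]) / self.window


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
strategy = SmaStrategy(window=3)
all_values = strategy.populate_indicators_all(closes)
last_value = strategy.populate_indicators_last(closes)

print(all_values)  # [None, None, 11.0, 12.0, 13.0]
print(last_value)  # 13.0
```

The backtester would call the first method once up front, while live trading would call the second on each new tick. Both paths have to agree on the newest tick, which is worth covering with a test.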

keebrev commented 3 years ago

@noudcorten I think you have a good point. However, populating everything at once brings some risks that we would need to cover. For example, SMA21 should only be calculated from PAST ticks, not from ticks ahead. The same goes for volume averages: they should only use data from the 'past'.

If you have any ideas on how to cover these risks, I think implementing it like this will save a lot of time.
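One possible way to guard against this (just a sketch, not something the engine currently does): check that the indicator value at tick i stays the same when all later ticks are hidden. The helper `has_lookahead` and both SMA functions below are hypothetical and use plain lists instead of dataframes.

```python
def trailing_sma(values, window):
    """SMA over the current tick and the (window - 1) ticks before it."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def centered_sma(values, window):
    """Deliberately wrong for trading: averages ticks on both sides of i."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def has_lookahead(indicator, values, *args):
    """True if any value at tick i changes once later ticks are hidden."""
    full = indicator(values, *args)
    for i in range(len(values)):
        truncated = indicator(values[: i + 1], *args)
        if truncated[i] != full[i]:
            return True
    return False


prices = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
trailing_ok = not has_lookahead(trailing_sma, prices, 3)
centered_leaks = has_lookahead(centered_sma, prices, 3)

print(trailing_ok)     # True: only past ticks are used
print(centered_leaks)  # True: future ticks influence the value
```

The check itself is quadratic in the number of ticks, so it belongs in a unit test on a small sample rather than in the backtest loop.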

Lastly, what would you change about the Trade class? If we implement it this way, I imagine this class could become unnecessary, but it still makes data easily accessible.

@pvdklei I do not see any benefit in making those changes. When we run all data at once through the populate, sell and buy methods, all ticks will be filled with indicators and buy/sell signals. This could be done with a single method call, so two methods wouldn't make sense.

Edit: Have a look at #73 when implementing this

pvdklei commented 3 years ago

@keebrev I pointed out a problem, I did not propose changes. The problem is that end users might want to use their strategy in real trading (where you do not have all data beforehand) and implement it accordingly. In that case, calculating every indicator or buy signal for all available ticks is counterintuitive and very inefficient, since you only really need to process the new tick, not all the earlier ones as well.

marijn111 commented 3 years ago

Saw this video the other day: https://www.youtube.com/watch?v=DnKxKFXB4NQ It shows how the @cache decorator from Python's standard library (functools) can greatly speed up repeated calls to the same function.

If we decide to change the backtesting strategy to fill all indicators and data 'instantly', this might come in handy in some way or another. Not sure how though, but thought I'd just mention it!

pvdklei commented 3 years ago

@marijn111 the @cache decorator uses a dict to store the return value (as value) for the given parameters (as key). So every time the function gets called, it first checks whether the return value is already in the cache/dict. Most of our functions take in complex objects that are not copied (I can't find one that does not), so caching is not possible and gives untraceable errors. I would only recommend it for functions that take in a string or number and return a string or number.
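A small illustration of that limitation, using only the standard library (functools.cache needs Python 3.9+). The functions `fee` and `last_close` are made up for this example. It shows one failure mode: arguments have to be hashable to serve as dict keys, so passing a list raises a TypeError. As far as I know, pandas DataFrames are unhashable as well.

```python
from functools import cache


@cache
def fee(amount, rate):
    """Cache-friendly: numbers in, number out."""
    return amount * rate


@cache
def last_close(candles):
    """Not cache-friendly when `candles` is a list (lists are unhashable)."""
    return candles[-1]


fee(100, 2)
fee(100, 2)                   # second call is served from the cache
hits = fee.cache_info().hits  # number of cache hits so far

try:
    last_close([1.0, 2.0, 3.0])
    error = None
except TypeError as exc:
    error = str(exc)

print(hits)   # 1
print(error)  # unhashable type: 'list'
```

Converting the list to a tuple would make the call hashable, but for large price histories the hashing and key storage may cost more than the cache saves.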

dema-trading-ai / engine

Change backtesting strategy #104