MiniXC / simple-back

A simple daily python backtester that works out of the box.
Mozilla Public License 2.0

How to write simple ML backtesting without writing any oop code? #29

Closed fightthepower closed 4 years ago

fightthepower commented 4 years ago

I have an ML model that takes three inputs and outputs a single value, similar to this:

# ML training
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
import pandas as pd

# Assume the features are three lagged inputs
X, y = make_regression(n_samples=100, n_features=3, noise=1)

reg = LinearRegression().fit(X, y)
reg.score(X, y)

# Backtesting with the trained model
from simple_back.backtester import BacktesterBuilder

builder = (
   BacktesterBuilder()
   .name('JNUG 20-Day Crossover')
   .balance(10_000)
   .calendar('NYSE')
   .compare(['JNUG']) # strategies to compare with
   .live_progress() # show a progress bar
)

bt = builder.no_live_plot().build()
for day, event, b in bt['2019-1-1':'2020-1-1']:
    if event == 'open':
        jnug_ma = b.prices['JNUG',-20:]['close'].mean()
        d = {'data1': b.prices['JNUG']['close'], 'data2': b.prices['JNUG']['open'],'data3': b.prices['JNUG']['high']}
        df = pd.DataFrame(d).dropna()
        pred = reg.predict(df)
        if pred > 0 and b.price('JNUG') > jnug_ma:
            pass  # Long
        if pred < 0 and b.price('JNUG') < jnug_ma:
            pass  # Short

Here, instead of using b.prices['JNUG']['close'] alone on each iteration, I also want to use my newly created DataFrame df and its values for prediction and trading.

How can I do this in simple-back, preferably without writing any OOP code?

MiniXC commented 4 years ago

I don't think I completely understand what you are asking. The code you show should do what you want, shouldn't it? What is not working at the moment?

fightthepower commented 4 years ago

The above code is just pseudocode to get my idea across.

When I run the code, I get this error:

  0%|   | 1/504 [00:00<00:18, 26.62it/s]
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The problem is that pred is an array with 1320 elements, which causes the error above. I want a single prediction value for the given day, so that I can check it alongside b.price('JNUG') > jnug_ma on each iteration.
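For reference, the error can be reproduced without simple-back. This is a minimal sketch in which `preds` is a hypothetical stand-in for the array returned by reg.predict(df) on the full price history. It shows that `if` cannot evaluate a multi-element array, and that reducing the array to one number first (here, the latest prediction) avoids the problem.

```python
import numpy as np

# Hypothetical stand-in for reg.predict(df) on the full history:
# one prediction per historical row, not one per backtest day.
preds = np.array([0.8, -0.3, 1.2])

try:
    if preds > 0:  # the comparison yields a boolean array, which `if` cannot reduce
        pass
    raised = False
except ValueError:
    raised = True

# Reduce to a single number first, e.g. the prediction for the latest row.
latest_pred = float(preds[-1])
if latest_pred > 0:
    signal = "long"
elif latest_pred < 0:
    signal = "short"
else:
    signal = "flat"

print(raised, signal)
```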


MiniXC commented 4 years ago

So I assume you want to do regression on the prices of multiple symbols (hence the 1320 elements). If your training logic is already in place, one way to do this is to iterate through your predictions and allocate 1/number_of_predictions of your capital to each. Assuming you zip your symbols and predictions:

# preds is a list of (symbol, prediction) pairs, e.g. list(zip(symbols, predictions))
for symbol, pred in preds:
  if pred < 0:
    b.short(symbol, percent=1/len(preds))
  if pred > 0:
    b.long(symbol, percent=1/len(preds))

Does this help?
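The equal-weight idea can be sketched without the backtester. The symbols and prediction values below are made up for illustration, and the `orders` list is a hypothetical placeholder for the b.long / b.short calls shown above.

```python
# Hypothetical symbols and model outputs, for illustration only.
symbols = ["JNUG", "JDST", "GLD", "SLV"]
predictions = [0.4, -0.2, 0.0, 1.1]

# Materialise the pairs: a bare zip object has no len() and is consumed after one pass.
preds = list(zip(symbols, predictions))
weight = 1 / len(preds)  # equal share of capital per prediction

orders = []
for symbol, pred in preds:
    if pred < 0:
        orders.append(("short", symbol, weight))
    elif pred > 0:
        orders.append(("long", symbol, weight))
    # pred == 0: take no position

print(orders)
```

Note that a prediction of exactly zero takes no position here, so part of the capital stays unallocated in that case.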

fightthepower commented 4 years ago

Hey, thank you for replying. It is not multiple symbols; I was confused about how data is parsed in simple-back. Checking these shapes helped me debug the problem:

b.prices['JNUG',-2:]['close'].shape
b.prices['JNUG']['close'].shape

and I rewrote those lines:

        d = {'close': b.prices['JNUG',-1:]['close'], 'open': b.prices['JNUG',-1:]['open'],'high': b.prices['JNUG',-1:]['high']}
        df = pd.DataFrame(d).dropna()
        if df.empty:
            continue
        pred = reg.predict(df)

Thank you for taking the time :+1:
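The difference between the two slices can be illustrated with plain pandas and scikit-learn. This sketch uses a synthetic `history` frame as a stand-in for b.prices['JNUG'] and assumes that b.prices['JNUG', -1:] returns only the most recent row, which is what the fix above relies on. The model and its target are arbitrary toy choices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic price history standing in for b.prices['JNUG'] (made-up values).
rng = np.random.default_rng(0)
history = pd.DataFrame(
    rng.uniform(50, 100, size=(30, 3)), columns=["close", "open", "high"]
)

# Toy model fitted on the same three columns; the target is arbitrary.
reg = LinearRegression().fit(history, history.sum(axis=1))

all_preds = reg.predict(history)            # one prediction per historical row
last_pred = reg.predict(history.iloc[-1:])  # one prediction for the latest row only

print(all_preds.shape, last_pred.shape)
```

With a one-row input, the result is a one-element array, so a comparison such as `pred > 0` can be used directly in an `if` statement.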