Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0
982 stars, 259 forks

Other Timeframes #8

Closed. AdrianP- closed this issue 6 years ago.

AdrianP- commented 6 years ago

I know a current limitation is that only 1-minute Forex data is accepted (only Forex?), but my datasets use larger timeframes.

This is the stack trace when the timeframe is changed:

Process BTgymServer-2:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/home/adrian/btgym/btgym/server.py", line 405, in run
    episode = cerebro.run(stdstats=True, preload=False, oldbuysell=True)[0]
  File "/usr/local/lib/python3.5/dist-packages/backtrader/cerebro.py", line 1142, in run
    runstrat = self.runstrategies(iterstrat)
  File "/usr/local/lib/python3.5/dist-packages/backtrader/cerebro.py", line 1327, in runstrategies
    self.stop_writers(runstrats)
  File "/usr/local/lib/python3.5/dist-packages/backtrader/cerebro.py", line 1352, in stop_writers
    datainfos['Data%d' % i] = data.getwriterinfo()
  File "/usr/local/lib/python3.5/dist-packages/backtrader/dataseries.py", line 101, in getwriterinfo
    info['Timeframe'] = TimeFrame.TName(self._timeframe)
  File "/usr/local/lib/python3.5/dist-packages/backtrader/dataseries.py", line 57, in TName
    return cls.Names[tframe]
IndexError: list index out of range

Any idea?
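One plausible reading of this traceback (not confirmed anywhere in this thread): backtrader's `TimeFrame.TName()` is a plain list lookup into `TimeFrame.Names`, whose valid indices run from 0 to 9 (`Ticks` = 1, `Minutes` = 4, `Days` = 5, up to `NoTimeFrame` = 9). If a bar size is passed as the `timeframe` value itself (for example `timeframe=60` for hourly bars), the lookup fails exactly like this. In backtrader a bar size is normally expressed as a `timeframe` unit plus a `compression` multiple. The sketch below mirrors the lookup without importing backtrader; `TIMEFRAME_NAMES` is copied for illustration and `to_bt_timeframe()` is a hypothetical helper, not part of btgym or backtrader.

```python
# Copy of the TimeFrame.Names table from backtrader/dataseries.py, for illustration only
TIMEFRAME_NAMES = ['', 'Ticks', 'MicroSeconds', 'Seconds', 'Minutes',
                   'Days', 'Weeks', 'Months', 'Years', 'NoTimeFrame']


def timeframe_name(tframe):
    """Same kind of lookup as TimeFrame.TName(): a bare list index, so 60 raises IndexError."""
    return TIMEFRAME_NAMES[tframe]


def to_bt_timeframe(bar_minutes):
    """Hypothetical helper: express a bar size in minutes as a (timeframe, compression) pair."""
    minutes = TIMEFRAME_NAMES.index('Minutes')  # 4
    days = TIMEFRAME_NAMES.index('Days')        # 5
    if bar_minutes % 1440 == 0:
        return days, bar_minutes // 1440
    return minutes, bar_minutes


if __name__ == '__main__':
    print(timeframe_name(to_bt_timeframe(60)[0]))  # Minutes
    try:
        timeframe_name(60)
    except IndexError as err:
        print('IndexError:', err)  # same error as in the traceback
```

Whether btgym passes a `compression` value to its feed, and whether its episode-sampling arithmetic assumes `timeframe` means "minutes per bar", depends on the btgym version; check `datafeed.py` before relying on this.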

Kismuz commented 6 years ago
  1. It seems to happen when backtrader tries to run the strategy. The only way to find out what is wrong is to first run a dummy strategy manually in backtrader (the "traditional way") over your data, to make sure data parsing works correctly, and only then plug it into btgym. What data are you feeding in?

  2. It doesn't actually matter whether it is Forex or not; what matters is the data input format. The built-in data parser was made to accept data from a particular source. You can find the parsing configuration in https://github.com/Kismuz/btgym/blob/master/btgym/datafeed.py under the "CSV to Pandas params" section.
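Before involving backtrader at all, it can help to confirm that the file parses into a clean, time-ordered OHLC frame and to see what bar size it really has. A pandas-only sketch, assuming a semicolon-separated layout like the HistData `DAT_ASCII_*` files (`YYYYMMDD HHMMSS;open;high;low;close;volume`, no header row; adjust `header` if your file has one). The sample rows are made up, `load_ohlc()` and `sanity_check()` are hypothetical helpers, and this does not replace the dummy-strategy run in backtrader.

```python
import io

import pandas as pd

# Made-up rows in a HistData-like layout: "YYYYMMDD HHMMSS;open;high;low;close;volume"
SAMPLE = (
    "20160103 170000;1.08701;1.08713;1.08701;1.08713;0\n"
    "20160103 180000;1.08713;1.08750;1.08690;1.08720;0\n"
    "20160103 190000;1.08720;1.08760;1.08700;1.08740;0\n"
)


def load_ohlc(text):
    """Parse semicolon-separated OHLCV text (no header row) into a datetime-indexed frame."""
    df = pd.read_csv(
        io.StringIO(text),
        sep=';',
        header=None,
        names=['datetime', 'open', 'high', 'low', 'close', 'volume'],
        index_col=0,
    )
    df.index = pd.to_datetime(df.index, format='%Y%m%d %H%M%S')
    return df


def sanity_check(df):
    """Fail loudly on obvious parsing problems; return the most common bar spacing."""
    assert df.index.is_monotonic_increasing, 'timestamps are not in order'
    assert not df.isnull().values.any(), 'some values failed to parse (NaN)'
    assert (df['high'] >= df[['open', 'close']].max(axis=1)).all(), 'high below open/close'
    assert (df['low'] <= df[['open', 'close']].min(axis=1)).all(), 'low above open/close'
    return df.index.to_series().diff().mode()[0]


print(sanity_check(load_ohlc(SAMPLE)))  # one hour between bars in this sample
```

The returned spacing is a quick way to see whether a file really holds 1-minute bars or something coarser before choosing the `timeframe` setting.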

AdrianP- commented 6 years ago

This also happens with DAT_ASCII_EURUSD_M1_2016.csv.

  1. In fact, I added a new function to use all the data:

    def sequential_dataset(self):
        """Return a new dataset instance that holds all loaded data instead of a random sample."""
        episode = BTgymDataset(**self.params)
        episode.filename = self.filename
        self.log.debug('Episode filename: <{}>.'.format(episode.filename))
        episode.data = self.data
        return episode

    However, there is a misconception: the value in algo-trading isn't in the OHLC data but in the indicators you can calculate from it. At first glance, the code is not prepared for indicators. Have you worked on that?
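On the `sequential_dataset()` idea: if the goal is to walk through all data in time order rather than sampling random episodes, one simple alternative is to cut the frame into consecutive fixed-length chunks. A minimal pandas sketch; `sequential_episodes()` is a hypothetical helper, not a btgym API, and it drops any trailing remainder shorter than one episode.

```python
import pandas as pd


def sequential_episodes(df, episode_len):
    """Yield consecutive, non-overlapping slices of df, in time order."""
    if episode_len <= 0:
        raise ValueError('episode_len must be positive')
    for start in range(0, len(df) - episode_len + 1, episode_len):
        yield df.iloc[start:start + episode_len]


# Made-up one-minute frame with ten bars
index = pd.date_range('2016-01-04 00:00', periods=10, freq='min')
frame = pd.DataFrame({'close': range(10)}, index=index)

episodes = list(sequential_episodes(frame, episode_len=3))
print(len(episodes))  # 3 episodes; the last bar is left over
```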

Kismuz commented 6 years ago

You can use any indicators or any values calculated over raw OHLC data by subclassing the base BTgymStrategy and defining your own set_datalines(), get_state() and get_reward() methods:

  1. Define all indicators (as backtrader indicators) in set_datalines(), which is invoked once before the episode run;
  2. Define any function of those indicators in get_state(), which is invoked upon every env.step(), and use its output as the state observation.
  3. Same for reward shaping. The main idea was to incorporate the backtrader workflow, with its whole host of indicators and any custom functions, to define the desired state observation. Refer to the backtrader docs on how to set up custom datalines such as indicators. Defaults are (from strategy.py):

    import numpy as np

    def get_state(self):
        """
        One can override this method,
        defining necessary calculations and return arbitrary shaped tensor.
        It's possible either to compute entire featurized environment state
        or just pass raw price data to RL algorithm featurizer module.
        Note1: 'data' refers to bt.strategy datafeeds and should be treated as such.
        Datafeed Lines that are not default to BTgymStrategy should be explicitly defined in
        define_datalines().
        NOTE: while iterating, ._get_raw_state() method is called just before this one,
        so variable `self.raw_state` is fresh and ready to use.
        """
        return self.raw_state

    def get_reward(self):
        """
        Default reward estimator.
        Computes reward as log utility of current to initial portfolio value ratio.
        Returns scalar <reward, type=float>.
        Same principles as for state composer apply. Can return raw portfolio
        performance statistics or enclose entire reward estimation algorithm.
        """
        return float(np.log(self.stats.broker.value[0] / self.env.broker.startingcash))

Does that answer your question?
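The default reward is cumulative rather than incremental: at every step it reports log(current value / starting cash), so it is 0 at breakeven, positive above the starting cash and negative below it. A framework-free sketch with made-up portfolio values; `log_utility_reward()` is a hypothetical name, and inside the strategy the current value would come from `self.stats.broker.value[0]`.

```python
import math


def log_utility_reward(portfolio_value, starting_cash):
    """Log of current-to-initial portfolio value ratio, like the default get_reward()."""
    return float(math.log(portfolio_value / starting_cash))


# Made-up broker values at successive env.step() calls, starting from 100.0 cash
values = [100.0, 105.0, 95.0, 100.0]
rewards = [log_utility_reward(v, 100.0) for v in values]
print(rewards)  # 0 at breakeven, > 0 above starting cash, < 0 below
```

A per-step variant would instead use the difference between consecutive rewards; which one suits a given agent is a reward-shaping choice made in your own get_reward().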

kfeeeeee commented 6 years ago

You can use any indicators or any values calculated over raw OHLC data by subclassing the base BTgymStrategy and defining your own set_datalines(), get_state() and get_reward() methods:

Define all indicators (as backtrader indicators) in set_datalines(), which is invoked once before the episode run; Define any function of those indicators in get_state(), which is invoked upon every env.step(), and use its output as the state observation.

Could you please explain this in more detail? I am not sure how I should actually set those datalines. In my case, the state consists of OHLC plus some calculated indicators that are stored in a CSV file. How would I set that data as datalines and actually feed it (in addition to the raw price state) to the RL algorithm?

Any help is appreciated! Thanks.

Kismuz commented 6 years ago

@kfeeeeee, There are two parts:

  1. The comment you cited is for the case when you only have OHLC data stored in a CSV file and want to compute some extra statistics or features as the observation state. In this case (example):

    
    import numpy as np
    import backtrader.indicators as btind
    from btgym import BTgymStrategy  # import path may differ between btgym versions


    class MyStrategy(BTgymStrategy):
        """
        Example subclass of BT server inner computation strategy.
        """

        def __init__(self, **kwargs):
            super(MyStrategy, self).__init__(**kwargs)

            # Use backtrader functions to add four mov. averages (using Open values):
            self.data.sma_4 = btind.SimpleMovingAverage(self.datas[0], period=4)
            self.data.sma_8 = btind.SimpleMovingAverage(self.datas[0], period=8)
            self.data.sma_16 = btind.SimpleMovingAverage(self.datas[0], period=16)
            self.data.sma_32 = btind.SimpleMovingAverage(self.datas[0], period=32)

            # Time-embedding dimension shortcut:
            self.dim_0 = self.p.state_shape['raw_state'].shape[0]

            # Service sma to correctly get initial embedded values:
            self.data.dim_sma = btind.SimpleMovingAverage(
                self.datas[0],
                period=(32 + self.dim_0)
            )
            self.data.dim_sma.plotinfo.plot = False

        def get_state(self):
            """
            Overrides default method to compute state observation as a dictionary of two [dim_0, 4] arrays
            of OHLC prices and mov. averages.
            """
            x = np.stack(
                [
                    np.frombuffer(self.data.sma_4.get(size=self.dim_0)),
                    np.frombuffer(self.data.sma_8.get(size=self.dim_0)),
                    np.frombuffer(self.data.sma_16.get(size=self.dim_0)),
                    np.frombuffer(self.data.sma_32.get(size=self.dim_0)),
                ],
                axis=-1
            )

            self.state['raw_state'] = self.raw_state

            self.state['model_input'] = x

            return self.state

Later, when instantiating the environment (example):

import backtrader as bt
from gym import spaces

time_embed_dim = 16
state_shape = {
    'raw_state': spaces.Box(low=-100, high=100, shape=(time_embed_dim, 4)),
    'model_input': spaces.Box(low=-100, high=100, shape=(time_embed_dim, 4)),
}

MyCerebro = bt.Cerebro()

MyCerebro.addstrategy(
    MyStrategy,
    state_shape=state_shape,
    portfolio_actions=('hold', 'buy', 'sell', ),
    drawdown_call=5,  # max to lose, in percent of initial cash
    target_call=8,  # max to win, same
    skip_frame=10,
)

etc...
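To see what this `get_state()` produces, and why the extra "service" SMA uses `period=(32 + self.dim_0)`, here is a NumPy-only sketch outside backtrader. It assumes trailing simple moving averages over a single price series; `sma()` and `sma_state()` are hypothetical helpers. The last `dim_0` values of a 32-bar SMA only all exist once at least `32 + dim_0 - 1` bars are available, which (as I read it) is what the longer service SMA guarantees through backtrader's minimum-period warm-up.

```python
import numpy as np


def sma(x, period):
    """Trailing simple moving average: one value per full window of `period` bars."""
    x = np.asarray(x, dtype=float)
    if len(x) < period:
        raise ValueError('series shorter than the SMA period')
    return np.convolve(x, np.ones(period) / period, mode='valid')


def sma_state(prices, periods=(4, 8, 16, 32), dim_0=16):
    """Stack the last dim_0 values of each SMA into a [dim_0, len(periods)] array."""
    prices = np.asarray(prices, dtype=float)
    needed = max(periods) + dim_0 - 1
    if len(prices) < needed:
        raise ValueError('need at least %d bars, got %d' % (needed, len(prices)))
    windows = [sma(prices, p)[-dim_0:] for p in periods]
    return np.stack(windows, axis=-1)


prices = np.arange(100.0)  # made-up, steadily rising price series
state = sma_state(prices)
print(state.shape)  # (16, 4)
```

With 47 bars the call above still works, while 46 bars raise `ValueError`, which is the warm-up requirement the service SMA covers.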

Kismuz commented 6 years ago
  2. When the data file already holds additional stats as separate columns, one needs to set the correct parsing configuration for the given data, found in datafeed.py. After that, all additional data will be available inside the strategy as standard backtrader datalines (self.datas by default) and can be retrieved as described above.
Kismuz commented 6 years ago

Part 2 example: suppose the file contains one-minute OHLC bars, volume and two custom indicators. Then:

params = dict(
    # CSV to Pandas params.
    sep=';',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['open', 'high', 'low', 'close', 'volume', 'my_indicator_1', 'my_indicator_2'],

    # Pandas to BT.feeds params:
    timeframe=1,  # 1 minute.
    datetime=0,
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,
    my_indicator_1=6,
    my_indicator_2=7,

    # Random-sampling params:
    start_weekdays=[0, 1, 2, 3, ],  # Only weekdays from the list will be used for episode start.
    start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
    episode_len_days=1,  # Maximum episode time duration in days, hours, minutes:
    episode_len_hours=23,
    episode_len_minutes=55,
    time_gap_days=0,  # Maximum data time gap allowed within sample in days, hours. Thereby,
    time_gap_hours=5,  # if set to be < 1 day, samples containing weekend and holiday gaps will be rejected.
)

MyDataset = BTgymDataset(
    filename='<your_filename.csv>',
    **params,
)
# Check:

MyDataset.read_csv()
MyDataset.describe()

# Pass it to environment ...
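Why `datetime=0` but `open=1`, even though `'open'` is the first entry in `names`? As far as I can tell from backtrader's `feeds/pandafeed.py`, `PandasDirectData` walks the frame with `DataFrame.itertuples()`, and position 0 of each tuple is the index (here the parsed datetime), so every column sits one position after its place in `names`. A pandas-only sketch of that lookup with a made-up two-row frame:

```python
import pandas as pd

# Made-up two-row frame, shaped like the parsed CSV above (datetime index + 7 columns)
df = pd.DataFrame(
    {
        'open': [1.10, 1.11], 'high': [1.12, 1.13], 'low': [1.09, 1.10],
        'close': [1.11, 1.12], 'volume': [10, 12],
        'my_indicator_1': [0.5, 0.6], 'my_indicator_2': [-1.0, -0.9],
    },
    index=pd.to_datetime(['2017-01-02 00:00', '2017-01-02 00:01']),
)

# Positions from the "Pandas to BT.feeds" part of params
positions = dict(datetime=0, open=1, high=2, low=3, close=4, volume=5,
                 my_indicator_1=6, my_indicator_2=7)

first_row = next(df.itertuples())  # position 0 holds the index (the timestamp)
picked = {name: first_row[pos] for name, pos in positions.items()}
print(picked['datetime'], picked['open'], picked['my_indicator_2'])
```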
kfeeeeee commented 6 years ago

Perfect, that is exactly what I needed. Keep up the good work!

kfeeeeee commented 6 years ago

Sorry for asking again. You wrote

After that, all additional data will be available inside the strategy as standard backtrader datalines (self.datas by default) and can be retrieved as described above.

However, when I try your code above, my_indicator_1 and my_indicator_2 are not accessible in the strategy at all: self.datas is a list of length 1, and self.datas[0] contains only the OHLC data. Am I missing something?

Kismuz commented 6 years ago

@kfeeeeee, I apologise for giving an incorrect answer; it turned out to be tricky. In short: I have no working solution for extended datafeeds yet. In detail: in principle, one can extend OHLC data as described here: https://www.backtrader.com/docu/extending-a-datafeed.html

But as far as I understand, that applies to the generic CSV datafeed, while btgym uses btfeeds.PandasDirectData internally. After making minor tweaks (you need to update btgym to make use of them), I'm able to access custom data with code like this:

import backtrader.feeds as btfeeds

class ExtraPandasDirectData(btfeeds.PandasDirectData):
    lines = ('my_id_1', 'my_id_2')  # extra datalines

class ExtraLinesDataset(BTgymDataset):

    def to_btfeed(self):
        """
        Overrides default method to add custom datalines.
        Performs BTgymDataset-->bt.feed conversion.
        Returns bt.datafeed instance.
        """
        try:
            assert not self.data.empty
            btfeed = ExtraPandasDirectData(
                dataname=self.data,
                timeframe=self.timeframe,
                datetime=self.datetime,
                open=self.open,
                high=self.high,
                low=self.low,
                close=self.close,
                volume=self.volume,
                my_id_1=6,  # Same lines
                my_id_2=7,
            )
            btfeed.numrecords = self.data.shape[0]
            return btfeed

        except (AssertionError, AttributeError):
            msg = 'BTgymDataset instance holds no data. Hint: forgot to call .read_csv()?'
            self.log.error(msg)
            raise AssertionError(msg)

params = dict(
    # CSV to Pandas params.
    sep=';',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['open', 'high', 'low', 'close', 'volume', 'my_id_1', 'my_id_2'],

    # Pandas to BT.feeds params:
    timeframe=1,  # 1 minute.
    datetime=0,
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,
    openinterest=-1,
    my_id_1=6,
    my_id_2=7,

    # Random-sampling params:
    # .....omitted for brevity....
)

MyDataset = ExtraLinesDataset(filename='my_file.csv', **params)

After that, the lines are accessible inside the strategy as self.data.my_id_1, but for reasons I can't yet understand they contain only NaN values. Maybe it is a bug in my code, or maybe it is related to the PandasDirectData structure.

kfeeeeee commented 6 years ago

@Kismuz Thanks for your reply. I was able to reproduce the bug with the NaN values, and I assume it is due to PandasDirectData. For now (though the backtrader documentation is not clear, in my opinion), I think it is only possible to extend the loaded data using GenericCSVData.

kfeeeeee commented 6 years ago

After reading this thread: https://community.backtrader.com/topic/158/how-to-feed-backtrader-alternative-data/4

I found a working solution like this:

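import backtrader.feeds as btfeeds

class ExtraPandasDirectData(btfeeds.PandasDirectData):
    lines = ('width',)  # extra dataline
    params = (
        ('width', 2),  # column position of 'width' in the DataFrame rows (adjust to your file)
    )

    datafields = btfeeds.PandasData.datafields + ['width']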
Kismuz commented 6 years ago

Aha! Fine and simple.

joaosalvado10 commented 6 years ago

Hi @kfeeeeee (@Kismuz), I used your suggestion and was able to include a new feature already present in my CSV file; however, I am not able to use it in the model. What needs to be done to use the new features in the model? I added a new data channel and added this new data to the np.stack as well; finally, I changed the shape of external to (time_dim, 1, 4), which freezes the run.

Thank you
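One thing worth checking here (the thread does not confirm it as the cause): after adding a feature to the `np.stack(...)` list, the stacked array gains one column, so the last axis of the declared observation shape has to grow too; with five features, `(time_dim, 1, 4)` no longer matches. A NumPy sketch with made-up data; `build_observation()` is a hypothetical helper.

```python
import numpy as np


def build_observation(features):
    """Stack equal-length 1-D feature windows into a (time_dim, 1, n_features) array."""
    x = np.stack(features, axis=-1)  # (time_dim, n_features)
    return x[:, np.newaxis, :]       # (time_dim, 1, n_features)


time_dim = 16
# Made-up windows: four original features plus one new one
features = [np.linspace(0.0, 1.0, time_dim) * (k + 1) for k in range(5)]

obs = build_observation(features)
declared_shape = (time_dim, 1, len(features))  # last axis must follow the feature count
assert obs.shape == declared_shape, (obs.shape, declared_shape)
print(obs.shape)  # (16, 1, 5)
```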

Kismuz commented 6 years ago

@joaosalvado10,

I added a new data channel and added this new data to the np.stack as well; finally, I changed the shape of external to (time_dim, 1, 4), which freezes the run.

joaosalvado10 commented 6 years ago

@Kismuz I am running the Unreal example. Actually, there is no error, but training never starts: it keeps creating master sessions forever. Check out my log file with verbose=2 set on the launcher and the env.

https://wetransfer.com/downloads/2748b17526fb4f1fc90603df59dafd9f20171221172751/6a1910e00d4a3f5e869700b6176c2e6b20171221172751/e5eb01

Kismuz commented 6 years ago

Does running your environment manually (calling reset() and step() before putting it into the AAC framework) work correctly? If yes, the most probable cause is an error in TF graph execution by one of the workers. Such errors can be silent during distributed work: no terminal output, just freezing. The behaviour is the same if the error occurs while defining the graph itself, but in your case that point seems to have passed. The remedy, which I usually apply in such cases, is to add debug print statements throughout the related files (aac.py/BaseAAC and train.py/env_runner) to see how far execution progresses.

I can only help further by replicating the error on my machine with the full code in hand.
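The manual check suggested above might look like this: drive `reset()`/`step()` by hand and log each step, so a hang or exception shows up before any distributed TF machinery is involved. The sketch assumes the classic Gym API of the time (`step()` returns `obs, reward, done, info`); `smoke_test()` is a hypothetical helper and `StubEnv` is a hypothetical stand-in so the loop runs on its own. A real environment instance would be passed instead.

```python
import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger('env_smoke_test')


def smoke_test(env, n_steps=100):
    """Drive reset()/step() by hand, logging progress; return the number of finished episodes."""
    env.reset()
    log.debug('reset() returned')
    episodes = 0
    for step in range(1, n_steps + 1):
        action = env.action_space.sample()
        _, reward, done, _ = env.step(action)
        log.debug('step %d ok, reward=%s, done=%s', step, reward, done)
        if done:
            episodes += 1
            env.reset()
            log.debug('episode %d finished, reset() returned', episodes)
    return episodes


class _StubActionSpace:
    def sample(self):
        return 0


class StubEnv:
    """Hypothetical stand-in for an environment instance; only for trying the loop."""

    action_space = _StubActionSpace()

    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]

    def step(self, action):
        self.t += 1
        return [float(self.t)], 0.0, self.t >= self.episode_len, {}


finished = smoke_test(StubEnv(episode_len=10), n_steps=25)
```

If the last logged line is a `step` and nothing follows, the environment side is where the freeze happens; if this loop runs cleanly, the problem is more likely on the training side.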

joaosalvado10 commented 6 years ago

Does running your environment manually (calling reset() and step() before putting it into the AAC framework) work correctly?

Yes, I performed the reset and the step manually and it goes well.

If yes, the most probable cause is an error in TF graph execution by one of the workers. Such errors can be silent during distributed work: no terminal output, just freezing.

Maybe that is the cause, but I can't see any apparent reason for it to happen. It seems to freeze in worker.py on this line of code:

with sv.managed_session(server.target, config=config) as sess, sess.as_default():

joaosalvado10 commented 6 years ago

Hello @Kismuz, have you figured out how to fix this so that more than 3 features can be used? Thank you.

Kismuz commented 6 years ago

@joaosalvado10, no, since I don't have your code. I have pushed one of my development branches here: https://github.com/Kismuz/btgym/tree/reserach_dev_strat_4_11 Take a look at the code for strategy classes #4_7 ... 4_11 in reserarch/strategy_4.py: those use different numbers of features and work fine. Here is a copy of my working notebook with the running setup: https://github.com/Kismuz/btgym/tree/reserach_dev_strat_4_11/develop_notebook

joaosalvado10 commented 6 years ago

Hello, thanks for the help. Here is my code. I am running test_btgym.py, which is the Unreal example. If you have some time, please have a look. https://github.com/joaosalvado10/btgym/tree/master/btgym

Thank you

Kismuz commented 6 years ago

@joaosalvado10, to submit code for review and comments, or for me to check it locally, follow the general git guidelines:

https://help.github.com/categories/collaborating-with-issues-and-pull-requests/

https://gist.github.com/Chaser324/ce0505fbed06b947d962

joaosalvado10 commented 6 years ago

@Kismuz I was preparing everything for the pull request, and after merging all the changes I realized that the code works now. Probably there was some update I had not fetched. Thank you for the help!