Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

How to use 'volume' #81

Closed JaCoderX closed 5 years ago

JaCoderX commented 5 years ago

I went over the code to understand how BTGym handles the 'volume' column of the data. By default it seems that the column is ignored and not used in datafeeds.

'volume' is used in technical analysis as part of the decision-making process, so it made me wonder whether there is a reason not to use it.

If I do want to use it, would changing the volume param in parsing_params in the data provider class be enough? Or do I have to modify other things as well (like the shape in 'external': spaces.Box(...))?

Kismuz commented 5 years ago

'volume' is used on technical analysis ... if there is a reason not to use it?

No other reason, except that btgym was originally preconfigured for FX data containing no volume information.

would changing the volume param in parsing_params in the data provider class be enough? or do I have to modify other stuff

It depends on how you intend to use it. If it is only used by an indicator in the process of estimating already defined input features, then you only need to define the computational logic. If you want to pass it (directly or after some preprocessing) to the agent, you should modify the observation space declaration accordingly. This is true for any additional data ingested.
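
The two cases described above can be illustrated with a minimal NumPy-only sketch. It is not btgym code: `volume_weighted_price`, `stack_features`, and `TIME_DIM` are hypothetical names, and the random arrays stand in for real price and volume lines. In the first case volume only feeds a computation, so the width of the matrix passed to the agent is unchanged; in the second case volume becomes an extra column, so the declared observation shape has to grow from (30, 4) to (30, 5).

```python
import numpy as np

TIME_DIM = 30  # hypothetical time-embedding length


def volume_weighted_price(close, volume):
    """Case 1: volume only feeds an indicator.

    The result is a single value, so the number of features
    passed to the agent does not change.
    """
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    total = volume.sum()
    # Fall back to a plain mean when no volume is recorded (all zeros).
    if total == 0:
        return float(close.mean())
    return float((close * volume).sum() / total)


def stack_features(*columns):
    """Case 2: every column passed to the agent widens the state matrix."""
    return np.column_stack([np.asarray(c, dtype=float) for c in columns])


# Synthetic stand-ins for the data lines (assumption, not real market data).
rng = np.random.default_rng(0)
o, h, l, c = (1.1 + 0.001 * rng.random(TIME_DIM) for _ in range(4))
v = rng.integers(0, 100, TIME_DIM)

ohlc = stack_features(o, h, l, c)      # four features
ohlcv = stack_features(o, h, l, c, v)  # five features, volume included

print(ohlc.shape)   # (30, 4)
print(ohlcv.shape)  # (30, 5)
```

Whatever shape the state-composing method returns is the shape that has to appear in the corresponding `spaces.Box(...)` declaration.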

JaCoderX commented 5 years ago

@Kismuz Thanks

MorleyMinde commented 5 years ago

@Kismuz I have tried adding 'volume', but for some reason it behaves as though there is no volume, with columns of 'nan' in my data. I have looked around for guidelines, but all of the ones I tried seem to fail.

Can you help with a simple example for both scenarios (as an indicator and as a direct data line)? Or can you direct us to one?

Kismuz commented 5 years ago

@MorleyMinde, please provide code details and the traceback; I can't judge otherwise.

MorleyMinde commented 5 years ago

Hello @Kismuz. Apologies for the late reply.

Here is my sample code:

params = dict(
    # CSV to Pandas params.
    sep=';',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['open', 'high', 'low', 'close', 'volume'],

    # Pandas to BT.feeds params:
    timeframe=1,  # 1 minute.
    datetime=0,
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,

    # Random-sampling params:
    start_weekdays=[0, 1, 2, 3, ],  # Only weekdays from the list will be used for episode start.
    start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
    episode_duration={'days': 2, 'hours': 23, 'minutes': 55},
    time_gap={'days': 0, 'hours': 5, 'minutes': 55},
)

MyDataset = BTgymDataset(
    filename='data/EURUSD/EURUSD_Candlestick_1_M_2003.csv',
    **params,
)

My strategy:

class MyStrategy(BTgymBaseStrategy):
    """
    Example subclass of BT server inner computation strategy.
    """

    def get_raw_state(self):
        """
        Default state observation composer.

        Returns:
             and updates time-embedded environment state observation as [n,4] numpy matrix, where:
                4 - number of signal features  == state_shape[1],
                n - time-embedding length  == state_shape[0] == <set by user>.

        Note:
            `self.raw_state` is used to render environment `human` mode and should not be modified.

        """
        self.raw_state = np.row_stack(
            (
                np.frombuffer(self.data.open.get(size=self.time_dim)),
                np.frombuffer(self.data.high.get(size=self.time_dim)),
                np.frombuffer(self.data.low.get(size=self.time_dim)),
                np.frombuffer(self.data.close.get(size=self.time_dim)),
                np.frombuffer(self.data.volume.get(size=self.time_dim)),
            )
        ).T

        return self.raw_state

And here is my environment:

env = BTgymEnv(
    dataset=MyDataset,
    strategy=MyStrategy,
    drawdown_call=50,
    state_shape=dict(raw=spaces.Box(low=0, high=1, shape=(30, 5))),
    port=5555,
    verbose=1,
)

Kismuz commented 5 years ago

@MorleyMinde, since you have not included the exact traceback message or an example of the records contained in EURUSD_Candlestick_1_M_2003.csv, I can only guess that your data failed to parse correctly when loaded from file.

MorleyMinde commented 5 years ago

@Kismuz Let me provide more details. I have just changed the file ('EURUSD_Candlestick_1_M_2003.csv') to a file in the btgym repository ('btgym/examples/data/DAT_ASCII_EURUSD_M1_2016.csv').

Here is the error message I am getting:

State observation shape/range mismatch!
Space set by env: 

raw:
   Box(30, 5), low: 1.03522, high: 1.1616

Space returned by server: 

raw:
   array of shape: (30, 5), low: nan, high: nan

Full response:
{'raw': array([[1.12954, 1.12955, 1.12945, 1.12947,     nan],
       [1.12946, 1.12952, 1.12946, 1.1295 ,     nan],
       [1.1295 , 1.1295 , 1.12948, 1.12948,     nan],
       [1.12949, 1.12963, 1.12948, 1.12963,     nan],
       [1.12962, 1.12963, 1.12954, 1.12954,     nan],
       [1.12955, 1.12956, 1.12954, 1.12956,     nan],
       [1.12955, 1.12955, 1.12953, 1.12954,     nan],
       [1.12955, 1.12955, 1.12951, 1.12951,     nan],
       [1.1295 , 1.12952, 1.12946, 1.12946,     nan],
       [1.12946, 1.12949, 1.12936, 1.12936,     nan],
       [1.12937, 1.12941, 1.12933, 1.12934,     nan],
       [1.12935, 1.12935, 1.12935, 1.12935,     nan],
       [1.12935, 1.12941, 1.12935, 1.12938,     nan],
       [1.12938, 1.12939, 1.12937, 1.12937,     nan],
       [1.12937, 1.12939, 1.12937, 1.12939,     nan],
       [1.12937, 1.1294 , 1.12934, 1.1294 ,     nan],
       [1.1294 , 1.1294 , 1.12939, 1.12939,     nan],
       [1.12939, 1.12939, 1.12927, 1.12929,     nan],
       [1.12929, 1.12929, 1.12928, 1.12928,     nan],
       [1.12927, 1.12929, 1.12927, 1.12929,     nan],
       [1.12928, 1.12929, 1.12928, 1.12929,     nan],
       [1.12929, 1.12931, 1.12927, 1.1293 ,     nan],
       [1.1293 , 1.12931, 1.1293 , 1.1293 ,     nan],
       [1.1293 , 1.1293 , 1.12924, 1.12924,     nan],
       [1.12924, 1.12924, 1.1292 , 1.12921,     nan],
       [1.1292 , 1.1292 , 1.12916, 1.12917,     nan],
       [1.12918, 1.1292 , 1.1291 , 1.12912,     nan],
       [1.1291 , 1.12915, 1.12904, 1.12915,     nan],
       [1.12916, 1.12925, 1.12916, 1.12925,     nan],
       [1.12927, 1.12928, 1.12919, 1.12921,     nan]])}
Reward: 0.0
Done: False
Info:
{'step': 0, 'time': datetime.datetime(2016, 4, 21, 0, 29), 'action': {'default_asset': 'hold', '_skip_this': True}, 'broker_message': '_', 'broker_cash': 100.0, 'broker_value': 100.0, 'drawdown': 0.0, 'max_drawdown': 0.0}

Hint: Wrong Strategy.get_state() parameters?

In the file, the volume column contains zeros, which is what I was expecting, but I am getting 'nan'.
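
The report above shows `low: nan, high: nan` for the whole array even though only the last column is NaN. A small NumPy-only sketch (not btgym code; `nan_columns` is a hypothetical helper, and the two rows are copied from the response above) shows how a single NaN column propagates through `min`/`max` and how to locate the offending column:

```python
import numpy as np


def nan_columns(state):
    """Return the indices of columns that contain at least one NaN."""
    return np.flatnonzero(np.isnan(state).any(axis=0)).tolist()


# First two rows of the 'raw' response shown above.
state = np.array([
    [1.12954, 1.12955, 1.12945, 1.12947, np.nan],
    [1.12946, 1.12952, 1.12946, 1.12950, np.nan],
])

# Plain reductions propagate NaN, so both results are nan.
print(state.min(), state.max())

# Only column 4 (the volume column) is affected.
print(nan_columns(state))  # [4]

# NaN-aware reductions over the four price columns give a usable range.
print(np.nanmin(state[:, :4]), np.nanmax(state[:, :4]))
```

A check like this narrows the problem to the volume line alone, which points at how that column is parsed or mapped rather than at the price data.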

Kismuz commented 5 years ago

@MorleyMinde,

https://kismuz.github.io/btgym/btgym.datafeed.html#btgym.datafeed.derivative.BTgymDataset

See below for a working snippet (only the filename is changed):

from btgym import BTgymDataset, BTgymBaseStrategy, BTgymEnv
from gym import spaces
import numpy as np

params = dict(
    parsing_params=dict(
        # CSV to Pandas params.
        sep=';',
        header=0,
        index_col=0,
        parse_dates=True,
        names=['open', 'high', 'low', 'close', 'volume'],

        # Pandas to BT.feeds params:
        timeframe=1,  # 1 minute.
        datetime=0,
        open=1,
        high=2,
        low=3,
        close=4,
        volume=5,
        openinterest=-1,
    ),

    # Random-sampling params:
    start_weekdays=[0, 1, 2, 3, ],  # Only weekdays from the list will be used for episode start.
    start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
    episode_duration={'days': 2, 'hours': 21, 'minutes': 55},
    time_gap={'days': 0, 'hours': 5, 'minutes': 55},
)

MyDataset = BTgymDataset(
    filename='./data/DAT_ASCII_EURUSD_M1_2003.csv',
    **params,
)

class MyStrategy(BTgymBaseStrategy):
    """
    Example subclass of BT server inner computation strategy.
    """

    def get_my_state(self):
        my_state = np.row_stack(
            (
                np.frombuffer(self.data.open.get(size=self.time_dim)),
                np.frombuffer(self.data.high.get(size=self.time_dim)),
                np.frombuffer(self.data.low.get(size=self.time_dim)),
                np.frombuffer(self.data.close.get(size=self.time_dim)),
                np.frombuffer(self.data.volume.get(size=self.time_dim)),
            )
        ).T

        return my_state

env = BTgymEnv(
    dataset = MyDataset,
    strategy = MyStrategy,
    drawdown_call=50,
    state_shape=dict(
        raw=spaces.Box(low=0,high=1,shape=(30,4)),
        my=spaces.Box(low=-100,high=100,shape=(30,5)),
    ),
    port=5555,
    verbose=1,
)
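
In the snippet above, the `my` entry in `state_shape` carries the same name as the `get_my_state` method, while `raw` keeps its four-column shape. Assuming that observation modes are matched to methods by this `get_<name>_state` naming pattern (an inference from the snippet, not something stated in this thread), the idea can be sketched without btgym. `SketchStrategy` is a hypothetical stand-in, not a btgym class:

```python
import numpy as np


class SketchStrategy:
    """Hypothetical stand-in that pairs state names with getter methods."""

    def __init__(self, state_shapes):
        # Maps a mode name to its declared shape, e.g. {'my': (30, 5)}.
        self.state_shapes = state_shapes

    def get_raw_state(self):
        # Four price features per time step.
        return np.zeros(self.state_shapes['raw'])

    def get_my_state(self):
        # Five features per time step: prices plus volume.
        return np.zeros(self.state_shapes['my'])

    def get_state(self):
        """Collect one array per declared mode and verify its shape."""
        state = {}
        for name, shape in self.state_shapes.items():
            # Look up the getter by name; a missing one raises AttributeError.
            getter = getattr(self, 'get_{}_state'.format(name))
            value = getter()
            if value.shape != tuple(shape):
                raise ValueError(
                    'mode {!r}: declared {}, got {}'.format(name, shape, value.shape)
                )
            state[name] = value
        return state


strategy = SketchStrategy({'raw': (30, 4), 'my': (30, 5)})
observation = strategy.get_state()
print({name: value.shape for name, value in observation.items()})
```

Under this assumption, adding a new data line to the agent's input means adding both a named shape and a matching getter, and leaving the existing `raw` declaration alone.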

MorleyMinde commented 5 years ago

@Kismuz This worked perfectly. Thanks a lot.