JaCoderX closed this issue 5 years ago.
'volume' is used in technical analysis ... is there a reason not to use it?
No other reason, except that btgym was originally preconfigured for FX data, which contains no volume information.
Would changing the volume param in parsing_params in the data provider class be enough, or do I have to modify other stuff?
It depends on how you intend to use it. If volume is only used by some indicator while estimating input features that are already defined, then you only need to define the computational logic. If you want to pass it to the agent (directly or after some preprocessing), you should modify the observation space declaration accordingly. The same holds for any additional data you ingest.
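To make the two options concrete, here is a small NumPy sketch on synthetic data. It is not btgym code: the function names (vwap, state_with_volume_indicator, state_with_volume_column) are hypothetical, and VWAP is just one example of an indicator that consumes volume. What matters is the shape of the result: when volume only feeds an indicator, the observation keeps its declared width; when volume is passed to the agent as its own column, the declared observation space must grow by one column.

```python
import numpy as np


def vwap(close, volume):
    """Volume-weighted average price over the window.

    Falls back to the plain mean when the window has no volume at all
    (e.g. FX files whose volume column is all zeros).
    """
    total = volume.sum()
    if total == 0:
        return float(close.mean())
    return float((close * volume).sum() / total)


def state_with_volume_indicator(ohlc, volume):
    """Scenario 1 (hypothetical): volume is consumed by an indicator only.

    Prices are expressed relative to the window's VWAP, so the
    observation keeps its original 4 columns (open, high, low, close).
    """
    return ohlc - vwap(ohlc[:, 3], volume)


def state_with_volume_column(ohlc, volume):
    """Scenario 2 (hypothetical): volume is passed to the agent directly.

    Volume becomes a 5th column, so the declared observation space
    must change from shape (n, 4) to (n, 5).
    """
    return np.column_stack([ohlc, volume])


# Synthetic data: 30 time steps of OHLC prices and volume.
rng = np.random.default_rng(0)
n = 30
close = 1.13 + 0.0001 * rng.standard_normal(n).cumsum()
ohlc = np.column_stack([close, close + 0.0002, close - 0.0002, close])
volume = rng.integers(1, 100, size=n).astype(float)

print(state_with_volume_indicator(ohlc, volume).shape)  # (30, 4)
print(state_with_volume_column(ohlc, volume).shape)     # (30, 5)
```

In btgym terms, scenario 2 corresponds to declaring a Box with shape=(time_dim, 5) in the state_shape dict, as the working snippet later in this thread does.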
@Kismuz Thanks
@Kismuz I have tried adding 'volume', but for some reason the environment behaves as though there is no volume: the volume column comes back as 'nan' for my data. I have looked around for guidelines, but none of the approaches I found worked.
Could you provide a simple example for both scenarios (as an indicator and as a direct data line), or point us to one?
@MorleyMinde, please provide code details and the traceback; I can't judge otherwise.
Hello @Kismuz. Apologies for the late reply.
Here is my sample code:
params = dict(
    # CSV to Pandas params.
    sep=';',
    header=0,
    index_col=0,
    parse_dates=True,
    names=['open', 'high', 'low', 'close', 'volume'],
    # Pandas to BT.feeds params:
    timeframe=1,  # 1 minute.
    datetime=0,
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,
    # Random-sampling params:
    start_weekdays=[0, 1, 2, 3, ],  # Only weekdays from the list will be used for episode start.
    start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
    episode_duration={'days': 2, 'hours': 23, 'minutes': 55},
    time_gap={'days': 0, 'hours': 5, 'minutes': 55},
)

MyDataset = BTgymDataset(
    filename='data/EURUSD/EURUSD_Candlestick_1_M_2003.csv',
    **params,
)
My strategy:
class MyStrategy(BTgymBaseStrategy):
    """
    Example subclass of BT server inner computation strategy.
    """
    def get_raw_state(self):
        """
        Default state observation composer.

        Returns:
            the time-embedded environment state observation (also stored in self.raw_state)
            as an [n, 5] numpy matrix, where:
            5 - number of signal features (OHLC plus volume) == state_shape[1],
            n - time-embedding length == state_shape[0] == <set by user>.
        Note:
            `self.raw_state` is used to render environment `human` mode and should not be modified.
        """
        self.raw_state = np.row_stack(
            (
                np.frombuffer(self.data.open.get(size=self.time_dim)),
                np.frombuffer(self.data.high.get(size=self.time_dim)),
                np.frombuffer(self.data.low.get(size=self.time_dim)),
                np.frombuffer(self.data.close.get(size=self.time_dim)),
                np.frombuffer(self.data.volume.get(size=self.time_dim)),
            )
        ).T
        return self.raw_state
And here is my environment:
env = BTgymEnv(
    dataset=MyDataset,
    strategy=MyStrategy,
    drawdown_call=50,
    state_shape=dict(raw=spaces.Box(low=0, high=1, shape=(30, 5))),
    port=5555,
    verbose=1,
)
@MorleyMinde, since you have not included the exact traceback message or an example of the records contained in EURUSD_Candlestick_1_M_2003.csv, I can only guess that your data failed to be parsed correctly when loaded from the file.
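One way to test that guess is to check the raw records before they reach the loader. The sketch below uses only Python's standard csv module; the sample rows are made up and assume a semicolon-separated layout of one datetime field followed by open, high, low, close and volume (six fields per row). The helper name check_records is hypothetical.

```python
import csv
import io

# Made-up rows in the assumed layout: datetime;open;high;low;close;volume
SAMPLE = """20160421 002900;1.129540;1.129550;1.129450;1.129470;0
20160421 003000;1.129460;1.129520;1.129460;1.129500;0
"""


def check_records(text, sep=';', expected_fields=6):
    """Return (line_number, problem) pairs for rows that do not look like datetime + OHLCV."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    for lineno, row in enumerate(reader, start=1):
        if len(row) != expected_fields:
            problems.append((lineno, f'expected {expected_fields} fields, got {len(row)}'))
            continue
        try:
            # Everything after the datetime field must be numeric, volume included.
            for value in row[1:]:
                float(value)
        except ValueError:
            problems.append((lineno, 'non-numeric price or volume value'))
    return problems


print(check_records(SAMPLE))  # []
```

To check a real file, read it with open(path, newline='') inside a with block and pass the contents to check_records; an empty list means every row has six fields with numeric prices and volume. A header line, if the file has one, will be reported because its fields are not numeric.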
@Kismuz Let me provide more details. I have switched from the file 'EURUSD_Candlestick_1_M_2003.csv' to a file from the btgym repository ('btgym/examples/data/DAT_ASCII_EURUSD_M1_2016.csv').
Here is the error message I am getting:
State observation shape/range mismatch!
Space set by env:
raw:
Box(30, 5), low: 1.03522, high: 1.1616
Space returned by server:
raw:
array of shape: (30, 5), low: nan, high: nan
Full response:
{'raw': array([[1.12954, 1.12955, 1.12945, 1.12947, nan],
[1.12946, 1.12952, 1.12946, 1.1295 , nan],
[1.1295 , 1.1295 , 1.12948, 1.12948, nan],
[1.12949, 1.12963, 1.12948, 1.12963, nan],
[1.12962, 1.12963, 1.12954, 1.12954, nan],
[1.12955, 1.12956, 1.12954, 1.12956, nan],
[1.12955, 1.12955, 1.12953, 1.12954, nan],
[1.12955, 1.12955, 1.12951, 1.12951, nan],
[1.1295 , 1.12952, 1.12946, 1.12946, nan],
[1.12946, 1.12949, 1.12936, 1.12936, nan],
[1.12937, 1.12941, 1.12933, 1.12934, nan],
[1.12935, 1.12935, 1.12935, 1.12935, nan],
[1.12935, 1.12941, 1.12935, 1.12938, nan],
[1.12938, 1.12939, 1.12937, 1.12937, nan],
[1.12937, 1.12939, 1.12937, 1.12939, nan],
[1.12937, 1.1294 , 1.12934, 1.1294 , nan],
[1.1294 , 1.1294 , 1.12939, 1.12939, nan],
[1.12939, 1.12939, 1.12927, 1.12929, nan],
[1.12929, 1.12929, 1.12928, 1.12928, nan],
[1.12927, 1.12929, 1.12927, 1.12929, nan],
[1.12928, 1.12929, 1.12928, 1.12929, nan],
[1.12929, 1.12931, 1.12927, 1.1293 , nan],
[1.1293 , 1.12931, 1.1293 , 1.1293 , nan],
[1.1293 , 1.1293 , 1.12924, 1.12924, nan],
[1.12924, 1.12924, 1.1292 , 1.12921, nan],
[1.1292 , 1.1292 , 1.12916, 1.12917, nan],
[1.12918, 1.1292 , 1.1291 , 1.12912, nan],
[1.1291 , 1.12915, 1.12904, 1.12915, nan],
[1.12916, 1.12925, 1.12916, 1.12925, nan],
[1.12927, 1.12928, 1.12919, 1.12921, nan]])}
Reward: 0.0
Done: False
Info:
{'step': 0, 'time': datetime.datetime(2016, 4, 21, 0, 29), 'action': {'default_asset': 'hold', '_skip_this': True}, 'broker_message': '_', 'broker_cash': 100.0, 'broker_value': 100.0, 'drawdown': 0.0, 'max_drawdown': 0.0}
Hint: Wrong Strategy.get_state() parameters?
In the file, the volume column contains zeros, which is what I expected to get, but I am getting 'nan' instead.
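The 'low: nan, high: nan' in the message follows from how NaN behaves: the minimum and maximum of an array that contains NaN are NaN, and NaN fails every range comparison, so a single NaN column is enough to put the whole observation outside the declared space even though the price columns are fine. The NumPy sketch below reproduces that effect on the first row of the response above and locates the offending column; in_box and nan_columns are hypothetical helpers, not btgym's own check.

```python
import numpy as np


def in_box(obs, low, high):
    """True only if every element lies in [low, high]; any NaN makes it False."""
    return bool(np.all((obs >= low) & (obs <= high)))


def nan_columns(obs):
    """Indices of the columns that contain at least one NaN."""
    return np.flatnonzero(np.isnan(obs).any(axis=0))


# First row of the server response above: OHLC prices plus a NaN volume.
obs = np.array([[1.12954, 1.12955, 1.12945, 1.12947, np.nan]])

print(obs.min(), obs.max())                  # nan nan
print(in_box(obs, 1.03522, 1.1616))          # False
print(nan_columns(obs))                      # [4] -> the volume column
print(in_box(obs[:, :4], 1.03522, 1.1616))   # True: the prices alone are in range
```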
@MorleyMinde,
parsing_params should be set as a dictionary of its own; see: https://kismuz.github.io/btgym/btgym.datafeed.html#btgym.datafeed.derivative.BTgymDataset
It is better not to override the get_raw_state() method, as it serves mostly internal purposes. Instead, declare your own observation modality (note the state_shape dict) and implement a method with the matching name: get_<my_own_state_modality_name>_state().
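The naming convention can be illustrated with a toy class: each key of the state_shape dict (here 'raw' and 'my', as in the snippet below) is resolved to a method called get_<key>_state. This is a hypothetical, stripped-down dispatcher written for illustration, not btgym's actual implementation; ToyStrategy and collect_state are made-up names.

```python
class ToyStrategy:
    """Hypothetical stand-in for a strategy; NOT btgym's BTgymBaseStrategy."""

    def __init__(self, state_keys, time_dim=30):
        self.state_keys = state_keys  # e.g. the keys of a state_shape dict
        self.time_dim = time_dim

    def get_raw_state(self):
        # Placeholder for the built-in OHLC modality: time_dim rows x 4 features.
        return [[0.0] * 4 for _ in range(self.time_dim)]

    def get_my_state(self):
        # Placeholder for a user-declared modality: time_dim rows x 5 features.
        return [[0.0] * 5 for _ in range(self.time_dim)]

    def collect_state(self):
        """Build the observation dict by calling get_<key>_state for every key."""
        state = {}
        for key in self.state_keys:
            method = getattr(self, f'get_{key}_state', None)
            if method is None:
                raise AttributeError(f'no get_{key}_state() method for modality {key!r}')
            state[key] = method()
        return state


state = ToyStrategy(['raw', 'my']).collect_state()
print(len(state['raw'][0]), len(state['my'][0]))  # 4 5
```

In this toy, declaring a modality without a method of the matching name fails at lookup time, which is why the method name has to follow the key exactly.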
See below for a working snippet (apart from the changed data filename):
from btgym import BTgymDataset, BTgymBaseStrategy, BTgymEnv
from gym import spaces
import numpy as np

params = dict(
    parsing_params=dict(
        # CSV to Pandas params.
        sep=';',
        header=0,
        index_col=0,
        parse_dates=True,
        names=['open', 'high', 'low', 'close', 'volume'],
        # Pandas to BT.feeds params:
        timeframe=1,  # 1 minute.
        datetime=0,
        open=1,
        high=2,
        low=3,
        close=4,
        volume=5,
        openinterest=-1,
    ),
    # Random-sampling params:
    start_weekdays=[0, 1, 2, 3, ],  # Only weekdays from the list will be used for episode start.
    start_00=True,  # Episode start time will be set to first record of the day (usually 00:00).
    episode_duration={'days': 2, 'hours': 21, 'minutes': 55},
    time_gap={'days': 0, 'hours': 5, 'minutes': 55},
)

MyDataset = BTgymDataset(
    filename='./data/DAT_ASCII_EURUSD_M1_2003.csv',
    **params,
)
class MyStrategy(BTgymBaseStrategy):
    """
    Example subclass of BT server inner computation strategy.
    """
    def get_my_state(self):
        my_state = np.row_stack(
            (
                np.frombuffer(self.data.open.get(size=self.time_dim)),
                np.frombuffer(self.data.high.get(size=self.time_dim)),
                np.frombuffer(self.data.low.get(size=self.time_dim)),
                np.frombuffer(self.data.close.get(size=self.time_dim)),
                np.frombuffer(self.data.volume.get(size=self.time_dim)),
            )
        ).T
        return my_state
env = BTgymEnv(
    dataset=MyDataset,
    strategy=MyStrategy,
    drawdown_call=50,
    state_shape=dict(
        raw=spaces.Box(low=0, high=1, shape=(30, 4)),
        my=spaces.Box(low=-100, high=100, shape=(30, 5)),
    ),
    port=5555,
    verbose=1,
)
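The key change from the original attempt is that the CSV and feed settings, volume=5 included, now sit inside the parsing_params dict instead of being passed as top-level keyword arguments. The toy class below mimics why that matters: it only reads feed settings from parsing_params, so a top-level volume=5 never reaches the parser. ToyDataset and its defaults are hypothetical; in particular, volume=-1 as the "no volume column" default is an assumption for illustration, not a quote of BTgymDataset's source.

```python
# Assumed defaults for illustration only: volume=-1 stands for "no volume column".
DEFAULT_PARSING_PARAMS = dict(sep=';', open=1, high=2, low=3, close=4, volume=-1)


class ToyDataset:
    """Hypothetical stand-in for a dataset class; NOT btgym's BTgymDataset."""

    def __init__(self, filename, parsing_params=None, **other_params):
        self.filename = filename
        # Feed settings come only from the nested dict, merged over the defaults.
        self.parsing_params = {**DEFAULT_PARSING_PARAMS, **(parsing_params or {})}
        # Everything else (sampling settings, stray feed settings) is kept
        # separately and never consulted by the parser.
        self.other_params = other_params


flat = ToyDataset('data.csv', volume=5)
nested = ToyDataset('data.csv', parsing_params=dict(volume=5))
print(flat.parsing_params['volume'])    # -1: the top-level value was ignored
print(nested.parsing_params['volume'])  # 5
```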
@Kismuz This worked perfectly. Thanks a lot.
I went over the code to understand how BTGym references the 'volume' part of the data. By default, it seems that the column is ignored and not used in the datafeeds.
'volume' is used in technical analysis as part of the decision-making process, so it made me wonder: is there a reason not to use it?
If I do want to use it, would changing the volume param in parsing_params in the data provider class be enough, or do I have to modify other stuff as well (like the shape in 'external': spaces.Box(...))?