Closed JaCoderX closed 5 years ago
@JacobHanouna ,
from _everywhere import _everything_needed
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
from btgym.research.misc_utils import EnvRunner
env = BTgymEnv(**my_env_config_kwargs)
data_provider = EnvRunner(env=env)
# Get episode and prepare data,
# here - from train dataset, set sample_type=1 to get test:
obs = data_provider.get_episode(sample_type=0)
image = data_provider.env.render('episode')
data_provider.close()
# Get stream of external observations:
data = np.concatenate(obs['external'], axis=1)
# See what data looks like for an agent:
plt.figure(num=1, figsize=(14, 8))
plt.title('External data:')
plt.grid(True)
_ = plt.plot(data[-1, :, :])
# Show rendered episode:
plt.figure(num=2, figsize=(22, 30))
plt.title('Episode summary:')
_ = plt.imshow(image)
# See rewards closeup:
r = np.asarray(obs['reward'])
plt.figure(num=3, figsize=(14, 8))
plt.title('Reward:')
_ = plt.plot(r)
plt.grid(True)
While playing with env. parameters it is good practice to run the environment manually a couple of times before launching the TF cluster; inconsistent behaviour can often be spotted right there. I do it so often that I wrote a very simple wrapper to collect data; it can be used as in the snippet above.
Thanks for the tip. I will use it :)
It's unclear what caused the NaN; I'd recommend trying the A3C class with the same settings to check whether the error persists.
OK, you are right, this is not PPO related. I get the same error when using the BaseAAC trainer with StackedLstmPolicy.
I also tried BaseAacPolicy, which raises a different error (let me know if you need the full traceback):
INFO:tensorflow:Restoring parameters from /home/jack/tmp/test/train/model.ckpt-0
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key AAC/global/conv2d/_layer_1/W/Adam not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:ps/replica:0/task:0/device:CPU:0"](_recv_save/Const_0_S1, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
All the errors happen only if I change to the above settings on the strategy.
@JacobHanouna, as the traceback says, it failed to load saved model weights into the new graph; this usually appears when you have changed the tf.graph definition (e.g. switched from PPO to A3C or made some other alteration to the graph) and then tried to load a previously saved model. Discard old checkpoints and start from scratch.
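Discarding old checkpoints can be scripted so that a stale model.ckpt-* file never survives between runs. The following is a minimal sketch using only the standard library; `reset_log_dir` is a hypothetical helper, not part of BTgym, and it deletes whatever directory it is given, so point it only at a dedicated results folder.

```python
import shutil
import tempfile
from pathlib import Path


def reset_log_dir(log_dir):
    """Delete log_dir with everything inside it, then recreate it empty.

    Hypothetical helper (not a BTgym function). Destructive by design:
    pass only a dedicated results folder.
    """
    path = Path(log_dir)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


# Demonstration on a throwaway directory, so nothing real is removed.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / 'test'
    (log_dir / 'train').mkdir(parents=True)
    (log_dir / 'train' / 'model.ckpt-0.index').write_text('stale')

    reset_log_dir(log_dir)
    print(sorted(p.name for p in log_dir.iterdir()))  # prints []
```

Calling such a helper at the top of the launch script, before the trainer is built, removes the chance of forgetting a manual cleanup.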
@Kismuz This was tested on a clean model (I made sure to delete the result folder after every test).
What is it trying to restore, then? Have you got a clean manual env. run with the strategy settings altered your way?
Another option for the original error: switch back to PPO and alter the BaseAAC method _combine_summaries by commenting out the following line:
tf.summary.histogram('advantage', self.local_network.on_pi_adv_target),
@JacobHanouna, as the traceback says, it failed to load saved model weights into the new graph; this usually appears when you have changed the tf.graph definition (e.g. switched from PPO to A3C or made some other alteration to the graph) and then tried to load a previously saved model. Discard old checkpoints and start from scratch.
You were right, I probably didn't clear that result. Using BaseAacPolicy gives the same error as the original.
from btgym.research.misc_utils import EnvRunner
Can you share EnvRunner as well (if it is not private, of course :) )?
another option for orig. error can be: switch back to PPO and alter BaseAAC method _combine_summaries - comment out following line
After commenting out this line I get a different error:
InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had NaN values [[{{node AAC/VerifyFinite/CheckNumerics}} = CheckNumerics[T=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:worker/replica:0/task:0/device:CPU:0"](AAC/global_norm/global_norm)]] INFO:tensorflow:Error reported to Coordinator: <class 'RuntimeError'>, process() exception occurred
can you share EnvRunner
it's here, update BTgym:
git pull
pip install --upgrade -e .
Found Inf or NaN global norm. : Tensor had NaN values
well, time to manually check environment run.
I ran it manually but got no error.
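A manual run that finishes without raising can still carry NaN or Inf values in the collected data, so it may help to scan the episode explicitly. A minimal sketch, assuming the episode is a dict that maps keys such as 'external' and 'reward' to per-step sequences of numeric values, as in the EnvRunner snippet; `find_non_finite` is a hypothetical helper, the episode below is synthetic, and non-numeric entries such as metadata dicts would need to be skipped first.

```python
import numpy as np


def find_non_finite(episode):
    """Return {key: [step indices]} for every key that holds NaN or Inf.

    Hypothetical helper; assumes each value in `episode` is a sequence of
    numeric scalars or arrays, one entry per step.
    """
    bad = {}
    for key, steps in episode.items():
        idx = [
            i for i, x in enumerate(steps)
            if not np.all(np.isfinite(np.asarray(x, dtype=np.float64)))
        ]
        if idx:
            bad[key] = idx
    return bad


# Synthetic stand-in for collected episode data (not real BTgym output).
episode = {
    'external': [np.zeros((1, 4, 1, 5)), np.full((1, 4, 1, 5), np.nan)],
    'reward': [0.1, float('inf')],
}
print(find_non_finite(episode))  # {'external': [1], 'reward': [1]}
```

An empty dict means every step was finite for every key checked.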
@JacobHanouna, can you share the exact env. setup code so I can replicate the error?
All env settings are the ones used in the 'unreal example'. I only changed the strategy a bit, as follows (skip_frame, time_dim, avg_period):
# Define strategy and broker account parameters:
MyCerebro.addstrategy(
    DevStrat_4_11,
    start_cash=2000,  # initial broker cash
    commission=0.0001,  # commission to imitate spread
    leverage=10.0,
    order_size=2000,  # fixed stake, mind leverage
    drawdown_call=10,  # max % to lose, in percent of initial cash
    target_call=10,  # max % to win, same
    skip_frame=60,  # skip_frame_period <= avg_period <= time_embedding_period:
    time_dim=128,
    avg_period=100,
    gamma=0.99,
    reward_scale=7,  # gradient's nitrox, touch with care!
    state_ext_scale=np.linspace(3e3, 1e3, num=5)
)
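The ordering noted in the skip_frame comment can be checked up front. A minimal sketch, assuming that time_embedding_period in that comment corresponds to time_dim; `check_periods` is a hypothetical helper, not a BTgym function.

```python
def check_periods(skip_frame, avg_period, time_dim):
    """Raise ValueError unless skip_frame <= avg_period <= time_dim.

    Hypothetical helper; assumes time_dim is the time embedding period.
    """
    if not (skip_frame <= avg_period <= time_dim):
        raise ValueError(
            'expected skip_frame <= avg_period <= time_dim, got '
            f'{skip_frame}, {avg_period}, {time_dim}'
        )


# Settings from this post: the ordering holds, so no exception is raised.
check_periods(skip_frame=60, avg_period=100, time_dim=128)
# Default values mentioned in the thread: also fine.
check_periods(skip_frame=10, avg_period=20, time_dim=30)
```

Both sets of values satisfy the ordering, so the constraint itself does not explain the error.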
@JacobHanouna,
Well, there is a caveat with setting time_dim and avg_period. Namely, the BTgym API shell needs to infer the observation state shape before an actual instance of class DevStrat_4_11 is created; that means (taking into account the backtrader paradigm that instances of classes are created at actual backtest runtime) the observation state shape and all variables it depends upon should be class attributes and therefore cannot be set via the parameters dictionary (and, consequently, via the addstrategy() method).
So, redefining skip_frame is fine, but for the others mentioned there is no easier way than to subclass the strategy and explicitly set all class attributes; in your case it could look like this:
from gym import spaces
from btgym import DictSpace
class DevStrat_4_11_prime(DevStrat_4_11):
    time_dim = 128
    skip_frame = 60
    avg_period = 100
    portfolio_actions = ('hold', 'buy', 'sell', 'close')
    gamma = 0.99
    state_ext_scale = np.linspace(3e3, 1e3, num=5)
    params = dict(
        # Note: fake `Width` dimension to use 2d conv etc.:
        state_shape={
            'external': spaces.Box(low=-100, high=100, shape=(time_dim, 1, 5), dtype=np.float32),
            'internal': spaces.Box(low=-2, high=2, shape=(avg_period, 1, 6), dtype=np.float32),
            'metadata': DictSpace(
                {
                    'type': spaces.Box(
                        shape=(),
                        low=0,
                        high=1,
                        dtype=np.uint32
                    ),
                    'trial_num': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'trial_type': spaces.Box(
                        shape=(),
                        low=0,
                        high=1,
                        dtype=np.uint32
                    ),
                    'sample_num': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'first_row': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'timestamp': spaces.Box(
                        shape=(),
                        low=0,
                        high=np.finfo(np.float64).max,
                        dtype=np.float64
                    ),
                }
            )
        },
        cash_name='default_cash',
        asset_names=['default_asset'],
        start_cash=None,
        commission=None,
        leverage=1.0,
        drawdown_call=5,
        target_call=19,
        portfolio_actions=portfolio_actions,
        initial_action=None,
        initial_portfolio_action=None,
        skip_frame=skip_frame,
        gamma=gamma,
        reward_scale=1.0,
        state_ext_scale=state_ext_scale,  # EURUSD
        state_int_scale=1.0,
        metadata={},
    )
...............
MyCerebro.addstrategy(
    DevStrat_4_11_prime,
    start_cash=2000,
    commission=0.0001,
    leverage=10.0,
    order_size=2000,
    drawdown_call=10,
    target_call=10,
    skip_frame=60,
    gamma=0.99,
    reward_scale=7,
)
Ugly, I know; I'll try to address it in future.
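The reason the whole params dictionary has to be repeated in the subclass can be shown without BTgym at all. This is a toy sketch in plain Python: `BaseStrategy`, `PrimeStrategy` and `infer_state_shape` are hypothetical stand-ins for the strategy classes and the API shell, and the shapes are plain tuples rather than gym spaces.

```python
class BaseStrategy:
    # Class attributes: readable without creating an instance.
    time_dim = 30
    avg_period = 20

    # Built once, while the class body executes, from the values above.
    params = dict(
        state_shape={
            'external': (time_dim, 1, 5),
            'internal': (avg_period, 1, 6),
        },
        skip_frame=10,
    )


def infer_state_shape(strategy_cls, **kwargs):
    """Hypothetical stand-in for the API shell: it sees only the class.

    kwargs play the role of addstrategy() arguments; in this toy model they
    would be applied later, at instantiation, so they are ignored here.
    """
    return strategy_cls.params['state_shape']


class PrimeStrategy(BaseStrategy):
    time_dim = 128
    avg_period = 100
    # params is rebuilt too; the inherited dict still holds the old shapes.
    params = dict(
        state_shape={
            'external': (time_dim, 1, 5),
            'internal': (avg_period, 1, 6),
        },
        skip_frame=60,
    )


# Keyword arguments do not reach the shape that was fixed at class creation:
print(infer_state_shape(BaseStrategy, time_dim=128, avg_period=100)['external'])  # (30, 1, 5)
# The subclass carries the new shape before any instance exists:
print(infer_state_shape(PrimeStrategy)['external'])  # (128, 1, 5)
```

In this toy model, overriding only time_dim in a subclass would not be enough either, because the inherited params was already evaluated with the old value.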
OK Thanks :)
@Kismuz, I came across another issue while playing around with this example.
Under the trainer_config I have enabled use_value_replay=True and got the following error:
File "/home/jack/btgym/btgym/algorithms/policy/base.py", line 373, in get_pc_target
feeder = {self.pc_change_state_in: state['external'], self.pc_change_last_state_in: last_state['external']}
AttributeError: 'AacStackedRL2Policy' object has no attribute 'pc_change_state_in'
I checked the code, and in StackedLstmPolicy you have disabled Aux 1 - Pixel Control, so pc_change_state_in doesn't get declared... but it appears that base is expecting it.
@JacobHanouna, fixed, update package; thanks for spotting it out.
I encountered an error and I'm trying to figure out whether it is a wrong setting on my end or a possible bug.
I took the 'Unreal Stacked Lstm example' and set the trainer to work with PPO. In the strategy params I made the following changes:
I'm experimenting with skip_frame and time_dim to compare two models that are trained on different time frames, with the following settings:
First model: one year of data at 1 min resolution, skip_frame ~ one hour (60 frames), time_dim ~ 2 hours (or above)
Second model: one year of data at 1 hour resolution, skip_frame ~ one hour (1 frame), time_dim ~ 120 hours (or above)
With these settings the models make decisions at the same time (every hour), but after learning 2 different representations of the same data. I'm curious to see whether the '1 min model' can also learn the representation of the '1 hour model'.
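The arithmetic behind the two setups can be written out. A small sketch, with time_dim=128 taken from the strategy settings in this thread and time_dim=120 from the description of the second model; `TimeFrameSetup` is a hypothetical helper class, not part of BTgym.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeFrameSetup:
    """Hypothetical container for one model's time-frame settings."""
    bar_minutes: int  # resolution of one data bar, in minutes
    skip_frame: int   # bars between two agent decisions
    time_dim: int     # bars in the observation window

    @property
    def decision_interval_minutes(self):
        return self.bar_minutes * self.skip_frame

    @property
    def lookback_minutes(self):
        return self.bar_minutes * self.time_dim


minute_model = TimeFrameSetup(bar_minutes=1, skip_frame=60, time_dim=128)
hourly_model = TimeFrameSetup(bar_minutes=60, skip_frame=1, time_dim=120)

for name, setup in (('1 min', minute_model), ('1 hour', hourly_model)):
    print(name, setup.decision_interval_minutes, setup.lookback_minutes)
# 1 min 60 128
# 1 hour 60 7200
```

Both models act every 60 minutes, but the observation windows differ a lot: about 2 hours of history for the 1 min model against 120 hours for the 1 hour model.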
I get this error only after changing to the above values (no error on the default values: skip_frame=10, time_dim=30, avg_period=20)
this is the error: