Kismuz / btgym

Scalable, event-driven, deep-learning-friendly backtesting library
https://kismuz.github.io/btgym/
GNU Lesser General Public License v3.0

Error encountered on 'Unreal Stacked Lstm example' #80

Closed. JaCoderX closed this issue 5 years ago

JaCoderX commented 5 years ago

I encountered an error and I'm trying to figure out whether it is a wrong setting on my end or a possible bug.

I took the 'Unreal Stacked Lstm example' and set the trainer to work with PPO. In the strategy params I made the following changes:

MyCerebro.addstrategy(
    ...
    skip_frame=60, # skip_frame_period <= avg_period <= time_embedding_period:
    time_dim = 128,
    avg_period = 100,
    ...

I'm experimenting with skip_frame and time_dim to try to compare two models trained on different time frames.

With the settings above, the models make decisions at the same time (every hour), but after learning two different representations of the same data. I'm curious to see whether the '1 min model' can also learn the representation of the '1 hour model'.

I get this error only after changing to the above values (no error on the default values: skip_frame=10, time_dim=30, avg_period=20)

this is the error:

Traceback (most recent call last):
  File "/home/jack/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/jack/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/jack/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: PPO/model/advantage
  [[{{node PPO/model/advantage}} = HistogramSummary[T=DT_FLOAT, _device="/job:worker/replica:0/task:0/device:CPU:0"](PPO/model/advantage/tag, _recv_PPO/PPO/on_policy_advantage_pl_0)]]
  [[{{node PPO/clip_by_global_norm/mul_47_S555}} = _Recv[client_terminated=false, recv_device="/job:ps/replica:0/task:0/device:CPU:0", send_device="/job:worker/replica:0/task:0/device:CPU:0", send_device_incarnation=-1203984878618912631, tensor_name="edge_4873_PPO/clip_by_global_norm/mul_47", tensor_type=DT_FLOAT, _device="/job:ps/replica:0/task:0/device:CPU:0"]()]]
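[Editor's note: a quick sketch, not from the thread, checking the ordering skip_frame <= avg_period <= time_dim stated in the example's comment; both the default and the altered values satisfy it, so the documented inequality itself is not what is violated here:]

# Editorial sketch: verify the documented ordering skip_frame <= avg_period <= time_dim
# for the default and the altered strategy settings discussed above.
settings = {
    'default': dict(skip_frame=10, avg_period=20, time_dim=30),
    'altered': dict(skip_frame=60, avg_period=100, time_dim=128),
}
for name, s in settings.items():
    print(name, 'ordering holds:', s['skip_frame'] <= s['avg_period'] <= s['time_dim'])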

Kismuz commented 5 years ago

@JacobHanouna ,

It's unclear what caused the NaN; I'd recommend trying the A3C class with the same settings to check whether the error persists.

While playing with env. parameters it is good practice to run it manually a couple of times before launching the TF cluster; some inconsistent behaviour can be spotted right away. I do it so often that I wrote a very simple wrapper to collect data; it can be used like this:

from _everywhere import _everything_needed
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

from btgym.research.misc_utils import EnvRunner

env = BTGymEnv(**my_env_config_kwargs)
data_provider = EnvRunner(env=env)

# Get episode and prepare data,
# here - from train dataset, set sample_type=1 to get test:

obs = data_provider.get_episode(sample_type=0)

image = data_provider.env.render('episode')

data_provider.close()

# Get stream of external observations: 
data = np.concatenate(obs['external'], axis=1)

# See what data looks like for an agent:
plt.figure(num=1, figsize=(14, 8))
plt.title('External data:')
plt.grid(True)
_ = plt.plot(data[-1, :, :])

# Show rendered episode:
plt.figure(num=2, figsize=(22, 30))
plt.title('Episode summary:')
_ = plt.imshow(image)

# See rewards closeup:
r = np.asarray(obs['reward'])
plt.figure(num=3, figsize=(14, 8))
plt.title('Reward:')
_ = plt.plot(r)
plt.grid(True)
JaCoderX commented 5 years ago

> While playing with env. parameters it is good practice to run it manually a couple of times before launching the TF cluster; some inconsistent behaviour can be spotted right away. I do it so often that I wrote a very simple wrapper to collect data; it can be used like this:

Thanks for the tip. I will use it :)

> It's unclear what caused the NaN; I'd recommend trying the A3C class with the same settings to check whether the error persists.

OK, you are right, this is not PPO related. I get the same error when using the BaseAAC trainer with StackedLstmPolicy.

I also tried using BaseAacPolicy, which raises a different error (let me know if you need the full traceback):

INFO:tensorflow:Restoring parameters from /home/jack/tmp/test/train/model.ckpt-0
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.NotFoundError'>, Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key AAC/global/conv2d/_layer_1/W/Adam not found in checkpoint
     [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:ps/replica:0/task:0/device:CPU:0"](_recv_save/Const_0_S1, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]] 

All these errors happen only if I change the strategy to the above settings.

Kismuz commented 5 years ago

@JacobHanouna, as the traceback says, it failed to load saved model weights into the new graph; this usually appears when you have changed the tf.Graph definition (e.g. switched from PPO to A3C or made some other alterations to the graph) and then tried to load a previously saved model. Discard old checkpoints and start from scratch.
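[Editor's note: a minimal sketch, not from the thread, of what "discard old checkpoints" amounts to; the path is an assumption taken from the traceback above, adjust log_dir to your own setup:]

# Sketch only: remove the stale checkpoint/log directory so the next run starts from scratch.
import os
import shutil

log_dir = '/home/jack/tmp/test'  # assumed from the traceback above
if os.path.exists(log_dir):
    shutil.rmtree(log_dir)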

JaCoderX commented 5 years ago

@Kismuz This was tested on a clean model (I made sure to delete the result folder after every test).

Kismuz commented 5 years ago

What does it try to restore then? Have you got a clean manual env. run with the strategy settings altered your way?

JaCoderX commented 5 years ago

> @JacobHanouna, as the traceback says, it failed to load saved model weights into the new graph; this usually appears when you have changed the tf.Graph definition (e.g. switched from PPO to A3C or made some other alterations to the graph) and then tried to load a previously saved model. Discard old checkpoints and start from scratch.

You were right, probably didn't clear that result. using BaseAacPolicy give the same error as original.

> from btgym.research.misc_utils import EnvRunner

can you share EnvRunner as well (if it is not private of course :) )

> Another option for the orig. error: switch back to PPO and alter the BaseAAC method _combine_summaries - comment out the following line

After commenting out this line I get a different error:

InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had NaN values
  [[{{node AAC/VerifyFinite/CheckNumerics}} = CheckNumerics[T=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:worker/replica:0/task:0/device:CPU:0"](AAC/global_norm/global_norm)]]
INFO:tensorflow:Error reported to Coordinator: <class 'RuntimeError'>, process() exception occurred

Kismuz commented 5 years ago

> can you share EnvRunner

it's here, update BTgym:

git pull
pip install --upgrade -e .

> Found Inf or NaN global norm. : Tensor had NaN values

Well, time to manually check the environment run.
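[Editor's note: a sketch, not from the thread, of what such a manual check could look like, reusing the obs dictionary collected by the EnvRunner snippet earlier in this issue and scanning it for NaN/Inf values:]

# Sketch only: scan a manually collected episode for NaN/Inf values,
# reusing `obs` and numpy (np) from the EnvRunner example above.
ext = np.concatenate(obs['external'], axis=1)
rew = np.asarray(obs['reward'])
print('external all finite:', np.isfinite(ext).all())
print('reward all finite:  ', np.isfinite(rew).all())
print('reward min/max:', rew.min(), rew.max())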

JaCoderX commented 5 years ago

I ran it manually but got no error.

Kismuz commented 5 years ago

@JacobHanouna , can you share the exact env. setup code so I can replicate the error?

JaCoderX commented 5 years ago

All env settings are the ones used in the 'unreal example'; I only changed the strategy a bit (skip_frame, time_dim, avg_period), as follows:

# Define strategy and broker account parameters:
MyCerebro.addstrategy(
    DevStrat_4_11,
    start_cash=2000,  # initial broker cash
    commission=0.0001,  # commission to imitate spread
    leverage=10.0,
    order_size=2000,  # fixed stake, mind leverage
    drawdown_call=10,  # max % to lose, in percent of initial cash
    target_call=10,  # max % to win, same
    skip_frame=60,  # skip_frame_period <= avg_period <= time_embedding_period:
    time_dim=128,
    avg_period=100,
    gamma=0.99,
    reward_scale=7,  # gradient's nitrox, touch with care!
    state_ext_scale=np.linspace(3e3, 1e3, num=5)
)
Kismuz commented 5 years ago

@JacobHanouna, well, there is a caveat with setting time_dim and avg_period. Namely, the BTgym API shell needs to infer the observation state shape before an actual instance of the DevStrat_4_11 class is created; that means (taking into account the backtrader paradigm that class instances are only created at actual backtest runtime) the observation state shape and all variables it depends upon must be class attributes, and therefore cannot be set via the parameters dictionary (and, consequently, via the addstrategy() method). So redefining skip_frame is fine, but for the others mentioned there is no easier way than to subclass the strategy and explicitly set all class attributes; in your case it could look like this:

from gym import spaces
from btgym import DictSpace

class DevStrat_4_11_prime(DevStrat_4_11):
    time_dim = 128  
    skip_frame = 60
    avg_period = 100
    portfolio_actions = ('hold', 'buy', 'sell', 'close')
    gamma = 0.99  
    state_ext_scale = np.linspace(3e3, 1e3, num=5)
    params = dict(
        # Note: fake `Width` dimension to use 2d conv etc.:
        state_shape=
        {
            'external': spaces.Box(low=-100, high=100, shape=(time_dim, 1, 5), dtype=np.float32),
            'internal': spaces.Box(low=-2, high=2, shape=(avg_period, 1, 6), dtype=np.float32),
            'metadata': DictSpace(
                {
                    'type': spaces.Box(
                        shape=(),
                        low=0,
                        high=1,
                        dtype=np.uint32
                    ),
                    'trial_num': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'trial_type': spaces.Box(
                        shape=(),
                        low=0,
                        high=1,
                        dtype=np.uint32
                    ),
                    'sample_num': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'first_row': spaces.Box(
                        shape=(),
                        low=0,
                        high=10 ** 10,
                        dtype=np.uint32
                    ),
                    'timestamp': spaces.Box(
                        shape=(),
                        low=0,
                        high=np.finfo(np.float64).max,
                        dtype=np.float64
                    ),
                }
            )
        },
        cash_name='default_cash',
        asset_names=['default_asset'],
        start_cash=None,
        commission=None,
        leverage=1.0,
        drawdown_call=5,
        target_call=19,
        portfolio_actions=portfolio_actions,
        initial_action=None,
        initial_portfolio_action=None,
        skip_frame=skip_frame,
        gamma=gamma,
        reward_scale=1.0,
        state_ext_scale=state_ext_scale,  # EURUSD
        state_int_scale=1.0,
        metadata={},
    )
...............
MyCerebro.addstrategy(
    DevStrat_4_11_prime,
    start_cash=2000,  
    commission=0.0001,  
    leverage=10.0,
    order_size=2000,  
    drawdown_call=10, 
    target_call=10,  
    skip_frame=60, 
    gamma=0.99,
    reward_scale=7, 
)

Ugly, I know; I'll try to address it in the future.

JaCoderX commented 5 years ago

OK Thanks :)

JaCoderX commented 5 years ago

@Kismuz, I came across another issue while playing around with this example.

Under the trainer_config I have enabled use_value_replay=True and got the following error:

File "/home/jack/btgym/btgym/algorithms/policy/base.py", line 373, in get_pc_target feeder = {self.pc_change_state_in: state['external'], self.pc_change_last_state_in: last_state['external']} AttributeError: 'AacStackedRL2Policy' object has no attribute 'pc_change_state_in'

I checked the code, and in StackedLstmPolicy you have disabled 'Aux 1: Pixel Control', so pc_change_state_in never gets declared... but it appears that the base policy still expects it.
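[Editor's note: a hypothetical guard, not the actual fix that landed in btgym, illustrating the kind of hasattr check that would avoid this AttributeError; the helper name make_pc_feeder is invented for illustration:]

# Hypothetical sketch: only build the pixel-control feed dict when the policy
# actually declared those placeholders (e.g. StackedLstmPolicy does not).
def make_pc_feeder(policy, state, last_state):
    if not hasattr(policy, 'pc_change_state_in'):
        return None  # aux pixel-control task disabled for this policy
    return {
        policy.pc_change_state_in: state['external'],
        policy.pc_change_last_state_in: last_state['external'],
    }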

Kismuz commented 5 years ago

@JacobHanouna, fixed, update the package; thanks for spotting it.