Reward always 0

MorleyMinde commented 6 years ago

Hello @Kismuz, I'm seeing something odd with the environment: at every step, whatever action I take, the reward is always 0. I'm not sure whether this is a bug or I'm just missing something.

Here is my initialization code:

```
from gym import spaces
from btgym import BTgymEnv

# Called from a method of my own class: self.capital and self.input_shape are its
# attributes, and MyStrategy is my custom strategy class.
env = BTgymEnv(
    filename='./btgym/examples/data/DAT_ASCII_EURUSD_M1_2010.csv',
    start_weekdays=[0, 1],
    episode_duration={'days': 2, 'hours': 0, 'minutes': 0},
    strategy=MyStrategy,
    start_00=True,
    start_cash=self.capital,
    broker_commission=0.002,
    fixed_stake=10,
    drawdown_call=30,
    state_shape={
        'raw_state': spaces.Box(low=1, high=2, shape=self.input_shape),
        'indicator_states': spaces.Box(low=-1, high=100, shape=self.input_shape),
    },
    port=5002,
    data_port=4800,
    verbose=1,
)
```

Here is how it is used:

```
state = env.reset()  # initial observation
done = False
while not done:
    action = agent.act(state)
    next_state, reward, done, info = env.step(action)  # the reward here is always 0, regardless of the action I take
    agent.remember(state, action, reward, next_state, done)
    state = next_state
```

Expected behaviour:

I expect the reward to vary: positive or negative most of the time, and zero only occasionally.

Actual behaviour:

The reward is always 0, even when I make every action within an episode the same one.

Please help me out here.

Thanks in advance.

Kismuz commented 6 years ago

@MorleyMinde,

ensure your agent is choosing actions other than 'hold'. If it doesn't open and close positions, the reward will be zero. The simplest way to check this is to use a randomly acting agent:
```
action = env.action_space.sample()  # samples random action from action space
```
This is also a useful sanity check of your broker setup: a randomly acting agent should be able to drain your trading account in at least ~2/3 of the maximum episode duration. Review the 'info' part of the response for additional insights. See also: https://github.com/Kismuz/btgym/blob/master/examples/very_basic_env_setup.ipynb
Another possible cause is that the actual sampled episode duration is too short because of inconsistent episode/dataset settings. Setting verbose=2 and reviewing the sampling logs can help identify this.
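The random-agent sanity check can be sketched as a small helper that plays one episode and summarizes what happened. This is a minimal sketch, assuming a gym-style env with `reset()`, a 4-tuple `step()` and `action_space.sample()`, and assuming `info` arrives as a list holding one dict with a `'broker_value'` key, as in the log later in this thread. The helper names `last_info`, `run_random_episode` and `summarize` are hypothetical, not part of btgym.

```python
from typing import Any, Dict, List, Tuple


def last_info(info: Any) -> Dict[str, Any]:
    """Return the latest info dict; the btgym version in this thread wraps it in a list."""
    return info[-1] if isinstance(info, list) else info


def run_random_episode(env: Any, max_steps: int = 10_000) -> List[Tuple[float, Dict[str, Any]]]:
    """Play one episode with random actions and record (reward, info) for every step."""
    records: List[Tuple[float, Dict[str, Any]]] = []
    env.reset()
    done = False
    while not done and len(records) < max_steps:
        action = env.action_space.sample()  # random action from the action space
        _, reward, done, info = env.step(action)
        records.append((float(reward), last_info(info)))
    return records


def summarize(records: List[Tuple[float, Dict[str, Any]]]) -> Dict[str, Any]:
    """Flag the symptom from this issue: all-zero rewards and an unchanged account value."""
    rewards = [reward for reward, _ in records]
    values = {info.get('broker_value') for _, info in records}
    return {
        'steps': len(records),
        'nonzero_rewards': sum(1 for reward in rewards if reward != 0.0),
        'total_reward': sum(rewards),
        'account_changed': len(values) > 1,
    }
```

If `summarize(run_random_episode(env))` reports no non-zero rewards and `account_changed: False` over a long episode, the broker settings are a more likely suspect than the agent.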

MorleyMinde commented 6 years ago

I wrote a simple test based on the basic example:

```
from btgym import BTgymEnv

with open("log.txt", "a") as myfile:
    env = BTgymEnv(filename='./btgym/examples/data/DAT_ASCII_EURUSD_M1_2016.csv')

    done = False

    o = env.reset()

    while not done:
        action = env.action_space.sample()  # random action
        obs, reward, done, info = env.step(action)
        myfile.write('action: {},reward: {},info: {}\n'.format(action, reward, info))

    env.close()  # stop the btgym server processes
```

Here is the result I am getting (I can't post the entire log, but here are the last few lines):

```
action: 0,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1409, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 23), 'action': 'buy', 'broker_message': 'New BUY created; ORDER FAILED with status: Margin'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1410, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 24), 'action': 'hold', 'broker_message': 'ORDER FAILED with status: Margin'}]
action: 1,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1411, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 25), 'action': 'close', 'broker_message': 'New CLOSE created; -'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1412, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 26), 'action': 'buy', 'broker_message': 'New BUY created; -'}]
action: 1,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1413, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 27), 'action': 'close', 'broker_message': 'New CLOSE created; ORDER FAILED with status: Margin'}]
action: 2,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1414, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 28), 'action': 'buy', 'broker_message': 'New BUY created; -'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1415, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 29), 'action': 'sell', 'broker_message': 'New SELL created; ORDER FAILED with status: Margin'}]
action: 1,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1416, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 30), 'action': 'close', 'broker_message': 'New CLOSE created; ORDER FAILED with status: Margin'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1417, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 31), 'action': 'buy', 'broker_message': 'New BUY created; -'}]
action: 0,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1418, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 32), 'action': 'close', 'broker_message': 'New CLOSE created; ORDER FAILED with status: Margin'}]
action: 0,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1419, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 33), 'action': 'hold', 'broker_message': '-'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1420, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 34), 'action': 'hold', 'broker_message': '-'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1421, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 35), 'action': 'close', 'broker_message': 'New CLOSE created; -'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1422, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 36), 'action': 'close', 'broker_message': 'New CLOSE created; -'}]
action: 3,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1423, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 37), 'action': 'close', 'broker_message': 'New CLOSE created; -'}]
action: 1,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1424, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 38), 'action': 'close', 'broker_message': 'New CLOSE created; -'}]
action: 2,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1425, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 39), 'action': 'buy', 'broker_message': 'New BUY created; -'}]
action: 1,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1426, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 40), 'action': 'sell', 'broker_message': 'New SELL created; ORDER FAILED with status: Margin'}]
action: 0,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1427, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 41), 'action': 'buy', 'broker_message': 'New BUY created; ORDER FAILED with status: Margin'}]
action: 0,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1428, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 42), 'action': 'hold', 'broker_message': 'ORDER FAILED with status: MarginEND OF DATA'}]
action: 1,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1429, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 43), 'action': 'hold', 'broker_message': 'CLOSE, END OF DATA'}]
action: 0,reward: 0.0,info: [{'drawdown': 0.0, 'max_drawdown': 0.0, 'step': 1430, 'broker_cash': 10.0, 'broker_value': 10.0, 'time': datetime.datetime(2016, 12, 6, 5, 44), 'action': 'buy', 'broker_message': 'CLOSE, END OF DATA'}]
```

The reward is 0 throughout, regardless of the action, and the cash stays at 10.

Thanks

Kismuz commented 6 years ago

@MorleyMinde, please review the broker messages, e.g. 'broker_message': 'New CLOSE created; ORDER FAILED with status: Margin'. So the cause is:

Broker account settings (initial cash, stake size, leverage) must be consistent for the broker simulator to execute any orders at all.
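Reviewing the broker messages in bulk makes this pattern easy to spot. Below is a minimal sketch that tallies failed-order statuses from the info dicts; it assumes the message format seen in the log above ('ORDER FAILED with status: <status>'), which is an observation from this thread rather than a documented btgym format. The status names are backtrader's failure-type order statuses, and `failed_order_statuses` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Message prefix as it appears in the log above (observed, not a documented format).
FAIL_MARKER = 'ORDER FAILED with status: '
# Failure-type order status names used by backtrader.
KNOWN_STATUSES = ('Margin', 'Rejected', 'Canceled', 'Expired')


def failed_order_statuses(infos: Iterable[Dict[str, Any]]) -> Counter:
    """Count failed-order statuses mentioned in 'broker_message' strings."""
    counts: Counter = Counter()
    for info in infos:
        message = info.get('broker_message', '')
        start = message.find(FAIL_MARKER)
        if start == -1:
            continue  # no failure reported at this step
        status = message[start + len(FAIL_MARKER):]
        # Messages can be fused with later text, e.g. 'MarginEND OF DATA'.
        for known in KNOWN_STATUSES:
            if status.startswith(known):
                status = known
                break
        counts[status] += 1
    return counts
```

A run where almost every attempted order lands under 'Margin' while the account value never moves suggests that cash, stake and leverage do not fit together.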

The basic example does not set enough cash to perform any operations, so no order can be executed. Look at the other example notebooks for realistic account settings, like this:

```
import backtrader as bt

MyCerebro = bt.Cerebro()
MyCerebro.broker.setcash(2000)
MyCerebro.broker.setcommission(commission=0.0001, leverage=10.0)  # commission to imitate spread
MyCerebro.addsizer(bt.sizers.SizerFix, stake=5000)
```
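As a rough illustration of why these settings allow trading while 10 units of cash do not, the sketch below compares the cash an order would tie up (notional divided by leverage, plus a percentage commission on the notional) with the cash available. This is a simplified rule of thumb, not backtrader's exact margin computation, which depends on the commission-info settings. The EURUSD price of 1.05 is an assumed value for illustration, and `required_cash` / `can_open` are hypothetical helpers.

```python
def required_cash(stake: float, price: float, leverage: float, commission: float) -> float:
    """Approximate cash needed to open a position (simplified, not backtrader's exact rule)."""
    notional = stake * price        # position value in quote currency
    margin = notional / leverage    # capital tied up at the given leverage
    fee = notional * commission     # percentage commission on the notional
    return margin + fee


def can_open(cash: float, stake: float, price: float, leverage: float, commission: float) -> bool:
    """True if the available cash covers the approximate requirement."""
    return cash >= required_cash(stake, price, leverage, commission)


PRICE = 1.05  # assumed EURUSD rate, for illustration only

# Settings from the reply above: 2000 cash, 5000 stake, 10x leverage, 0.0001 commission.
print(required_cash(5000, PRICE, 10.0, 0.0001))
print(can_open(2000, 5000, PRICE, 10.0, 0.0001))   # enough cash under this approximation
# The same order against the 10.0 starting cash seen in the log.
print(can_open(10.0, 5000, PRICE, 10.0, 0.0001))   # not enough cash
```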

See also: #35 on broker account settings.

I should have mentioned this in my initial reply; I hadn't noticed there were no changes to the account, sorry.

MorleyMinde commented 6 years ago

That totally worked. I should go through the documentation more carefully. Thanks a lot.

Kismuz / btgym

Reward always 0 #36
