[Question] Is having a high reward and low profit a normal case?

AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)

MIT License

2.1k stars 465 forks

[Question] Is having a high reward and low profit a normal case? #8

Closed toksis closed 4 years ago

toksis commented 4 years ago

Hello,

Is the reward calculation OK? I get a high reward, but the profit shows a loss.

I am using Stable Baselines.

I am using these signal features:


import numpy as np

def my_process_data(env):
    # Slice bounds: the frame plus the leading observation window.
    start = env.frame_bound[0] - env.window_size
    end = env.frame_bound[1]
    prices = env.df.loc[:, 'Close'].to_numpy()[start:end]
    # print(env.df)
    indi = Indicators(env.df)
    # Note: this slice starts at start+1, one row later than prices and rsi.
    signal_features = env.df.loc[:, ['Close', 'Open', 'High', 'Low', 'Volume']].to_numpy()[start+1:end]
    #signal_features = env.df.loc[:, ['Close','Volume']].to_numpy()[start+1:end]

    # RSI column; its row count must match signal_features for np.append below.
    rsi = indi.rsi(5, 1)
    rsicolumn = rsi.to_numpy()[start:end].reshape(-1, 1)
    print("rsi shape: ", rsicolumn.shape)
    signal_features = np.append(signal_features, rsicolumn, axis=1)

    # print(signal_features)
    return prices, signal_features

AminHP commented 4 years ago

Did you change the unit_side?

toksis commented 4 years ago

Hello, I did not change unit_side. I only changed the signal features.

AminHP commented 4 years ago

Try changing its value to 'right' and show me the result.

toksis commented 4 years ago

I am running it now. I will update you later.

toksis commented 4 years ago

Training Done
rsi shape:  (6224, 1)
C:\anaconda\envs\StableBaselines\lib\site-packages\gym\logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
info: {'total_reward': 3.8000000000004697, 'total_profit': 0.9996196706347698, 'position': 0}
ploting renderall

env_maker = lambda: MyForexEnv(df=FOREX_EURUSD_1H_ASK,unit_side = 'right', window_size=5, frame_bound=(5, len(FOREX_EURUSD_1H_ASK)))

AminHP commented 4 years ago

One thing to note: maximizing the reward does not necessarily increase the profit, because parameters such as trade_fee are not considered when calculating the reward. It is very difficult to find a proper reward calculation method.

I once tried using profit as the reward, but the results did not improve. Maybe I missed something. Still, you should look for a good reward function; that is an essential part of solving this kind of problem.
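To make the gap between reward and profit concrete, here is a minimal sketch. It is a simplified model and not the library's actual code. The function names, the pip scale of 10,000, the fee of 0.0003, and the prices are all assumptions chosen for illustration. The reward sums raw price differences and ignores fees. The profit compounds the account value and charges a fee on every exit.

```python
# Simplified model of reward vs. profit for long-only round trips.
# All names and constants here are hypothetical, for illustration only.

def total_reward(trades, pip_scale=10_000):
    """Sum of raw price differences scaled to pips; fees are ignored."""
    return sum((exit_price - entry_price) * pip_scale
               for entry_price, exit_price in trades)

def total_profit(trades, trade_fee=0.0003):
    """Compound the account value over each trade, charging a fee on exit."""
    profit = 1.0  # start with one unit of account value
    for entry_price, exit_price in trades:
        quantity = profit / entry_price               # units bought at entry
        profit = quantity * (exit_price - trade_fee)  # value after paying the fee
    return profit

# Three small winning trades (made-up prices): (entry, exit)
trades = [(1.1000, 1.1002), (1.1002, 1.1003), (1.1003, 1.10038)]

reward = total_reward(trades)
profit = total_profit(trades)
print(f"total_reward={reward:.4f}, total_profit={profit:.6f}")
```

Every trade here gains less than the assumed fee, so the reward comes out positive while the profit ends below 1.0, which is a net loss.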

toksis commented 4 years ago

Hello, in your code, how does the machine know whether the one you buy is the one you sell, and when there is a trade? Thank you.

AminHP commented 4 years ago

Hi!

What do you mean by "the one you buy is the one you sell"? Can you explain it a bit more?

Here you can find if there is a trade.

toksis commented 4 years ago

I mean a trade. For example, you buy NZD/USD at 0.65, then sell it at 0.69 to close your position.

AminHP commented 4 years ago

I'm not sure that I understand what you mean correctly. But if you are asking how the machine knows which buy action is a trade, the answer is:

If you check out the source code here, you will see that the actual trade only happens when the position changes. So, having like 100 buy actions in a row doesn't make 100 trades.
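The rule described above can be sketched as follows. This is a rough illustration rather than the environment's source. The enum names, the `count_trades` helper, and the assumption that the agent starts in a short position are all hypothetical.

```python
from enum import Enum

class Action(Enum):
    SELL = 0
    BUY = 1

class Position(Enum):
    SHORT = 0
    LONG = 1

def count_trades(actions, start_position=Position.SHORT):
    """Count trades, where a trade is an action that flips the position."""
    position = start_position
    trades = 0
    for action in actions:
        flips = ((action is Action.BUY and position is Position.SHORT) or
                 (action is Action.SELL and position is Position.LONG))
        if flips:
            trades += 1
            # Flip to the opposite position; repeated actions change nothing.
            position = (Position.LONG if position is Position.SHORT
                        else Position.SHORT)
    return trades

# 100 Buy actions in a row followed by a single Sell
actions = [Action.BUY] * 100 + [Action.SELL]
n_trades = count_trades(actions)
print(n_trades)
```

Under these assumptions only the first Buy and the final Sell change the position. The other 99 Buy actions simply hold.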

toksis commented 4 years ago

Thank you. So repeated Buy actions act as a hold, and when a Sell comes, it closes the trade. So there is only one buy-and-sell trade per session? Is that it? Thanks.

AminHP commented 4 years ago

Yes, it is like you said.

toksis commented 4 years ago

Thank you.