AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)
MIT License

_calculate_reward modify #87

Closed: panjule closed this issue 11 months ago

panjule commented 1 year ago

Hi Amin,

I tried to modify the _calculate_reward function, but ep_rew_mean is always 0.

Even after I replaced it with the original code from gym-anytrading, the result is still 0.

def my_calculate_reward(self, action):
    step_reward = 0  # pip

    trade = False
    if .....(cut it short)
        if self._position == Positions.Short:
            step_reward += -price_diff * 10000
        elif self._position == Positions.Long:
            step_reward += price_diff * 10000

    return step_reward

class MyForexEnv(ForexEnv):
    _calculate_reward = my_calculate_reward


| rollout/        |          |
|    ep_len_mean  | 9.43e+03 |
|    ep_rew_mean  | 0        |

Please advise.
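One way to narrow this down is to test the override pattern in isolation. The sketch below is only an illustration: `Positions`, `BaseEnv`, `MyEnv`, and the `pip_reward` helper are hypothetical stand-ins, not the gym-anytrading classes, and the hook takes a price instead of an action to keep the example small. It assumes 1 pip = 0.0001, matching the `* 10000` factor in the snippet above.

```python
from enum import Enum


class Positions(Enum):
    # Stand-in for the library's Positions enum (hypothetical values).
    Short = 0
    Long = 1


def pip_reward(position, last_trade_price, current_price):
    """Price move since the last trade, in pips (1 pip = 0.0001 assumed)."""
    price_diff = current_price - last_trade_price
    if position == Positions.Short:
        return -price_diff * 10000
    if position == Positions.Long:
        return price_diff * 10000
    return 0.0


class BaseEnv:
    """Hypothetical minimal base class; NOT the real ForexEnv."""

    def __init__(self, position, last_trade_price):
        self._position = position
        self._last_trade_price = last_trade_price

    def _calculate_reward(self, current_price):
        return 0.0  # default hook: no reward

    def step(self, current_price):
        # The base class calls the hook; a subclass may replace it.
        return self._calculate_reward(current_price)


def my_calculate_reward(self, current_price):
    return pip_reward(self._position, self._last_trade_price, current_price)


class MyEnv(BaseEnv):
    # Same pattern as in the post: a module-level function bound as a method.
    _calculate_reward = my_calculate_reward


long_reward = MyEnv(Positions.Long, 1.1000).step(1.1010)
short_reward = MyEnv(Positions.Short, 1.1000).step(1.1010)
base_reward = BaseEnv(Positions.Long, 1.1000).step(1.1010)
```

In this toy setup the subclass returns roughly +10 and -10 pips while the base class returns 0, so the class-attribute assignment itself does replace the method. If the same pattern still yields 0 in a real environment, the cause may lie elsewhere, for example in the trade condition never being true or in how the episode reward is reported to the logger.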

Alex2782 commented 1 year ago

I had the same problem: https://github.com/AminHP/gym-anytrading/pull/86#issuecomment-1483605097

After this change I could train SB3 models again (tested with 'stocks-v0'): https://github.com/AminHP/gym-anytrading/pull/86/commits/7288a1e3f7089b477caf846ddcc80b60c3829b7c#diff-5d3f71bdaa90f138b62b611d8a6a0e90090f893152c2e313975a7bd6c43d8238R38

I found another problem: I could not always see learning progress (tested with interday stock data). The SB3 tips page (https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html) recommends:

"always normalize your observation space when you can, i.e., when you know the boundaries"
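As a rough illustration of that advice, the sketch below linearly maps values with known bounds into [-1, 1]. The helper name `scale_to_unit_range`, the example prices, and the bounds 90 and 110 are all assumptions made for this example; the target range is a design choice, not something prescribed by SB3 or gym-anytrading.

```python
def scale_to_unit_range(values, low, high):
    """Linearly map values from [low, high] to [-1, 1], clipping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = []
    for v in values:
        clipped = min(max(v, low), high)  # keep values inside the known bounds
        scaled.append(2.0 * (clipped - low) / (high - low) - 1.0)
    return scaled


# Hypothetical prices and bounds, chosen only for the example.
prices = [100.0, 105.0, 110.0, 95.0]
scaled_prices = scale_to_unit_range(prices, low=90.0, high=110.0)
```

When the bounds are not known in advance, a running estimate of mean and variance is a common alternative to fixed min-max scaling.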

I have only tried using 'diff' without 'prices':

    signal_features = diff  # np.column_stack((prices, diff))
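To make the difference between the two feature layouts concrete, here is a small pure-Python sketch. The helper `price_diffs` and the example prices are hypothetical, and padding the first difference with 0 is an assumption about how `diff` is built so that it has the same length as `prices`.

```python
def price_diffs(prices):
    """First differences, with a leading 0 so the length matches prices."""
    return [0.0] + [b - a for a, b in zip(prices, prices[1:])]


# Hypothetical closing prices.
prices = [100.0, 101.5, 101.0, 103.0]
diff = price_diffs(prices)

# Diff-only features: one column per time step.
signal_features = [[d] for d in diff]

# Two-column layout (prices and diff), the pure-Python analogue of
# np.column_stack((prices, diff)).
stacked_features = [[p, d] for p, d in zip(prices, diff)]
```

Raw prices can drift to arbitrary levels, while differences tend to stay near zero, which may be one reason the diff-only features train better without extra normalization.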

It works much better; PPO achieves a higher average reward for me this way. (Image: sb3_predict)