Obs and reward explodes when initializing with unreasonable action with stock size over 50

AI4Finance-Foundation / FinRL-Meta

FinRL-Meta: Dynamic datasets and market environments for FinRL.

MIT License

It looks like the StockTradingEnv class in env_stocktrading_China_A_shares.py produces extremely large obs and reward values when extreme actions are used with a stock size over 50. This is not a problem when the stock size is under 15, even if the initial action choice is extreme.

The first picture shows a stock size of 50 and the second a stock size of 15. "r" represents the reward; because the data comes in batches, I report the max, min, std, and mean over each batch.

image (1)

This makes training really hard because the obs and reward values are so large. I tried tuning the hyperparameters, but training still struggles to converge. Does anyone have thoughts on how to handle a bigger stock size?
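For reference, here is a minimal sketch of how the per-batch reward statistics (max, min, std, mean) can be computed. The helper name `summarize_batch` and the sample values are hypothetical and only illustrate the calculation; they are not part of FinRL-Meta, and the population standard deviation is an assumption about which std is being reported.

```python
import statistics


def summarize_batch(values):
    """Return max, min, std, and mean for one batch of scalar values.

    Hypothetical helper for illustration; not a FinRL-Meta function.
    Uses the population standard deviation (an assumption).
    """
    data = [float(v) for v in values]
    if not data:
        raise ValueError("batch must contain at least one value")
    return {
        "max": max(data),
        "min": min(data),
        "std": statistics.pstdev(data),
        "mean": statistics.fmean(data),
    }


# Made-up rewards for one batch, only to show the call.
rewards = [1.0, 2.0, 3.0, 4.0]
stats = summarize_batch(rewards)
print(stats)
```

Logging these four numbers for both obs and reward at each stock size makes it easier to compare the 15-stock and 50-stock runs side by side.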

Obs and reward explodes when initializing with unreasonable action with stock size over 50 #174