Using the Stock_NeurIPS2018_SB3.ipynb notebook with default parameters,
the results seem to be OK,
but after checking the TensorBoard log, I found something confusing.
Why is the entropy loss negative, and why does it keep growing?
Does train/reward indicate that the agent is not learning anything useful?
After checking the agent's actions on the trade dataset, the actions are almost all the same: buy some shares and keep holding them.
It may be caused by the lack of normalization. It also depends on the distribution of the datasets.
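To illustrate what normalization could look like here, below is a minimal sketch of a running mean/std normalizer applied to raw rewards. The class name `RunningNormalizer` and the reward values are made up for illustration; this is not code from the notebook. Stable-Baselines3 also ships a `VecNormalize` wrapper that serves a similar purpose for vectorized environments.

```python
import math


class RunningNormalizer:
    """Hypothetical helper: tracks a running mean/variance (Welford's method)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        # Incorporate one new sample into the running statistics.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        # Population variance; fall back to 1.0 until there are two samples.
        var = self.m2 / self.count if self.count > 1 else 1.0
        return (x - self.mean) / math.sqrt(var + self.eps)


# Made-up raw rewards on a dollar scale, for illustration only.
rewards = [120.0, -80.0, 300.0, -150.0, 40.0]

norm = RunningNormalizer()
for r in rewards:
    norm.update(r)

# After normalization the values are centered near 0 with roughly unit variance.
scaled = [norm.normalize(r) for r in rewards]
```

Rewards on a large or drifting scale can make the value loss and policy updates harder to interpret, so rescaling like this is one thing worth trying before drawing conclusions from the train/reward curve.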
If most stocks are decreasing, the reward may decrease as well.
It depends on the stock trends and on hyperparameter tuning. If you set a different training/trading period, e.g., one in which most stocks are decreasing, the result may be different.
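On the entropy loss question: in Stable-Baselines3's A2C and PPO, the logged entropy loss is the negative of the mean policy entropy, so a negative value just means the entropy itself is positive. A rough sketch of the arithmetic follows, assuming a diagonal Gaussian policy (what SB3 uses for continuous action spaces). The action dimension of 29 and the log-std values are hypothetical, chosen only for illustration.

```python
import math


def diag_gaussian_entropy(log_stds):
    """Entropy of a diagonal Gaussian: sum over dims of 0.5*log(2*pi*e) + log_std."""
    const = 0.5 * math.log(2.0 * math.pi * math.e)
    return sum(const + ls for ls in log_stds)


def entropy_loss(log_stds):
    # Sign convention assumed here: loss = -entropy.
    return -diag_gaussian_entropy(log_stds)


action_dim = 29  # hypothetical: one action per stock

# log_std = 0 means std = 1 in every action dimension.
initial = entropy_loss([0.0] * action_dim)

# A slightly larger std gives more entropy, hence a more negative loss.
wider = entropy_loss([0.1] * action_dim)

# A smaller std gives less entropy, so the loss moves back toward zero.
narrower = entropy_loss([-0.1] * action_dim)

print(initial, wider, narrower)
```

So if the curve is getting more negative over time, the policy's standard deviation is probably growing (actions are becoming more random); if it is rising toward zero, the policy is becoming more deterministic. Which case applies depends on how the plot is read, so it is worth checking the actual values in TensorBoard.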