tkay264 opened this issue 7 months ago
Thanks for your interest in this project. You can list all the states and actions, and then recalculate the reward for each step. If the reward is 0, I would guess the action is hold: no buying or selling.
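That check could be sketched as below. This is a minimal, self-contained illustration, not FinRL's actual environment: `ToyTradingEnv` and its reward rule (position times price change) are assumptions standing in for the real trading environment, but the replay loop is the idea — step through the actions, recompute each reward, and see which zero rewards coincide with hold actions.

```python
# Hedged sketch: replay actions and recompute the reward at each step.
# ToyTradingEnv is a made-up stand-in; its reward is
# shares_held * price_change, so holding with no open position
# yields exactly 0.

class ToyTradingEnv:
    def __init__(self, prices):
        self.prices = prices

    def reset(self):
        self.t = 0
        self.shares = 0
        return self.prices[self.t]

    def step(self, action):
        # action: +n buys n shares, -n sells n shares, 0 holds
        self.shares += action
        self.t += 1
        reward = self.shares * (self.prices[self.t] - self.prices[self.t - 1])
        done = self.t == len(self.prices) - 1
        return self.prices[self.t], reward, done

env = ToyTradingEnv([10.0, 11.0, 10.5, 12.0])
env.reset()
log = []
for action in [0, 1, 0]:          # hold, buy one share, hold
    _, reward, _ = env.step(action)
    log.append((action, reward))
    print(f"action={action:+d}  reward={reward:+.2f}")

# First step: action 0 with no position -> reward 0.0 (a genuine hold).
# Later steps: a held position earns nonzero reward even on a hold.
```

Note that in this toy reward rule a 0 reward only coincides with a hold when no position is open; once shares are held, even a hold action earns the price change.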
@zhumingpassional Are you suggesting that this output:
----------------------------------------
| time/                    |           |
|     fps                  | 328       |
|     iterations           | 27        |
|     time_elapsed         | 336       |
|     total_timesteps      | 110592    |
| train/                   |           |
|     approx_kl            | 0.0       |
|     clip_fraction        | 0         |
|     clip_range           | 0.2       |
|     entropy_loss         | -2.17e-18 |
|     explained_variance   | 0         |
|     learning_rate        | 0.125     |
|     loss                 | 1.31      |
|     n_updates            | 260       |
|     policy_gradient_loss | -3.22e-09 |
|     reward               | 0.0       |
|     value_loss           | 2.84      |
----------------------------------------
the reward value shown is only for that step in time, and is not related to the overall performance of the model?
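One way to see whether a 0 in that row means "no trade at the last logged step" rather than "the model is performing badly" is to record every step's reward yourself instead of relying on the periodic log table. The sketch below is dependency-free and the names are made up for illustration; in Stable Baselines3 the analogous hook would be a `BaseCallback` subclass.

```python
# Hedged sketch of the recorder pattern: keep the full reward history
# so the snapshot printed in the log table can be put in context.

class RewardRecorder:
    def __init__(self):
        self.rewards = []

    def on_step(self, reward):
        # Called once per environment step by the training loop.
        self.rewards.append(reward)

def train(step_rewards, recorder):
    # Stand-in "training loop": just replays a fixed reward stream.
    for r in step_rewards:
        recorder.on_step(r)

rec = RewardRecorder()
train([0.0, 1.5, -0.3, 0.0], rec)

# The most recent step's reward can be 0.0 even though the run as a
# whole earned reward:
print("last step reward:", rec.rewards[-1])
print("total reward:", sum(rec.rewards))
```

The point of the design is separation: the recorder only observes, so it can be attached to any loop without changing the training logic.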
Hello,
Thank you for creating the library, and I appreciate your excellent work.
I've been experimenting with the Stable Baselines3 DDPG and TD3 models. When I run the training script, I'm getting unpredictable rewards: sometimes they are calculated correctly, and other times they stay at 0. If I stop and rerun the script, the rewards may still be 0. Could you clarify whether this is an inherent issue with the models, or should I review my code?
I'm using one week of 1-minute data for a single stock in my training.
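On the rerun-to-rerun differences: one thing worth checking is whether all random seeds are fixed, since exploration noise in DDPG/TD3 is random and a small one-week dataset amplifies run-to-run variance. This stdlib-only sketch shows the seeding pattern; Stable Baselines3 model constructors also accept a `seed` argument that serves the same purpose.

```python
import random

def sample_actions(seed, n=5):
    """Draw n continuous actions in [-1, 1] from a seeded RNG."""
    rng = random.Random(seed)
    return [round(rng.uniform(-1.0, 1.0), 3) for _ in range(n)]

# Same seed -> identical action sequence on every rerun;
# different seed -> a different sequence.
print(sample_actions(42) == sample_actions(42))   # True
print(sample_actions(42) == sample_actions(43))   # False
```

If fixing the seed makes every rerun behave identically (all-zero rewards included), the issue is in the environment or data rather than in the models' randomness.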
Best regards