Cattharine / product_owner_rl

Bound reward system values #30

Closed krutovsky-danya closed 5 months ago

krutovsky-danya commented 7 months ago

It could be useful to bound reward values to the [-1, 1] range. Currently the reward system contains values such as -100 or 50, which were chosen to strongly motivate the agent to take or avoid certain actions. Because these values feed into the MSE loss function, they could cause exploding gradients and break the learning process.
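
A minimal sketch of what the bounding could look like, and why it matters for a squared-error loss. The names `scale_reward`, `max_abs_reward`, and `squared_error_gradient` are hypothetical and not taken from the repository. The bound of 100 is an assumption based on the -100 mentioned above, and the reward is treated directly as the regression target for simplicity.

```python
def scale_reward(raw_reward: float, max_abs_reward: float = 100.0) -> float:
    """Linearly scale a raw reward into [-1, 1].

    max_abs_reward is an assumed upper bound on |raw_reward|;
    anything beyond it is clipped to the edge of the range.
    """
    if max_abs_reward <= 0:
        raise ValueError("max_abs_reward must be positive")
    scaled = raw_reward / max_abs_reward
    return max(-1.0, min(1.0, scaled))


def squared_error_gradient(prediction: float, target: float) -> float:
    """Gradient of (prediction - target) ** 2 with respect to prediction."""
    return 2.0 * (prediction - target)


if __name__ == "__main__":
    prediction = 0.0  # e.g. an untrained network's output
    raw = -100.0      # one of the large rewards mentioned above

    # The gradient magnitude grows linearly with the size of the target.
    print(squared_error_gradient(prediction, raw))
    print(squared_error_gradient(prediction, scale_reward(raw)))
```

Dividing by a fixed bound keeps the relative ordering of rewards (50 still counts for half as much as 100), whereas plain clipping to [-1, 1] would make every large reward look the same to the agent.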