Model learns the opposite direction, worst possible reward

AminHP / gym-anytrading

The most simple, flexible, and comprehensive OpenAI Gym trading environment (Approved by OpenAI Gym)

MIT License

2.1k stars, 465 forks

Model learns the opposite direction, worst possible reward #13

Closed: realiti4 closed this issue 4 years ago

realiti4 commented 4 years ago

Hi, this is not an issue, but after days of trying to figure this out, I wanted to ask in case someone has any advice for me. I first ran into this in my own custom env. I tried DQN, A2C, and PPO, and none of them seems to know which way to go: the reward just fluctuates between the best and worst possible values. The model is clearly learning something, because when the reward is negative it is the worst possible outcome. Then I wanted to try your env, which is very clean and easy to understand, but I am having the exact same issue. Do you have any experience with something like this? I must be doing something wrong, but I couldn't find it. Thanks.

AminHP commented 4 years ago

Hi, can you show me a picture of your results? And I don't exactly understand what you mean by "It learns perfectly but doesn't know which way to go".

realiti4 commented 4 years ago

Hi, thank you for the response. Sorry if I wasn't clear. What I meant was that, usually at the start of training, there is a high chance that the model decides to maximize towards negative rewards. Sometimes this also happens later in training, when the model is doing great towards positive rewards and suddenly flips and tries to maximize the negative. I'm not sure what I am doing wrong. The same models run fine on other problems. I'll rerun and upload my results today.

AminHP commented 4 years ago

I had the same issue once and honestly I don't know exactly how to fix it. Maybe a bad reward function or a lack of useful features leads to this issue. Or maybe your model is much more or much less complicated than what is actually needed. It could even be something inside your neural network (like activation functions or other parameters) that causes this issue.
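
One way to test the "bad reward function" possibility is to compute the best and worst achievable episode reward and see where a trained agent lands between them. The sketch below is only an illustration under stated assumptions: the action encoding (`SELL`, `BUY`), the toy per-step reward (price change when long, negated price change when short), and the helper names `episode_reward`, `mirrored`, and `oracle` are all hypothetical and are not gym-anytrading's actual reward calculation.

```python
import random

SELL, BUY = 0, 1  # hypothetical action encoding used only in this sketch


def episode_reward(prices, policy):
    """Sum a toy per-step reward: +price change when long, -price change when short."""
    total = 0.0
    for t in range(len(prices) - 1):
        action = policy(t, prices)
        change = prices[t + 1] - prices[t]
        total += change if action == BUY else -change
    return total


def mirrored(policy):
    """Wrap a policy so that every Buy becomes a Sell and vice versa."""
    return lambda t, prices: BUY if policy(t, prices) == SELL else SELL


def oracle(t, prices):
    """Best possible policy for this toy reward: it peeks at the next price."""
    return BUY if prices[t + 1] >= prices[t] else SELL


# Synthetic random-walk prices stand in for real market data.
random.seed(0)
prices = [100.0]
for _ in range(200):
    prices.append(prices[-1] + random.uniform(-1.0, 1.0))

best = episode_reward(prices, oracle)             # upper bound on episode reward
worst = episode_reward(prices, mirrored(oracle))  # lower bound: same signal, sign flipped
print(f"best={best:.2f} worst={worst:.2f}")
```

Under this toy reward, a policy with no information should score near zero on average. If an agent's evaluation reward sits close to `worst` instead, it may have picked up the signal but be acting on it with the sign reversed, which would make the reward sign and the action mapping worth checking before the features or the network.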

realiti4 commented 4 years ago

Thanks. I'll continue investigating. If I find something, I'll update in case someone might find it useful.