PacktPublishing / Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt

Chapter 8: run_model #4

Closed izabael closed 5 years ago

izabael commented 5 years ago

I'm confused about this code for two reasons: 1) Is position_steps actually used for anything? It's always None and seems to do nothing. 2) Related to that: when I run the code and add some print statements to see which actions it chooses (0, 1, 2), it seems the agent can buy as many shares as it wants at a time. How can I change the code so that it only buys one share at a time, as in the training code? (A sketch of the kind of print statement I mean is below.)

[I MUST add that I love this book and your code examples. I've spent so much enjoyable time working through the Atari implementations especially. This book is becoming a bible to me.]

Thank you!
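For concreteness, here is a minimal sketch of the kind of instrumentation described above. It is illustrative only: the tensor name and values are made up, not taken from the actual run_model.py, and it simply assumes a greedy choice over Q-values with actions 0 = Skip, 1 = Buy, 2 = Close as in the chapter's environment.

```python
import torch

# Illustrative only: suppose out_v holds the network's Q-values for one observation.
out_v = torch.tensor([[0.05, 0.90, 0.10]])

action_idx = out_v.max(dim=1)[1].item()   # greedy action selection
print("chosen action:", action_idx)       # -> 1 (Buy) for this made-up example
```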

Shmuma commented 5 years ago

Hi!

  1. The variable position_steps is a leftover from an experiment I did while developing the example. In the current version it is meaningless and will be removed.

  2. The agent can issue as many Buy actions as it wants, but the environment ignores such actions if the agent has already entered the market. This check is implemented here: https://github.com/PacktPublishing/Deep-Reinforcement-Learning-Hands-On/blob/master/Chapter08/lib/environ.py#L92

I haven't checked, but common sense suggests the agent should learn this and stop sending the Buy action a second time (the flag indicating an open position is provided in the observation). Early in training, though, the agent may send the action many times, but only the first one is taken into account. Of course, this example is very basic and could be extended with a more sophisticated environment model: stop losses, take profits, margin calls, short orders, etc. A simplified sketch of the check is shown below.
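The sketch that follows is not the code from environ.py, just a stripped-down illustration of the same idea: a Buy action only opens a position when none is open, so repeated Buy actions fall through and change nothing. The class and attribute names here are illustrative.

```python
from enum import Enum

class Actions(Enum):
    Skip = 0
    Buy = 1
    Close = 2

class SimpleState:
    """Minimal sketch: extra Buy actions are ignored while a position is open."""

    def __init__(self):
        self.have_position = False
        self.open_price = 0.0

    def step(self, action: Actions, current_price: float) -> float:
        reward = 0.0
        if action == Actions.Buy and not self.have_position:
            # Only the first Buy opens a position; later Buy actions do nothing.
            self.have_position = True
            self.open_price = current_price
        elif action == Actions.Close and self.have_position:
            # Reward here is simply the percentage profit of the closed position.
            reward = 100.0 * (current_price - self.open_price) / self.open_price
            self.have_position = False
            self.open_price = 0.0
        return reward
```

So to get "only one share at a time" behaviour, nothing extra is needed at run time: the second and later Buy actions are no-ops until the position is closed.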

Thanks for your interest in the book!