alessiodm / drl-zh

Deep Reinforcement Learning: Zero to Hero!
MIT License

03_DQN.ipynb - ease-of-use improvements #4

Closed: fancyfredbot closed this issue 2 months ago

fancyfredbot commented 2 months ago

I wanted to make two more suggestions about the 03_DQN.ipynb notebook:

1) The default replay buffer size uses a lot of memory, which caused me a few out-of-memory problems. I have found that a 10k rather than 100k replay buffer is fine and still converges nicely. It also lets me keep all the states on my 4 GB RTX 3050 laptop GPU, which is nice; the sketch below these suggestions includes the back-of-the-envelope memory math.

2) To get the test for the QNetwork to pass, you have to use no bias in the first Conv2d layer. I had to cheat and look at the solution to figure that out, and I am still not really sure why we skip the bias in that layer!
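
For reference, here is a rough sketch of the kind of ring-buffer replay memory I mean (the names and shapes are illustrative, not the notebook's actual code), along with the memory arithmetic that motivated suggestion 1:

```python
import numpy as np

class ReplayBuffer:
    """Fixed-capacity ring buffer for (s, a, r, s', done) transitions."""

    def __init__(self, capacity: int, state_shape: tuple, state_dtype=np.uint8):
        self.capacity = capacity
        self.states = np.zeros((capacity, *state_shape), dtype=state_dtype)
        self.next_states = np.zeros((capacity, *state_shape), dtype=state_dtype)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.bool_)
        self.pos = 0
        self.full = False

    def add(self, s, a, r, s_next, done):
        # Overwrite the oldest transition once the buffer wraps around.
        i = self.pos
        self.states[i], self.actions[i] = s, a
        self.rewards[i], self.next_states[i], self.dones[i] = r, s_next, done
        self.pos = (self.pos + 1) % self.capacity
        self.full = self.full or self.pos == 0

    def sample(self, batch_size: int):
        # Sample uniformly from the filled portion of the buffer.
        high = self.capacity if self.full else self.pos
        idx = np.random.randint(0, high, size=batch_size)
        return (self.states[idx], self.actions[idx], self.rewards[idx],
                self.next_states[idx], self.dones[idx])

# Back-of-the-envelope memory cost, assuming the usual 4x84x84 uint8
# frame stacks (an assumption about the notebook's preprocessing):
#   100_000 * 4 * 84 * 84 bytes ~= 2.8 GB per state array (x2 with next_states),
#   while 10_000 transitions need only ~0.28 GB each, fitting a 4 GB GPU.
```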

I hope these comments are helpful - I am really happy with these notebooks and found them very useful even without the YouTube videos!

alessiodm commented 2 months ago

Your comments are incredibly useful - thank you very much for all your feedback, both in this issue and in the one about PG (which I am addressing soon)!

I integrated both 1 and 2: I changed the replay buffer size to 10k, and removed the spurious bias=False from the first layer of the convnet.
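
For anyone who hit the same test failure: PyTorch's nn.Conv2d has bias=True by default, so "removing the no-bias" just means not passing bias=False in the first layer. Here is a sketch of a typical Atari-style Q-network for illustration; the layer sizes are the classic Mnih et al. ones, assumed here rather than copied from the notebook:

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Illustrative DQN convnet; architecture is an assumption, not
    necessarily the notebook's exact solution."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            # bias defaults to True; the earlier solution passed bias=False
            # here, which is what the parameter-count test tripped on.
            nn.Conv2d(4, 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),  # 7x7 spatial size from an 84x84 input
            nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):
        # Scale uint8 pixel values into [0, 1] before the conv stack.
        return self.net(x / 255.0)
```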