Hvass-Labs / TensorFlow-Tutorials

TensorFlow Tutorials with YouTube Videos

Reinforcement Learning Experiments #32

Closed · Hvass-Labs closed this issue 6 years ago

Hvass-Labs commented 7 years ago

This thread is for documenting your experiments with Reinforcement Learning in Tutorial 16.

It is also interesting for others to hear if your experiment failed, so they don't have to repeat the same mistakes.

Please write the following for each of your experiments:

rogaha commented 7 years ago

Thanks for the great work on your tutorials and videos! I also had issues with OpenAI's docs & transparency (https://github.com/openai/universe/issues/146).

Zberenyi commented 7 years ago

Hi Magnus, I have a question about your tutorial "TensorFlow Tutorial #02". (Not sure if this is the right place to ask it - pls feel free to redirect me..) After, say, 10,000 training steps I wanted to decrease the learning rate to try finer tuning to get above 99% accuracy. But when I changed the "optimizer = tf.train.." line containing the learning rate, the model gave error messages unless I re-initialized the whole model - thus losing the coefficients optimized so far. Sorry for the dumb question - I am very new to Python. Also: once a model is optimized, how do I export its weights and biases? It would be nice to have the values and incorporate them into a simple "forward model" only - maybe outside Python.. Thx and best / Zoltan.
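A minimal TensorFlow 1.x sketch of one common way to handle both points (a toy linear model, not the tutorial's code): feed the learning rate through a placeholder so it can be lowered mid-training without re-initializing, and read the variables out with session.run() to export them:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 1])
y_true = tf.placeholder(tf.float32, shape=[None, 1])
lr = tf.placeholder(tf.float32, shape=[])  # tunable learning rate

weights = tf.Variable(tf.zeros([1, 1]))
biases = tf.Variable(tf.zeros([1]))
y_pred = tf.matmul(x, weights) + biases
cost = tf.reduce_mean(tf.square(y_pred - y_true))
optimizer = tf.train.GradientDescentOptimizer(lr).minimize(cost)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    data_x = np.array([[0.0], [1.0], [2.0]])
    data_y = 2 * data_x + 1
    # Coarse phase, then finer tuning -- no re-initialization needed,
    # because only the fed placeholder value changes.
    for step in range(10000):
        rate = 0.1 if step < 5000 else 0.001
        session.run(optimizer,
                    feed_dict={x: data_x, y_true: data_y, lr: rate})
    # Export the optimized weights and biases as plain NumPy arrays,
    # e.g. for a hand-rolled "forward model" outside Python.
    w, b = session.run([weights, biases])
    np.savetxt('weights.csv', w)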

Hvass-Labs commented 6 years ago

This thread is only intended for people to write about their experiments with Reinforcement Learning. It's a pity nobody has written anything yet.

famishedrover commented 6 years ago

@Hvass-Labs Let me begin. I started off with Reinforcement Learning a while ago, when I learnt there is something known as OpenAI. I did their first problem, the CartPole balancing game. Seeing my model perform better than me was amazing.

Now I'm trying to use GANs with Reinforcement Learning in Vision problems.

kaszperro commented 6 years ago

Hi Magnus, Thanks for your tutorials. I've just started experimenting with what you have provided. I'll post something about it in a few days or so.
Take care :)

lbrichards commented 6 years ago

Hi Magnus, I have started using your reinforcement learning framework on a simple problem which I devised for my own learning. Unlike the Atari games, it does not take images as input: there are only 3 possible observations and 3 possible actions. I have encountered a problem with reading the log files, as they are empty even after a training session. I always get the error:

ValueError: not enough values to unpack (expected 3, got 0)
Process terminated with an exit code of 1

Now I am looking for the calls which are supposed to be writing data to the log file, but I cannot find them so far. Any guidance will be most appreciated. Best regards, Larry

lbrichards commented 6 years ago

Hello again. It seems my problem was not due to the log files at all. The problem was the shape of my observation, which is [1,1] and did not match what the ReplayMemory was expecting - or, more precisely, the shape expected by NeuralNetwork.optimize(). For this reason, my program was crashing before ever getting the chance to write to a log file. Now all appears to be working normally after hard-coding the following:

state_shape = [1,1]
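For anyone hitting the same thing, a minimal sketch of the idea (variable names are mine, not the tutorial's): the observation must be reshaped to the state shape the network expects before it is stored in the replay memory:

import numpy as np

state_shape = [1, 1]
observation = 2  # one of the 3 possible scalar observations
# Reshape the scalar so it matches what NeuralNetwork.optimize() expects.
state = np.reshape(np.float32(observation), state_shape)
print(state.shape)  # (1, 1)
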
ghost commented 6 years ago

Hi Magnus, Thanks for giving real insight into this hyped domain! G. Hinton has second thoughts about his own invention, and it appears you agree. Meanwhile DeepMind declares new "victories", and self-driving cars based on RL are supposedly coming soon. How does this add up? Thanks Tore

gue22 commented 6 years ago

Hey Magnus, thanks for all your great work! I'm trying to execute the RL notebook and would have loved to use the Breakout-v0 checkpoint. I uncommented the download line and it worked.

Aside: there is a small bug with unzipping the downloaded file: it claims it unzipped, but it doesn't. I didn't want to strain your server with another download for another machine, so I stumbled into that. No big deal.

The real problem is in the next statement. "Failed to restore checkpoint from: checkpoints_tutorial16/Breakout-v0"

I can only suspect this has to do with gym. They say somewhere that the v0 versions are no longer compatible with the latest gym. (gym 0.9.4; TF 1.4.0; Python 3.6.3)

Any idea why the restore could fail? Thanks a ton G.

Hvass-Labs commented 6 years ago

@gue22 Thanks for the nice words, I'm glad you found it useful!

I looked into this issue today. There appears to have been a change in OpenAI gym where they removed the unused / redundant actions. This means the Neural Network outputs the 'wrong' number of Q-values (6 instead of 4), so it is incompatible with new versions of gym. I added a note to the Python Notebook explaining this. You will either have to install the old versions of gym and atari-py, or you can just train a new model if you have a fast enough GPU. Unfortunately I cannot put new checkpoints online at the moment, and I'm also not sure how many people actually use them and whether it would be worth the effort.
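A quick way to check which behaviour an installed gym has (standard gym API; the reported count depends on the gym / atari-py versions):

import gym

env = gym.make('Breakout-v0')
# Old gym/atari-py report 6 actions here, newer versions only 4, so a
# checkpoint trained with 6 Q-value outputs cannot be restored as-is.
print(env.action_space.n)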

ai-bits commented 6 years ago

Hey @Hvass-Labs, thanks for getting back to me. I appreciate it a lot! Major edit of the text! I was in too much of a hurry to start training before lunch, so I mixed up the cmds. I started

python reinforcement_learning.py --env 'MsPacman-v4' --render --episodes 2000
python reinforcement_learning.py --env 'MsPacman-v4' --training

on a Skylake NUC an hour ago and it was at episode 150 after an hour. The NUC doesn't seem to be strained extraordinarily. What worries me is that I saw something like "Failed to restore checkpoint..." (now off the buffer) when I started the 2000-episode training after the initial test with 2. The Breakout restore failure with your training data is not that much of a spoiler if I have a valid checkpoint for MsPacman-v4 in the end, because that is more similar to the Eric and the Floaters game I want to tackle. Edit: The checkpoints are there now. Edit 2: Checkpoints worked for continuing training and for play!

The guys at the university want it solved with traditional search algos, and my first thought was to do it with RL. We've got a big beer-bet running and I promise I'll have some reward for you if I get something substantial out of this. ;-) (No reply at all on the gym forum in over a month.)

Any thoughts on how to apply the gym approach to the Java game server (I have the code)? Or Universe (ignoring the code)?
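Roughly what I have in mind (a sketch only - JavaGameClient and its methods are hypothetical placeholders for talking to the game server):

import gym

class FloatersEnv(gym.Env):
    # A custom environment only has to implement the gym.Env interface;
    # the agent code then works unchanged.
    def __init__(self):
        self.action_space = gym.spaces.Discrete(5)  # e.g. 4 moves + bomb
        self.observation_space = gym.spaces.Box(low=0, high=255,
                                                shape=(84, 84, 3))
        self.client = JavaGameClient()  # hypothetical bridge to the Java server

    def reset(self):
        # Start a new game and return the first observation (screen image).
        return self.client.new_game()

    def step(self, action):
        # Forward the action; (image, reward, done) is a hypothetical protocol.
        obs, reward, done = self.client.act(action)
        return obs, reward, done, {}
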

Thanks again! G.

Hvass-Labs commented 6 years ago

@ai-bits Yes, your text was a bit confusing :-) I'm glad it's working now and I'll be curious to hear how it went! You can write about your experience here and also put a YouTube video up with some gameplay footage. I use vokoscreen to record the screen on Linux.

ai-bits commented 6 years ago

Hey @Hvass-Labs, results look good, so I'll have a MsPacman-v4 24-hour-training checkpoint from a Skylake NUC, using the latest gym as of early 2018 (0.9.4), AVX2-optimized TensorFlow 1.4.1 and Python 3.6.3.

The real question is: should I offer a MsPacman-v4 checkpoint done with gym 0.9.4 and some days' worth of training, given that "To keep using the old [< v4] environments, keep gym <= 0.8.2 and atari-py <= 0.0.21." Quote from https://github.com/openai/gym#what-s-new
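For reference, pinning the old versions per that quote would presumably look like this (untested):

pip install "gym<=0.8.2" "atari-py<=0.0.21"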

Wadayathink? Cheers G.

Hvass-Labs commented 6 years ago

@ai-bits You can make your own little github repo with your experiments. You just copy and modify the code you need from my tutorials and provide a link back to my original code. Then you can do as you please with the code :-)

ai-bits commented 6 years ago

@Hvass-Labs OK, we're getting in the flow. ;-) ToDo:

Minor gotcha: reinforcement_learning.py is misspelled with a hyphen instead of an underscore in the comments / examples inside the file.

I'll put something in ai-bits / Google Drive (a description / the latest MsPacman-v4 checkpoint) and a backlink to your GitHub repo once I have solved - by the end of January - the Eric and the Floaters- (Bomberman-) like beer-ware game. As described in more detail above: I want to apply RL to it (instead of hand-coded algos), and getting gym Pacman to work was the prerequisite.

Thanks in advance for any hints! G.

koushikam commented 6 years ago

Hi !

I am new to Deep Q-learning. I have downloaded the code and I am using it for one of my research projects. As shown in the picture, I want to move the white square to the position that maximizes the reward in terms of signal-to-noise ratio. The white line indicates that those square nodes form a pair, and the white node should maintain the closest distance to that node rather than to other neighboring nodes. I generated the motion-traced image as well as a greyscale image for the input. The actions are: {east, west, north, south, stay}

[greyscale input image attachment]

My input image size is 200x200.

The problem is that I am getting 100% error in training, and the performance is not improving over several training stages even with a modified learning rate. The training output looks like this:

Replay-memory statistics:
    Q-values Before, Min: -0.01, Mean: 0.00, Max: 0.02
    Q-values After, Min: -44.10, Mean: -0.60, Max: 0.02
    Q-values Diff., Min: -44.11, Mean: -0.60, Max: 0.00
    Number of large errors > 0.1: 20746 / 20747 (100.0%)
    end_life: 0.0%, end_episode: 0.0%, reward non-zero: 100.0%
Optimizing Neural Network to better estimate Q-values ...
    Learning-rate: 1.0e-03
    Loss-limit: 0.100
    Max epochs: 5.0
Iteration: 79 (0.49 epoch), Batch loss: 735.4393, Mean loss: 731.3398

Hvass-Labs commented 6 years ago

@koushikam I haven't seen your post before now, but I don't know how to solve it anyway :-)

Hvass-Labs commented 6 years ago

I've decided to close this thread. There's no need to keep it open as it's not very active.

If people in the future have a general question about Reinforcement Learning then it's best to ask on StackOverflow.

Shivam2230 commented 5 years ago

No link found for the checkpoint download of Breakout... Please suggest a link, as described in your video.

mellertson commented 5 years ago

I've very much enjoyed reading your tutorial and watching your video on this topic. Thanks very much for sharing this valuable information.

I can't find where to get the module reinforcement_learning from. I've looked on PyPI, in the Jupyter notebook you published, and Googled it, but I'm not finding where to get the reinforcement_learning Python package. Sorry to ask such a basic question, but can you tell me where I can get that package?

Hvass-Labs commented 5 years ago

@mellertson I'm glad you found it useful. You need to download the entire GitHub repository. See the installation instructions. The file you are looking for is located here, but just download the whole repo and you'll get it:

https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/reinforcement_learning.py

mellertson commented 5 years ago

Great, I will try that. Thanks for the reply.

Best regards,

Mike
