carpedm20 / deep-rl-tensorflow

TensorFlow implementation of Deep Reinforcement Learning papers
MIT License

Question about reproducing the result #26

Open chihchiehchen opened 7 years ago

chihchiehchen commented 7 years ago

Hello,

I tried to reproduce the result (with n_action_repeat=1) on a machine with a GTX 1080, but the performance is not as good as shown in the figure. After 2.88M steps the average reward is 0.0174, the average ep_reward is 3.1071, and the max ep_reward is 7.
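For reference, I launched the run roughly as in the README, with the action repeat overridden; the exact flags below are from memory, so treat them as an assumption rather than the verbatim command:

```
python main.py --network_header_type=nips --env_name=Breakout-v0 --n_action_repeat=1 --use_gpu=True
```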

Maybe I did something wrong in the settings or misread some information. Could you give me some suggestions? Thanks a lot!

Chih-Chieh

hiwonjoon commented 7 years ago

Not answering your question, but what kind of environment are you testing on? Breakout-v0? And how long do about 3M steps take in your setup?

chihchiehchen commented 7 years ago

Hello,

I believe there's something wrong (hardware or software). When I run a similar program (written by the assistant professor of my deep learning class at NCTU), I can only get about 10 points on average and 20 at best, while a friend told me he can easily get 30 or 40 on average and 238 at best.

For hardware, I use a GTX 1080 (8 GB), an Intel Core i7-7700 3.6 GHz (4 cores), an ASUS PRIME Z270-K (ATX/DDR4*4/1A1D1H/U3.1 A+C/M.2/COM) motherboard, Kingston 8 GB DDR4-2400 RAM (KVR24N17S8/8, 288-pin), and a Seagate 1 TB BarraCuda (ST1000DM010) hard disk. For software, I installed OpenAI Gym 0.7.3 and TensorFlow 0.12.0 (from the binary).

Two or three months ago I tried to run the program; 3M steps took around 5 hours (1 minute 07 seconds per 10 thousand frames). To speed things up I used a trick: if at the n-th step we get 1 point, we set the reward at step n-1 to 0.85, at step n-2 to 0.85^2, and so on. I didn't compare this against n-step DQN or https://arxiv.org/pdf/1611.01606.pdf, but as far as I remember it could reach an average episode reward of 1 in one hour and 3 in two hours.

The issue is that the dramatic growth of the q-value (to around 9) makes the learning process very insensitive, so we need a mechanism to slow down the growth of the q-value (at the time I dynamically adjusted the discount factor; let me not go into the details since I think it is very immature). This method also does not work well with A3C-like algorithms: too many consecutive frames accelerate the growth of the q-value, and in that case n-step A3C works much better.
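In code, the trick looks roughly like this. It is a minimal sketch of my own hack, not part of this repo, and smear_reward, DECAY, and MIN_CREDIT are names made up for illustration:

```python
DECAY = 0.85        # per-step decay of the smeared credit
MIN_CREDIT = 1e-3   # stop once the credit becomes negligible

def smear_reward(rewards, decay=DECAY, eps=MIN_CREDIT):
    """Backward-propagate each reward onto the preceding steps of one episode.

    If step n earns 1 point, step n-1 gets decay, step n-2 gets decay**2, etc.
    Overlapping credits from multiple rewards are summed here, which is one
    possible reading of the trick described above.
    """
    shaped = list(rewards)
    for n, r in enumerate(rewards):
        if r == 0:
            continue
        credit = r * decay
        k = n - 1
        while k >= 0 and abs(credit) > eps:
            shaped[k] += credit
            credit *= decay
            k -= 1
    return shaped

# A point at the last step of a 5-step episode:
print(smear_reward([0, 0, 0, 0, 1]))
# -> [0.52200625, 0.614125, 0.7225, 0.85, 1]
```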

Even now I haven't figured out what is wrong with the setup. For now I am working on A3C-like algorithms, since they do not depend on the hardware as much. In any case, thanks for your patience and the kind reply.

Best, Chih-Chieh


FushanLi commented 6 years ago

I am running the model and getting performance very similar to yours. My guess is that the results shown in the figure were produced by the DQN published in Nature, while I am using the model published at NIPS.
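For anyone checking the difference: the NIPS 2013 model bootstraps its TD target from the online network itself, while the Nature 2015 model adds a separate target network that is only synced periodically (it also uses a larger conv net). A minimal numpy sketch of that difference; the toy linear Q-functions and all names here are illustrative, not this repo's code:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99
target_update_freq = 1000  # illustrative sync period, not the papers' value

# Toy linear Q-functions over 4 features and 3 actions, standing in for the conv nets.
online_w = rng.normal(size=(4, 3))
target_w = online_w.copy()

def td_target(w, r, s_next, done):
    # One-step TD target: r + gamma * max_a Q(s', a), cut off at episode end.
    return r + (1.0 - done) * gamma * (s_next @ w).max(axis=1)

s_next = rng.normal(size=(32, 4))
r = rng.normal(size=32)
done = rng.integers(0, 2, size=32).astype(float)

# NIPS-2013-style DQN bootstraps from the online network itself ...
y_nips = td_target(online_w, r, s_next, done)

# ... while Nature-2015-style DQN bootstraps from the frozen target network,
# copying online_w into target_w only once every target_update_freq steps.
y_nature = td_target(target_w, r, s_next, done)
```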