devendrachaplot / DeepRL-Grounding

Train an RL agent to execute natural language instructions in a 3D Environment (PyTorch)
https://sites.google.com/view/gated-attention/home
MIT License

Issue in reproducing results #7

Closed soumikdasgupta closed 6 years ago

soumikdasgupta commented 6 years ago

Hi! Thank you for providing the code. I am having trouble reproducing the results from the paper. When I start training from the provided pre-trained model, the average accuracy stays low. Here is the log:

DeepRL-Grounding-master$ python a3c_main.py --num-processes 16 --evaluate 0 --load saved/pretrained_model --difficulty easy
2 Loading model ... saved/pretrained_model
9 Loading model ... saved/pretrained_model
Loading model ... saved/pretrained_model
14 Loading model ... saved/pretrained_model
5 Loading model ... saved/pretrained_model
12 Loading model ... saved/pretrained_model
11 Loading model ... saved/pretrained_model
6 Loading model ... saved/pretrained_model
8 Loading model ... saved/pretrained_model
0 Loading model ... saved/pretrained_model
10 Loading model ... saved/pretrained_model
15 Loading model ... saved/pretrained_model
1 Loading model ... saved/pretrained_model
13 Loading model ... saved/pretrained_model
4 Loading model ... saved/pretrained_model
3 Loading model ... saved/pretrained_model
7 Loading model ... saved/pretrained_model
Time 00h 39m 33s, Avg Reward 0.392, Avg Accuracy 0.42, Avg Ep length 19.9, Best Reward 0.0
Time 01h 21m 19s, Avg Reward 0.26, Avg Accuracy 0.3, Avg Ep length 21.66, Best Reward 0.392
Time 02h 04m 29s, Avg Reward 0.324, Avg Accuracy 0.34, Avg Ep length 21.9, Best Reward 0.392
Time 02h 48m 33s, Avg Reward 0.308, Avg Accuracy 0.32, Avg Ep length 22.92, Best Reward 0.392
Time 03h 29m 33s, Avg Reward 0.376, Avg Accuracy 0.38, Avg Ep length 21.22, Best Reward 0.392
Time 04h 14m 27s, Avg Reward 0.288, Avg Accuracy 0.3, Avg Ep length 23.64, Best Reward 0.392
Time 04h 58m 49s, Avg Reward 0.228, Avg Accuracy 0.24, Avg Ep length 23.74, Best Reward 0.392
Time 05h 41m 42s, Avg Reward 0.276, Avg Accuracy 0.28, Avg Ep length 23.64, Best Reward 0.392
Time 06h 25m 22s, Avg Reward 0.336, Avg Accuracy 0.34, Avg Ep length 23.26, Best Reward 0.392
Time 07h 06m 41s, Avg Reward 0.376, Avg Accuracy 0.38, Avg Ep length 21.96, Best Reward 0.392
Time 07h 47m 06s, Avg Reward 0.416, Avg Accuracy 0.42, Avg Ep length 21.44, Best Reward 0.392
Time 08h 27m 26s, Avg Reward 0.38, Avg Accuracy 0.38, Avg Ep length 22.06, Best Reward 0.416


When training the model from scratch in easy mode with -0.005 living cost and 0 living cost, the log was:

python a3c_main.py --num-processes 16 --evaluate 0 --difficulty easy
Time 00h 11m 08s, Avg Reward 0.004, Avg Accuracy 0.16, Avg Ep length 7.98, Best Reward 0.0
Time 00h 19m 36s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.56, Best Reward 0.004
Time 00h 27m 11s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.68, Best Reward 0.004
Time 00h 33m 53s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.004
Time 00h 40m 24s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.04
Time 00h 47m 14s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.04
Time 00h 53m 58s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.04
Time 01h 00m 38s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 4.0, Best Reward 0.04
Time 01h 07m 26s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.04
Time 01h 14m 05s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.04
Time 01h 20m 46s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.0, Best Reward 0.04
Time 01h 28m 01s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.5, Best Reward 0.04
Time 01h 35m 43s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.94, Best Reward 0.04
Time 01h 43m 23s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.112
Time 01h 51m 28s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.0, Best Reward 0.112
Time 01h 58m 52s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.74, Best Reward 0.112
Time 02h 06m 37s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.98, Best Reward 0.112
Time 02h 14m 08s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.7, Best Reward 0.112
Time 02h 20m 59s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.04, Best Reward 0.112
Time 02h 28m 27s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.6, Best Reward 0.112
Time 02h 35m 23s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.26, Best Reward 0.112
Time 02h 42m 11s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.02, Best Reward 0.112
Time 02h 49m 04s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.16, Best Reward 0.112
Time 02h 57m 03s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.88, Best Reward 0.112
Time 03h 04m 38s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.7, Best Reward 0.112
Time 03h 11m 32s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.18, Best Reward 0.112
Time 03h 18m 16s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.0, Best Reward 0.112
Time 03h 25m 05s, Avg Reward 0.16, Avg Accuracy 0.3, Avg Ep length 4.0, Best Reward 0.112
Time 03h 31m 42s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.16
Time 03h 38m 36s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.16
Time 03h 45m 30s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.0, Best Reward 0.16
Time 03h 52m 15s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.12, Best Reward 0.16
Time 03h 59m 01s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.16
Time 04h 05m 44s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.24, Best Reward 0.16
Time 04h 13m 43s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.0, Best Reward 0.16
Time 04h 21m 32s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.16
Time 04h 29m 13s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.16
Time 04h 37m 07s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 5.0, Best Reward 0.16
Time 04h 44m 51s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.86, Best Reward 0.16
Time 04h 52m 06s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.52, Best Reward 0.16
Time 04h 58m 46s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.06, Best Reward 0.16
Time 05h 05m 55s, Avg Reward -0.104, Avg Accuracy 0.08, Avg Ep length 4.22, Best Reward 0.16
Time 05h 16m 13s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.16
Time 05h 27m 28s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.16
Time 05h 38m 21s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.16
Time 05h 49m 23s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.0, Best Reward 0.16
Time 06h 00m 09s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.0, Best Reward 0.16
Time 06h 10m 35s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.16
Time 06h 21m 45s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.12, Best Reward 0.16
Time 06h 36m 23s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 5.88, Best Reward 0.16
Time 06h 51m 05s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 6.0, Best Reward 0.16
Time 07h 04m 16s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.12, Best Reward 0.16
Time 07h 18m 24s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 6.78, Best Reward 0.16
Time 07h 28m 26s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 6.74, Best Reward 0.16
Time 07h 36m 14s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.84, Best Reward 0.16
Time 07h 43m 10s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.34, Best Reward 0.16
Time 07h 51m 03s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.82, Best Reward 0.16
Time 07h 58m 41s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.64, Best Reward 0.16
Time 08h 05m 15s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.16
Time 08h 12m 02s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 4.08, Best Reward 0.16
Time 08h 20m 20s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.36, Best Reward 0.184
Time 08h 28m 37s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.42, Best Reward 0.184
Time 08h 36m 42s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.26, Best Reward 0.184
Time 08h 45m 00s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.32, Best Reward 0.184
Time 08h 54m 55s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 6.58, Best Reward 0.184
Time 09h 05m 13s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 7.02, Best Reward 0.184
Time 09h 12m 35s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.66, Best Reward 0.184
Time 09h 20m 36s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 5.16, Best Reward 0.184
Time 09h 28m 52s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 5.42, Best Reward 0.184
Time 09h 37m 38s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.86, Best Reward 0.184
Time 09h 45m 54s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.42, Best Reward 0.184
Time 09h 52m 41s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.14, Best Reward 0.184
Time 09h 59m 17s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 4.0, Best Reward 0.184
Time 10h 05m 48s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.04, Best Reward 0.184
Time 10h 12m 19s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.184
Time 10h 18m 39s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.0, Best Reward 0.184
Time 10h 25m 00s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.0, Best Reward 0.184
Time 10h 31m 55s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 4.06, Best Reward 0.184
Time 10h 39m 39s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184
Time 10h 47m 22s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.9, Best Reward 0.184
Time 10h 55m 23s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184
Time 11h 03m 23s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.2, Best Reward 0.184
Time 11h 10m 32s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.4, Best Reward 0.184
Time 11h 17m 43s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.24, Best Reward 0.184
Time 11h 25m 51s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.26, Best Reward 0.184
Time 11h 33m 49s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.22, Best Reward 0.184
Time 11h 41m 35s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184
Time 11h 49m 08s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.184
Time 11h 56m 36s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.78, Best Reward 0.184
Time 12h 03m 25s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.04, Best Reward 0.184
Time 12h 13m 12s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.184
Time 12h 23m 39s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 4.0, Best Reward 0.184
Time 12h 30m 18s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.0, Best Reward 0.184
Time 12h 37m 06s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.0, Best Reward 0.184
Time 12h 43m 50s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.184
Time 12h 50m 31s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.0, Best Reward 0.184
Time 12h 57m 05s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 4.0, Best Reward 0.184
Time 13h 06m 19s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 4.12, Best Reward 0.184
Time 13h 18m 38s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 6.12, Best Reward 0.184
Time 13h 31m 20s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 6.2, Best Reward 0.184
Time 13h 45m 22s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.84, Best Reward 0.184
Time 13h 56m 15s, Avg Reward -0.104, Avg Accuracy 0.08, Avg Ep length 4.36, Best Reward 0.184
Time 14h 07m 04s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 4.22, Best Reward 0.184
Time 14h 18m 43s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.82, Best Reward 0.184
Time 14h 30m 49s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 4.94, Best Reward 0.184
Time 14h 43m 07s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 4.86, Best Reward 0.184
Time 14h 55m 36s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.06, Best Reward 0.184
Training thread: 15 Num iters: 1K Avg policy loss: 0.117592454028 Avg value loss: 0.832445306242
Time 15h 07m 37s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.0, Best Reward 0.184
Training thread: 3 Num iters: 1K Avg policy loss: -0.130185781124 Avg value loss: 0.736817106047
Time 15h 20m 15s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.0, Best Reward 0.184
Training thread: 8 Num iters: 1K Avg policy loss: -0.115444350065 Avg value loss: 0.736987084803
Training thread: 12 Num iters: 1K Avg policy loss: -0.139137712868 Avg value loss: 0.745042469732
Training thread: 13 Num iters: 1K Avg policy loss: -0.087330975621 Avg value loss: 0.74735086819
Time 15h 32m 27s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 5.0, Best Reward 0.184
Training thread: 5 Num iters: 1K Avg policy loss: -0.109482607283 Avg value loss: 0.762932332613
Training thread: 7 Num iters: 1K Avg policy loss: -0.0333308469482 Avg value loss: 0.775354296297
Training thread: 11 Num iters: 1K Avg policy loss: -0.185568212742 Avg value loss: 0.713746232403
Training thread: 10 Num iters: 1K Avg policy loss: -0.0643703758532 Avg value loss: 0.743953702528
Training thread: 6 Num iters: 1K Avg policy loss: -0.260978381567 Avg value loss: 0.682364837718
Time 15h 45m 03s, Avg Reward 0.088, Avg Accuracy 0.24, Avg Ep length 5.0, Best Reward 0.184
Training thread: 0 Num iters: 1K Avg policy loss: 0.00585684951555 Avg value loss: 0.793881461014
Training thread: 14 Num iters: 1K Avg policy loss: -0.193044243555 Avg value loss: 0.703806976835
Training thread: 1 Num iters: 1K Avg policy loss: -0.184739318009 Avg value loss: 0.707147910109
Time 15h 56m 53s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.84, Best Reward 0.184
Training thread: 9 Num iters: 1K Avg policy loss: -0.0644458006085 Avg value loss: 0.785048694798
Training thread: 4 Num iters: 1K Avg policy loss: 0.00307522571788 Avg value loss: 0.817228694934
Training thread: 2 Num iters: 1K Avg policy loss: -0.0580463716025 Avg value loss: 0.795366896593
Time 16h 09m 23s, Avg Reward -0.08, Avg Accuracy 0.1, Avg Ep length 4.94, Best Reward 0.184
Time 16h 22m 41s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.64, Best Reward 0.184
Time 16h 38m 12s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 6.64, Best Reward 0.184
Time 16h 54m 38s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 7.22, Best Reward 0.184
Time 17h 10m 50s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 7.0, Best Reward 0.184
Time 17h 24m 54s, Avg Reward 0.136, Avg Accuracy 0.28, Avg Ep length 5.96, Best Reward 0.184
Time 17h 38m 40s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.6, Best Reward 0.184
Time 17h 50m 38s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184
Time 18h 02m 37s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.0, Best Reward 0.184
Time 18h 13m 26s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.4, Best Reward 0.184
Time 18h 23m 55s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 4.08, Best Reward 0.184
Time 18h 34m 18s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.0, Best Reward 0.184
Time 18h 45m 35s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.58, Best Reward 0.184
Time 18h 57m 34s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 4.86, Best Reward 0.184
Time 19h 09m 44s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184
Time 19h 22m 10s, Avg Reward 0.112, Avg Accuracy 0.26, Avg Ep length 5.0, Best Reward 0.184
Time 19h 34m 37s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.184
Time 19h 47m 00s, Avg Reward -0.056, Avg Accuracy 0.12, Avg Ep length 5.0, Best Reward 0.184
Time 19h 59m 30s, Avg Reward 0.04, Avg Accuracy 0.2, Avg Ep length 5.0, Best Reward 0.184
Time 20h 11m 21s, Avg Reward 0.16, Avg Accuracy 0.3, Avg Ep length 5.0, Best Reward 0.184
Time 20h 23m 52s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184
Time 20h 36m 31s, Avg Reward 0.064, Avg Accuracy 0.22, Avg Ep length 5.0, Best Reward 0.184
Time 20h 48m 41s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184
Time 21h 01m 03s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184
Time 21h 13m 27s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184
Time 21h 25m 45s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184
Time 21h 37m 54s, Avg Reward -0.008, Avg Accuracy 0.16, Avg Ep length 5.0, Best Reward 0.184
Time 21h 50m 15s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.04, Best Reward 0.184
Time 22h 02m 34s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184
Time 22h 14m 52s, Avg Reward -0.032, Avg Accuracy 0.14, Avg Ep length 5.0, Best Reward 0.184
Time 22h 27m 06s, Avg Reward 0.184, Avg Accuracy 0.32, Avg Ep length 5.0, Best Reward 0.184
Time 22h 39m 22s, Avg Reward 0.016, Avg Accuracy 0.18, Avg Ep length 5.0, Best Reward 0.184
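
Logs like the ones above are easier to compare once they are parsed. The following is a minimal sketch, not part of the repository, that extracts the evaluation entries with a regular expression and averages the accuracy. The names `LOG_ENTRY`, `parse_log`, and `mean_accuracy` are hypothetical, and the pattern is inferred only from the log lines shown in this thread.

```python
import re

# Pattern inferred from the "Time ..., Avg Reward ..." lines pasted above.
LOG_ENTRY = re.compile(
    r"Time (\d+)h (\d+)m (\d+)s, "
    r"Avg Reward (-?[\d.]+), "
    r"Avg Accuracy (-?[\d.]+), "
    r"Avg Ep length (-?[\d.]+), "
    r"Best Reward (-?[\d.]+)"
)


def parse_log(text):
    """Return one dict per evaluation entry found in `text`."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(text.split())
    entries = []
    for h, m, s, reward, acc, ep_len, best in LOG_ENTRY.findall(flat):
        entries.append({
            "hours": int(h) + int(m) / 60 + int(s) / 3600,
            "avg_reward": float(reward),
            "avg_accuracy": float(acc),
            "avg_ep_length": float(ep_len),
            "best_reward": float(best),
        })
    return entries


def mean_accuracy(entries):
    """Average of the 'Avg Accuracy' field over all parsed entries."""
    return sum(e["avg_accuracy"] for e in entries) / len(entries)


# Two entries copied from the pre-trained-model log; the second is
# deliberately wrapped mid-entry, as happens in the pasted output.
sample = (
    "Time 00h 39m 33s, Avg Reward 0.392, Avg Accuracy 0.42, "
    "Avg Ep length 19.9, Best Reward 0.0 "
    "Time 01h 21m 19s, Avg Reward 0.26, Avg Accuracy 0.3, "
    "Avg Ep length 21.66, Best\nReward 0.392"
)
entries = parse_log(sample)
print(len(entries), round(mean_accuracy(entries), 2))
```

Interleaved "Training thread: ..." lines are simply skipped, since they do not match the pattern.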


System config:

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
CPU family: 6
Model: 62
Model name: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz

It would be really helpful if you could point out what might be going wrong in my training procedure. Regards

msdejong commented 6 years ago

As someone else attempting to reproduce the model: I don't know what the issue is with the pretrained model, but as for the second issue, your training is far too slow. With your CPU it should achieve 3k+ iterations per thread per hour, and their performance graph only seems to start increasing after about 10 hours at that speed (so about 30k iterations per thread, instead of 1k).
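
To make the gap concrete, here is a rough back-of-the-envelope calculation. It assumes that the first "Num iters: 1K" lines in the log (printed at around the 15-hour mark) mean 1,000 iterations per thread, and it takes the 3k iterations per thread per hour figure above as the expected rate. Both numbers are approximations read off this thread, not measurements.

```python
# Observed: threads report "Num iters: 1K" at roughly the 15-hour mark.
observed_iters = 1000
observed_hours = 15.0  # approximate, read off the log timestamps
observed_rate = observed_iters / observed_hours  # iters per thread per hour

# Expected: the "3k+" figure quoted above, taken as a lower bound.
expected_rate = 3000.0
hours_until_improvement = 10.0  # from the comment about the performance graph
iters_needed = expected_rate * hours_until_improvement

slowdown = expected_rate / observed_rate
hours_at_observed_rate = iters_needed / observed_rate

print(f"observed rate: {observed_rate:.0f} iters/thread/hour")
print(f"slowdown vs expected: {slowdown:.0f}x")
print(f"hours to reach {iters_needed:.0f} iters at observed rate: "
      f"{hours_at_observed_rate:.0f}")
```

Under these assumptions the run is about 45 times slower than expected, so the 22 hours of log above cover only a small fraction of the iterations needed before accuracy starts to rise.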

I would check your CPU utilization. If it's at 100%, there may be some kind of multithreading issue going on. We had the same experience, and solved it by moving the os.environ assignment that sets the number of threads (OMP_NUM_THREADS) to just after the os import, before the other imports, including the model.
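
One way to see why full CPU utilization can coincide with slow training is to count threads. The sketch below assumes, and this is not confirmed anywhere in the thread, that each worker process starts an OpenMP pool sized to the number of logical CPUs when OMP_NUM_THREADS is not picked up. The values 16 and 40 come from the command line and the lscpu output above; `total_threads` is a hypothetical helper.

```python
def total_threads(num_processes, omp_threads_per_process):
    """Total compute threads if every process runs its own OpenMP pool."""
    return num_processes * omp_threads_per_process


logical_cpus = 40   # "CPU(s): 40" from lscpu
num_processes = 16  # --num-processes 16

# Assumption: without OMP_NUM_THREADS, each process uses one thread per CPU.
unpinned = total_threads(num_processes, logical_cpus)
# With OMP_NUM_THREADS=1, each process uses a single thread.
pinned = total_threads(num_processes, 1)

print("unpinned:", unpinned, "threads,", unpinned / logical_cpus, "per CPU")
print("pinned:  ", pinned, "threads,", pinned / logical_cpus, "per CPU")
```

If that assumption holds, the unpinned case asks 40 CPUs to run 640 threads, so the machine looks busy while most of the time goes to contention rather than training.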

soumikdasgupta commented 6 years ago

Hi Michiel29,

Thanks for replying. Yes, the CPU utilisation is showing 100% for all the threads.

I'm not very clear on the change you suggested. It would be great if you could share a code snippet.

Kind Regards.

msdejong commented 6 years ago

In a3c_main.py, try replacing

import os
import numpy as np
import torch
import torch.multiprocessing as mp

import env as grounding_env
from models import A3C_LSTM_GA
from a3c_train import train
from a3c_test import test

import logging

os.environ["OMP_NUM_THREADS"] = "1"

with

import os
os.environ["OMP_NUM_THREADS"] = "1"
import numpy as np
import torch
import torch.multiprocessing as mp

import env as grounding_env
from models import A3C_LSTM_GA
from a3c_train import train
from a3c_test import test

import logging
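
As a small safeguard around the same idea, the sketch below sets the variable and reports whether numpy or torch had already been imported at that point, in which case the setting may have come too late to take effect. `pin_openmp_threads` is a hypothetical helper written for illustration; it is not part of the repository.

```python
import os
import sys


def pin_openmp_threads(num_threads=1, heavy_modules=("numpy", "torch")):
    """Set OMP_NUM_THREADS and list heavy modules that were imported first.

    A non-empty return value suggests the variable was set after those
    libraries were loaded, so they may not honour it.
    """
    already_imported = [name for name in heavy_modules if name in sys.modules]
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    return already_imported


# Call this at the very top of the entry script, before any other imports.
late = pin_openmp_threads()
if late:
    print("Warning: OMP_NUM_THREADS set after importing:", ", ".join(late))
```

Checking `sys.modules` only detects that an import already happened; it does not prove the thread count was ignored, so treat the warning as a hint to reorder the imports.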

soumikdasgupta commented 6 years ago

Thanks a lot, Michiel. The training is going much faster now. Hopefully the numbers will be much better this time. :)

devendrachaplot commented 6 years ago

I believe the issue with the pre-trained model is due to the Python version. It seems to work fine with Python 2 and doesn't seem to work with Python 3.

The multiprocessing issue occurs on some systems, and the solution suggested by Michiel seems to work. I modified the code to fix this issue.
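
A simple guard can surface this before a long run is started. The sketch below only encodes the observation in this comment (works under Python 2, not under Python 3); the helper name `pretrained_model_expected_to_work` is hypothetical and the check says nothing about why the versions differ.

```python
import sys


def pretrained_model_expected_to_work(version_info=sys.version_info):
    """Return True only for Python 2, following the report in this thread."""
    return version_info[0] == 2


if not pretrained_model_expected_to_work():
    print("Warning: the pre-trained model was reported to work only "
          "under Python 2; results under this interpreter may be poor.")
```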

vkurenkov commented 6 years ago

@devendrachaplot, I could not reproduce the results using pretrained_model either.

Tried out these combinations:

Average accuracy below 15% for the hard environment.