Closed Kautenja closed 6 years ago
The last training session confirms that this is the result of running out of memory on Ubuntu. During that session, Ubuntu failed to kill the process, and the replay queue filled all 32GB of available system memory, leaving the system completely unresponsive. This likely results from the downsampler's image size changing from (84, 84) to (100, 100). That change has been reverted to the original (84, 84). It would be convenient to calculate the memory requirements of the replay queue on initialization, then raise an error if they exceed some threshold fraction of system memory.
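A rough sketch of what that initialization check could look like. The function names, the 4-frame stack, the 1 byte per pixel, the two stored states per experience, and the 0.8 threshold are all assumptions for illustration, not values taken from the project. The system memory lookup uses `os.sysconf`, which works on Linux.

```python
import os


def replay_queue_bytes(capacity, frame_shape=(84, 84), frames_per_state=4,
                       bytes_per_pixel=1, states_per_experience=2):
    """Estimate the bytes needed to store `capacity` experiences.

    Assumes (hypothetically) that each experience keeps a state and a
    next state, each a stack of `frames_per_state` uint8 frames.
    """
    height, width = frame_shape
    state_bytes = height * width * frames_per_state * bytes_per_pixel
    return capacity * states_per_experience * state_bytes


def total_system_memory():
    """Return the physical memory in bytes (Linux, via sysconf)."""
    return os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')


def check_replay_queue_fits(capacity, frame_shape=(84, 84), threshold=0.8,
                            total_memory=None, **kwargs):
    """Raise MemoryError if the queue would exceed threshold * total memory."""
    if total_memory is None:
        total_memory = total_system_memory()
    required = replay_queue_bytes(capacity, frame_shape, **kwargs)
    limit = threshold * total_memory
    if required > limit:
        raise MemoryError(
            'replay queue needs {:.1f} GB, limit is {:.1f} GB'.format(
                required / 1e9, limit / 1e9))
    return required
```

Under these assumptions, a (100, 100) frame takes 10000 / 7056, or about 1.42 times the memory of an (84, 84) frame, so a queue sized to fit at the smaller resolution can overflow at the larger one.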
With the new nes-py back-end in use by the gym-super-mario-bros package, this issue should be resolved. Closing for now.
Sometimes the script dddqn_train.py is killed by Ubuntu. Not sure if this is an issue caused by memory limitations. There should be plenty of memory for this setup, but perhaps Ubuntu kills this process for some reason. The other alternative is some rare edge case between the Python and Lua scripts that is hard to reproduce. Oddly, the command doesn't match what was actually issued. This is a peculiar bug.