Danfoa / commons_game

A custom implementation of DeepMind's "the commons game"

Learning interruption problem #1

Closed: mchong88 closed this issue 3 years ago

mchong88 commented 4 years ago

Hi,

Thanks for this great implementation.

I ran the learning model of 'commons_game' (train_utils) on my laptop, but training stopped due to insufficient laptop memory. For example, it took about 3 days to train a single agent for 2800 episodes. In this case, is there a way to resume learning from where it left off? If I have to stop training for some reason, how can I resume it?

My laptop specifications are as follows:

- Windows 10 Home
- Intel Core i7-10750H
- 16 GB RAM
- NVIDIA GeForce RTX 2060 Max-Q (VRAM: 6 GB)
- Python 3.8.5
- tensorflow-gpu 2.2.0

Also, please let me know if there is a more efficient way to run 'commons_game' on a laptop with these specs.

I'm still investigating but any help would be appreciated :)

Danfoa commented 4 years ago

Good Day @mchong88,

Well, you have to edit the training loop a little in order to save checkpoints of the agents' neural network weights and load them appropriately when restarting.

The DDQNAgent already has methods to do this; you only need to use them appropriately.

https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/DDQN.py#L164

https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/DDQN.py#L168

https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/DDQN.py#L175

The proposed training loop has this saving functionality already implemented

https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/train_utils.py#L141-L143

You just need to check your training records for a saved model, restore it, and then proceed with training. Bear in mind that the only thing you need to restore from an interrupted training process is the weights of the NN themselves.
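
For illustration, a minimal sketch of that check (names like checkpoint_path and the episode number are only examples; load_policy is the DDQNAgent method linked above):

import os

# Sketch: before entering the training loop, restore saved weights if a
# checkpoint from a previous run exists. `logdir` and `agent` are assumed
# to be set up as in train_utils.py; the episode folder name is only an example.
checkpoint_path = os.path.join(logdir, "model", "episode=2800", "agent-0")
if os.path.exists(checkpoint_path):
    agent.load_policy(checkpoint_path)  # replaces the agent's NN weights
# ...then run the normal training loop, starting from the next episode.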

mchong88 commented 4 years ago

Thank you for your kind answer!!

But I don't seem to understand it perfectly.

By editing the training loop, do you mean "def train_agent"? I don't quite understand how to load the saved training data.

Sorry, but can you explain it with a simple example code?

thank you.

Danfoa commented 4 years ago

During the agents' instantiation: https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/train_utils.py#L53-L63

You should check whether there are model checkpoints saved from a previous experiment with the same parameters. Remember that the experiment records are saved to logdir = logs_path + "/MAP=%s-AGENTS=%d-lr=%.5f-e=%.2f-ed=%.3f-g=%.2f-b=%d" % (map_type, n_agents, lr, epsilon, epsilon_decay, gamma, batch_size). The current training loop saves the model checkpoints (the weights of the architecture) every EPISODE_RECORD_FREQ episodes into the model folder inside logdir.


The checkpoints you want to restore are the ones in the folder with the largest episode number reached by the previous run.

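If you prefer to locate that folder programmatically instead of by hand, a small sketch (the helper name is made up and not part of the repository):

import glob
import os

def find_latest_episode_dir(logdir):
    """Return the model/episode=XXXX folder with the highest episode number, or None."""
    episode_dirs = glob.glob(os.path.join(logdir, "model", "episode=*"))
    if not episode_dirs:
        return None
    return max(episode_dirs, key=lambda d: int(d.rsplit("=", 1)[-1]))

For example, find_latest_episode_dir(logdir) would return something like .../model/episode=2900 here.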

Inside that folder, you have the saved DeepQNet checkpoint for each of the agents.


In order to load them, you can use

https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/DDQN.py#L168-L172 , on an already instantiated agent, to replace its NN function approximators with the saved pre-trained models, or use

https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/DDQN.py#L175-L183

to instantiate a new agent with the pre-trained weights already loaded. It is up to you which approach you use.
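
As a rough sketch of both options, assuming latest_dir is the episode folder located above and agents, env, obs_shape, etc. are the objects created in train_utils.py (the agent-%d folder naming follows the saved checkpoints):

import os

# Option 1: keep the agents created in train_utils.py and only swap in the saved weights.
for i, agent in enumerate(agents):
    agent.load_policy(os.path.join(latest_dir, "agent-%d" % i))

# Option 2: build an agent directly from the saved weights via from_trained_policy
# (call it as it is defined in your copy of DDQN.py, e.g. as a plain function or
# a static method of DDQNAgent).
agent_0 = from_trained_policy(os.path.join(latest_dir, "agent-0"), env, obs_shape,
                              learning_rate=lr, epsilon=epsilon,
                              epsilon_decay=epsilon_decay, gamma=gamma,
                              batch_size=batch_size)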

You have to be careful to update the training loop's current episode (the episode variable) with the number of the episode you restored from, so the experiment can continue without overriding older episodes. That means that if your program crashed at episode 100, you should continue the experiment from episode 101.
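
In the current loop that amounts to something like the following, where restored_episode is an illustrative variable holding the episode number of the checkpoint you loaded:

# Resume numbering right after the last saved episode so that the new run
# does not overwrite records from the previous one.
restored_episode = 100  # e.g. the episode of the loaded checkpoint
for episode in range(restored_episode + 1, n_episodes + 1):
    ...  # rest of the training loop unchanged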

I hope this was of some help.

mchong88 commented 4 years ago

Thank you for your reply.

As advised, I checked the DeepQNet checkpoints saved in the "episode=2900" folder.

Then I tried loading them by replacing the model with the pretrained one (I mostly work in Jupyter notebooks, but I tried this in Google Colab).

The "def load_policy" of DDQN.py has been modified as follows.

def load_policy(self, path):
    self.model = tf.keras.models.load_model('/content/drive/My Drive/jupyter/logs/MAP=small-AGENTS=1-lr=0.00050-e=0.15-ed=0.999-g=0.99-b=8/model/episode=2900/agent-0')
    opt = tf.keras.optimizers.Adam(learning_rate=self.learning_rate, clipvalue=10.0)
    self.model.compile(optimizer=opt, loss='mse')
    self.target_model = tf.keras.models.load_model('/content/drive/My Drive/jupyter/logs/MAP=small-AGENTS=1-lr=0.00050-e=0.15-ed=0.999-g=0.99-b=8/model/episode=2900/agent-0')

And the "def from_trained_policy" of DDQN.py is modified as follows.

def from_trained_policy(path_to_model, env, obs_shape, buffer_size=200, learning_rate=.0015, epsilon=.1,
                        epsilon_decay=0.995, min_epsilon=.01, gamma=.9, batch_size=8):
    model = tf.keras.models.load_model('/content/drive/My Drive/jupyter/logs/MAP=small-AGENTS=1-lr=0.00050-e=0.15-ed=0.999-g=0.99-b=8/model/episode=2900/agent-0')
    target_model = tf.keras.models.load_model('/content/drive/My Drive/jupyter/logs/MAP=small-AGENTS=1-lr=0.00050-e=0.15-ed=0.999-g=0.99-b=8/model/episode=2900/agent-0')
    return DDQNAgent(model, target_model, env, obs_shape, buffer_size=buffer_size, learning_rate=learning_rate,
                     epsilon=epsilon, epsilon_decay=epsilon_decay, min_epsilon=min_epsilon, gamma=gamma,
                     batch_size=batch_size)

But the method I've tried seems to be wrong.

If you run "if name == "main":" in train_utils.py, I get an error message saying "ValueError: You can only call build on a model if its call method accepts an ʻinputs` argument".

Can you tell which part is wrong?

And I didn't understand how to update the episode number in the training loop.

Can I just change the episode variable to 'EPISODES = 2901' in train_utils.py?

Thank you for always kindly answering fundamental questions.

mchong88 commented 4 years ago

I seem to have found a solution after some trial and error. However, I ran it in a Jupyter notebook, not Google Colab.

I changed DDQN.py as below: (original) https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/DDQN.py#L168-L172

(Revision)

def load_policy(self, path):
    load_model = "C:/Python/jupyter/logs_1/MAP=small-AGENTS=1-lr=0.00050-e=0.15-ed=0.999-g=0.99-b=8/model/episode=2900/agent-0"
    self.model = tf.keras.models.load_model(load_model)
    opt = tf.keras.optimizers.Adam(learning_rate=self.learning_rate, clipvalue=10.0)
    self.model.compile(optimizer=opt, loss='mse')
    self.target_model = tf.keras.models.load_model(load_model)

The "def from_trained_policy" part has not been changed.

I made the following changes to update the episode of the "train_utils.py" file: (original) https://github.com/Danfoa/commons_game/blob/e434023c0a9cb353f09459ea1df50c7901b2686d/train_utils.py#L65-L71

(Revision)

for episode in range(3000, n_episodes + 1):
    start_t = time.time()
    print("- A:%d Episode %d" % (n_agents, episode))
    episode_path = logdir + "/episodes/episode=%04d" % episode
    models_path = logdir + "/model/episode=%04d" % episode
    if episode % EPISODE_RECORD_FREQ == 0:
        os.makedirs(episode_path, exist_ok=True)

(Note: according to TensorBoard, training had progressed to episode 2999 just before it stopped. Nothing besides the two parts above was modified.)

Another question arose here.

After making the above changes, training now starts from episode 3000. However, I have no way to check whether the previously trained model was actually applied.

Is there any way to check this? And is changing the code as above the correct way to resume an interrupted training run?

Thank you!!

Danfoa commented 3 years ago

@mchong88 If you are not getting any error from the tf.keras.models.load_model(path) lines, then the loading worked. See the method's documentation: if the weights do not exist or are invalid, an exception is thrown and your script should crash.
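
If you want that check to be explicit, you can wrap the call, roughly like this (checkpoint_path is just an illustrative name):

import tensorflow as tf

# load_model raises if the path does not contain a valid saved model,
# so a failed restore cannot silently go unnoticed.
try:
    model = tf.keras.models.load_model(checkpoint_path)
    print("Restored checkpoint from", checkpoint_path)
except (OSError, ValueError) as err:
    raise RuntimeError("Could not restore checkpoint: %s" % err)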

I recommend programming this in a more generic way, i.e. do not hardcode the path as you do now, but rather generate the path string on the fly from your experiment records, i.e. check your experiment records for the variables that compose the file path you want to load (the last episode).
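
Roughly, and reusing the illustrative find_latest_episode_dir helper sketched in my earlier comment:

# Rebuild logdir from the experiment parameters instead of hardcoding it,
# then pick the most recent episode folder inside it.
logdir = logs_path + "/MAP=%s-AGENTS=%d-lr=%.5f-e=%.2f-ed=%.3f-g=%.2f-b=%d" % (
    map_type, n_agents, lr, epsilon, epsilon_decay, gamma, batch_size)
latest_dir = find_latest_episode_dir(logdir)  # e.g. ".../model/episode=2900"
if latest_dir is not None:
    agent.load_policy(latest_dir + "/agent-0")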

Once you load your model and restart the experiment, you should see a clear difference in performance, since the trained models should perform better and more stably than untrained ones. If you don't see this, it may mean that you are not saving/loading properly or that training is not working correctly.

If you correctly update the episode variable after loading previous records, the training loop should save the new data to the same TensorBoard file without overriding or damaging the previous data, meaning that once you open the TensorBoard visualization you should see no discontinuity in the agent's training progress, i.e. the experiment should look as if it never stopped.

mchong88 commented 3 years ago

Dear Daniel Ordonez,

It has been a while since I last contacted you. Thank you for answering. As you said, the data visualized in TensorBoard shows a continuous training curve. Fortunately, there are no other problems and everything is running fine. Thank you.

Best regards,
Mincheol Hong