IBM / rl-testbed-for-energyplus

Reinforcement Learning Testbed for Power Consumption Optimization using EnergyPlus
MIT License

Cannot save `trpo_mpi` model #106

Closed by WynnCJF 2 years ago

WynnCJF commented 2 years ago

Following #88, I tried to save the model like this, but it failed:

policy = trpo_mpi.learn(env=env,
                network=mlp(num_hidden=32, num_layers=2),
                total_timesteps=num_timesteps,
                #timesteps_per_batch=1*1024, max_kl=0.01, cg_iters=10, cg_damping=0.1,
                timesteps_per_batch=16*1024, max_kl=0.01, cg_iters=10, cg_damping=0.1,
                gamma=0.99, lam=0.98, vf_iters=5, vf_stepsize=1e-3)
policy.save(save_path="/root/rl-testbed-for-energyplus/model.pth")

The error message is:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/rl-testbed-for-energyplus/baselines_energyplus/trpo_mpi/run_energyplus.py", line 75, in <module>
    main()
  File "/root/rl-testbed-for-energyplus/baselines_energyplus/trpo_mpi/run_energyplus.py", line 72, in main
    train(args.env, num_timesteps=args.num_timesteps, seed=args.seed)
  File "/root/rl-testbed-for-energyplus/baselines_energyplus/trpo_mpi/run_energyplus.py", line 67, in train
    policy.save(save_path="/root/rl-testbed-for-energyplus/model.pth")
AttributeError: 'PolicyWithValue' object has no attribute 'save'
EnergyPlusEnv: Severe error(s) occurred. Error count: -1
EnergyPlusEnv: Check contents of /root/eplog/openai-2022-07-12-01-35-18-145462/output/episode-00000001-55096/eplusout.err

Has the saving method changed? Thanks in advance!

antoine-galataud commented 2 years ago

Hi @WynnCJF. Looking at the baselines code, it does seem this changed with the TensorFlow 2 integration: the save and load methods were removed from the PolicyWithValue class (see https://github.com/openai/baselines/blob/tf2/baselines/common/policies.py).

A possible fix (not tested):

policy = trpo_mpi.learn(env=env,
    network=mlp(num_hidden=32, num_layers=2),
    total_timesteps=num_timesteps,
    #timesteps_per_batch=1*1024, max_kl=0.01, cg_iters=10, cg_damping=0.1,
    timesteps_per_batch=16*1024, max_kl=0.01, cg_iters=10, cg_damping=0.1,
    gamma=0.99, lam=0.98, vf_iters=5, vf_stepsize=1e-3
)

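import tensorflow as tf

# CheckpointManager treats this path as a directory and writes the checkpoint files inside it.
save_path = "/root/rl-testbed-for-energyplus/model.path"
# Track the policy's variables under the name "model".
ckpt = tf.train.Checkpoint(model=policy)
# max_to_keep=None keeps every checkpoint instead of deleting older ones.
manager = tf.train.CheckpointManager(ckpt, save_path, max_to_keep=None)
manager.save()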
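
To load the weights back later, the same Checkpoint mechanism can be used in reverse. Here is a minimal sketch of the save and restore round trip, also not tested against this repository. TinyPolicy is a hypothetical stand-in for the trained policy, and the temporary directory stands in for save_path; in practice you would need a policy object built with the same network before restoring into it.

```python
import tempfile

import tensorflow as tf


class TinyPolicy(tf.Module):
    """Hypothetical stand-in for the trained policy (assumption, not baselines code)."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable(tf.zeros([4, 2]), name="w")


# Temporary directory standing in for save_path (assumption for this sketch).
ckpt_dir = tempfile.mkdtemp()

# Save: same calls as in the fix above, applied to the stand-in object.
trained = TinyPolicy()
trained.w.assign(tf.ones([4, 2]))  # pretend these are the learned weights
manager = tf.train.CheckpointManager(
    tf.train.Checkpoint(model=trained), ckpt_dir, max_to_keep=None
)
manager.save()

# Restore: build a fresh object with the same structure, then load into it.
restored = TinyPolicy()
ckpt = tf.train.Checkpoint(model=restored)
status = ckpt.restore(tf.train.latest_checkpoint(ckpt_dir))
# Raises if a variable in `restored` has no matching entry in the checkpoint.
status.assert_existing_objects_matched()
```

The keyword used when restoring (model=...) has to match the one used when saving, since the checkpoint matches variables by their position in the object graph under that name.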

Hope it helps!

WynnCJF commented 2 years ago

Thanks Antoine! This solves our problem.