Stanford-ILIAD / PantheonRL

PantheonRL is a package for training and testing multi-agent reinforcement learning environments. PantheonRL supports cross-play, fine-tuning, ad-hoc coordination, and more.
MIT License

Some problems when running tester.py #10

Open CILabTaegwan opened 1 year ago

CILabTaegwan commented 1 year ago

Hi, could you please offer an example of using tester.py? The code ran without a problem with trainer.py, but this problem occurred with the tester. The error is as follows.

Traceback (most recent call last):
  File "tester.py", line 194, in <module>
    run_test(ego, env, args.total_episodes, args.render)
  File "tester.py", line 50, in run_test
    action = ego.get_action(obs, False)
  File "C:\Users\user\Desktop\pantheon\PantheonRL\pantheonrl\common\agents.py", line 72, in get_action
    actions, _, _ = action_from_policy(obs.obs, self.policy)
AttributeError: 'numpy.ndarray' object has no attribute 'obs'

This problem does not occur when using the FIXED agent in the trainer. My torch version is 1.13.1 and my stable-baselines3 version is 1.6.2.

Thanks

bsarkar321 commented 1 year ago

Thanks for finding this issue! The latest version of pantheonrl should fix this bug (it was caused by a new observation type that SB3 does not support by default).
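To make the mismatch concrete: the failing line in agents.py assumed the observation had an .obs attribute, but the tester was handing it a raw numpy array. A minimal sketch of the kind of guard that resolves this (extract_obs is purely illustrative, not the actual patch in the repo):

def extract_obs(obs):
    # Wrapped observation types expose the underlying array via .obs;
    # a raw numpy array can be passed to the policy directly.
    return obs.obs if hasattr(obs, "obs") else obs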

For reference, tester.py follows a similar syntax to trainer.py (except there are no presets). For example, if we want to run Liars Dice, we can train two agents with:

python3 trainer.py LiarsDice-v0 PPO PPO --seed 10 --preset 1 -t 50000

And then we can test these two agents with:

python3 tester.py LiarsDice-v0 PPO PPO --seed 10 -t 5000 --ego-load models/LiarsDice-v0-PPO-ego-10.zip --alt-load models/LiarsDice-v0-PPO-alt-10.zip

CILabTaegwan commented 1 year ago

Thank you for handling the issue so quickly! Additionally, is it possible to use a LOAD LOAD setting with the trainer? This is partly covered by the LOAD PPO setting, but I think it is necessary to control the save/load process from the command line. What would need to change to implement a LOAD LOAD setting?

bsarkar321 commented 1 year ago

Oh yeah, reloading both agents should be possible for fine-tuning. We can essentially create a separate version of the gen_fixed function, but we need to wrap the loaded policy in the appropriate Agent type (OnPolicyAgent or AdapAgent). I can add this feature soon-ish, but if you need this functionality in the meantime you can also write a script that loads the policy you want.

If you look at overcookedtraining.py within the examples folder, you can replace PPO('MlpPolicy', env, verbose=1) with PPO.load('your_file') for both the ego and partner agents.
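As a rough sketch of that interim approach, assuming a PPO partner (the helper name load_onpolicy_partner is illustrative, not part of the library):

from stable_baselines3 import PPO
from pantheonrl.common.agents import OnPolicyAgent

def load_onpolicy_partner(path):
    # Reload a previously saved PPO policy and wrap it in an OnPolicyAgent
    # so it can keep training as the partner agent.
    return OnPolicyAgent(PPO.load(path))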

CILabTaegwan commented 1 year ago

Thanks a lot. It worked with the code below in overcookedtraining.py:

env = gym.make('OvercookedMultiEnv-v0', layout_name=layout)
agentarg = {}
partner = OnPolicyAgent(PPO.load("file_dir_1"), **agentarg)
env.add_partner_agent(partner)

ego = PPO.load("file_dir_2")
vec_env = DummyVecEnv([lambda: Monitor(env)])
ego.set_env(vec_env)
ego.learn(total_timesteps=10000)
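For completeness, the snippet above relies on imports along these lines (the SB3 and PantheonRL paths are standard; everything else comes from what is already in the examples script):

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv
from pantheonrl.common.agents import OnPolicyAgent
# 'OvercookedMultiEnv-v0' itself is registered by imports that already
# exist in examples/overcookedtraining.py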

If there are any errors or issues with the code, I'd appreciate your feedback.

CILabTaegwan commented 1 year ago

Thanks for your previous reply. I have one more question, about the logger output ("rollout/ep_len_mean", "time/fps", "train/loss", etc.) when running trainer.py.

While running trainer.py, the logger output is printed from ego.learn(). I thought this call referred to algos/modular/learn.py, but even when I changed the code there (for example, self.logger.record("time/fps", fps)), the logger output did not change.

Where can I control the contents of the logger output? Doesn't ego.learn() come from algos/modular/learn.py or algos/adap/adap_learn?

bsarkar321 commented 1 year ago

Great question! Based on the code you shared earlier, it seems like you are using the PPO policy from stable-baselines3, so all of the logger logic comes from there. If you would like to change the logger interface, you would probably need to define a separate PPO implementation that logs the information you want.
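As a side note, if the goal is only to redirect or trim where SB3's PPO writes its metrics (rather than changing which metrics it computes), SB3's own logger configuration may be enough; a minimal sketch using the loaded ego policy from the earlier snippet:

from stable_baselines3 import PPO
from stable_baselines3.common.logger import configure

ego = PPO.load("file_dir_2")
# Write SB3's training metrics to stdout and a CSV file under ./logs
ego.set_logger(configure("./logs", ["stdout", "csv"]))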

Alternatively, you could also use CleanRL's implementation of PPO (https://github.com/vwxyzjn/cleanrl), which is easier to understand and cleanly defines the logging behavior. However, it is not a drop-in replacement for SB3's PPO, so you may need to do some extra work to integrate pantheonrl with this different interface.