HumanCompatibleAI / imitation

Clean PyTorch implementations of imitation and reward learning algorithms
https://imitation.readthedocs.io/
MIT License

AttributeError: 'PPO' object has no attribute 'set_logger' #550

Closed: limeng-1234 closed this issue 1 year ago

limeng-1234 commented 2 years ago

Bug description

Description of what the bug is.

Steps to reproduce

Code or a description of how to reproduce the bug.

Environment

AdamGleave commented 2 years ago

Can you fill out the template, please? Tests are passing on CI, including with PPO, so there's nothing obviously broken here. We can't help you if we don't know the circumstances under which you're encountering the issue.

feliperafael commented 1 year ago

I had the same error when I tried to run the 3_train_gail example:

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [7], in <cell line: 25>()
     14 learner = PPO(
     15     env=venv,
     16     policy=MlpPolicy,
   (...)
     20     n_epochs=10,
     21 )
     22 reward_net = BasicRewardNet(
     23     venv.observation_space, venv.action_space, normalize_input_layer=RunningNorm
     24 )
---> 25 gail_trainer = GAIL(
     26     demonstrations=rollouts,
     27     demo_batch_size=1024,
     28     gen_replay_buffer_capacity=2048,
     29     n_disc_updates_per_round=4,
     30     venv=venv,
     31     gen_algo=learner,
     32     reward_net=reward_net,
     33 )
     35 learner_rewards_before_training, _ = evaluate_policy(
     36     learner, venv, 100, return_episode_rewards=True
     37 )
     38 gail_trainer.train(20000)  # Note: set to 300000 for better results

File ~/anaconda3/envs/newRL/lib/python3.8/site-packages/imitation/algorithms/adversarial/gail.py:126, in GAIL.__init__(self, demonstrations, demo_batch_size, venv, gen_algo, reward_net, **kwargs)
    123 # Process it to produce output suitable for RL training
    124 # Applies a -log(sigmoid(-logits)) to the logits (see class for explanation)
    125 self._processed_reward = RewardNetFromDiscriminatorLogit(reward_net)
--> 126 super().__init__(
    127     demonstrations=demonstrations,
    128     demo_batch_size=demo_batch_size,
    129     venv=venv,
    130     gen_algo=gen_algo,
    131     reward_net=reward_net,
    132     **kwargs,
    133 )

File ~/anaconda3/envs/newRL/lib/python3.8/site-packages/imitation/algorithms/adversarial/common.py:223, in AdversarialTrainer.__init__(self, demonstrations, demo_batch_size, venv, gen_algo, reward_net, n_disc_updates_per_round, log_dir, disc_opt_cls, disc_opt_kwargs, gen_train_timesteps, gen_replay_buffer_capacity, custom_logger, init_tensorboard, init_tensorboard_graph, debug_use_ground_truth, allow_variable_horizon)
    220 self.venv_train = self.venv_wrapped
    222 self.gen_algo.set_env(self.venv_train)
--> 223 self.gen_algo.set_logger(self.logger)
    225 if gen_train_timesteps is None:
    226     gen_algo_env = self.gen_algo.get_env()

AttributeError: 'PPO' object has no attribute 'set_logger'
```

I'm using stable-baselines3==0.8.0 with the latest version of imitation from GitHub.

I still haven't been able to make progress on solving this issue, but I'm trying to trace the source of the problem.
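One way to trace it is to check whether the installed SB3 even defines the method that imitation calls in `common.py` above. A minimal diagnostic sketch (not from the tutorial):

```python
# imitation's AdversarialTrainer.__init__ calls gen_algo.set_logger(self.logger),
# so the installed stable-baselines3 must define set_logger on its algorithms.
import stable_baselines3
from stable_baselines3 import PPO

print("SB3 version:", stable_baselines3.__version__)
print("PPO defines set_logger:", hasattr(PPO, "set_logger"))  # False on 0.8.0
```

If the second line prints False, the installed SB3 predates the `set_logger` API that imitation now requires.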

AdamGleave commented 1 year ago

> I'm using stable-baselines3==0.8.0

The latest version of SB3 is 1.6.0; can you replicate the problem with that or another recent version?

feliperafael commented 1 year ago

> > I'm using stable-baselines3==0.8.0
>
> The latest version of SB3 is 1.6.0; can you replicate the problem with that or another recent version?

Thanks a lot for the tip! It worked perfectly when I updated stable-baselines3 to version 1.6.0.
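For anyone else hitting this, here is the code from the traceback above in consolidated, runnable form; a sketch assuming stable-baselines3 >= 1.6.0, with `venv` (a vectorized environment) and `rollouts` (expert demonstrations) built as in the 3_train_gail tutorial, and the PPO hyperparameters that the traceback elides left to the tutorial:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.ppo import MlpPolicy

from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm

# Generator policy that GAIL will train against the discriminator.
learner = PPO(
    env=venv,
    policy=MlpPolicy,
    # ... remaining hyperparameters as in the tutorial ...
    n_epochs=10,
)
# Discriminator network; RunningNorm normalizes its inputs online.
reward_net = BasicRewardNet(
    venv.observation_space, venv.action_space, normalize_input_layer=RunningNorm
)
gail_trainer = GAIL(
    demonstrations=rollouts,
    demo_batch_size=1024,
    gen_replay_buffer_capacity=2048,
    n_disc_updates_per_round=4,
    venv=venv,
    gen_algo=learner,  # GAIL.__init__ calls learner.set_logger, which fails on SB3 0.8.0
    reward_net=reward_net,
)

learner_rewards_before_training, _ = evaluate_policy(
    learner, venv, 100, return_episode_rewards=True
)
gail_trainer.train(20000)  # Note: set to 300000 for better results
```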

AdamGleave commented 1 year ago

Great, thanks for letting us know! Closing this issue as not reproducible with updated dependencies.