Healthcare-Robotics / assistive-gym

Assistive Gym: a physics-based simulation framework for physical human-robot interaction and robotic assistance.

Run trained policies for active human environments on static human environments #23

Closed · gabriansa closed this issue 2 years ago

gabriansa commented 2 years ago

Hi,

would it be possible to run trained policies for active human environments on static human environments?

In other words, imagine if I trained a policy for the environment "FeedingJacoHuman-v1" and now I want to render this policy for the environment "FeedingJaco-v1".

How can I achieve this?

I tried changing the folder name for the trained policy from FeedingJacoHuman-v1 to FeedingJaco-v1 and running the following command:

python3 -m assistive_gym.learn --env "FeedingJaco-v1" --algo ppo --render --seed 0 --load-policy-path ./trained_models/ --render-episodes 10

However, I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/gabrigoo/assistive-gym/assistive_gym/learn.py", line 226, in <module>
    render_policy(None, args.env, args.algo, checkpoint_path if checkpoint_path is not None else args.load_policy_path, coop=coop, colab=args.colab, seed=args.seed, n_episodes=args.render_episodes)
  File "/home/gabrigoo/assistive-gym/assistive_gym/learn.py", line 104, in render_policy
    test_agent, _ = load_policy(env, algo, env_name, policy_path, coop, seed, extra_configs)
  File "/home/gabrigoo/assistive-gym/assistive_gym/learn.py", line 58, in load_policy
    agent.restore(checkpoint_path)
  File "/home/gabrigoo/env/lib/python3.8/site-packages/ray/tune/trainable.py", line 388, in restore
    self.load_checkpoint(checkpoint_path)
  File "/home/gabrigoo/env/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 818, in load_checkpoint
    self.__setstate__(extra_data)
  File "/home/gabrigoo/env/lib/python3.8/site-packages/ray/rllib/agents/trainer_template.py", line 289, in __setstate__
    Trainer.__setstate__(self, state)
  File "/home/gabrigoo/env/lib/python3.8/site-packages/ray/rllib/agents/trainer.py", line 1698, in __setstate__
    self.workers.local_worker().restore(state["worker"])
  File "/home/gabrigoo/env/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1267, in restore
    self.sync_filters(objs["filters"])
  File "/home/gabrigoo/env/lib/python3.8/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1229, in sync_filters
    assert all(k in new_filters for k in self.filters)
AssertionError

Thanks a lot.

Zackory commented 2 years ago

This is not currently supported in Assistive Gym, and it would require a few changes. First, the active human environments use an rllib multi-agent interface rather than a strict gym interface, see: https://github.com/Healthcare-Robotics/assistive-gym/blob/main/assistive_gym/envs/feeding_envs.py#L44
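
For context, the interface difference looks roughly like this (an untested sketch; the FeedingJacoHumanEnv class name and the 'robot'/'human' dict keys are taken from feeding_envs.py and learn.py, so treat the exact API details as assumptions):

import gym
import assistive_gym
from assistive_gym.envs import FeedingJacoHumanEnv  # active-human env class (assumed importable from the envs module)

# Static human: strict gym interface, one flat observation and one action per step.
env = gym.make('assistive_gym:FeedingJaco-v1')
obs = env.reset()                                    # numpy array
obs, reward, done, info = env.step(env.action_space.sample())

# Active human: rllib-style multi-agent interface with per-agent dicts keyed 'robot'/'human'.
coop_env = FeedingJacoHumanEnv()
obs = coop_env.reset()                               # {'robot': ..., 'human': ...}
action = {'robot': coop_env.action_space_robot.sample(),
          'human': coop_env.action_space_human.sample()}
obs, reward, done, info = coop_env.step(action)      # each return value is a per-agent dict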

Then, for active human environments, there are actually two policies trained: https://github.com/Healthcare-Robotics/assistive-gym/blob/main/assistive_gym/learn.py#L34 You will want to pull out and use only the policy for the robot. Make sure to set coop=False when loading the policy: https://github.com/Healthcare-Robotics/assistive-gym/blob/main/assistive_gym/learn.py#L32 and ensure that self.human.controllable is False: https://github.com/Healthcare-Robotics/assistive-gym/blob/main/assistive_gym/envs/feeding.py#L13
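
A rough, untested sketch of that flow is below. It assumes the make_env and load_policy helpers in learn.py (with the argument order used there), a checkpoint under ./trained_models/ppo/FeedingJacoHuman-v1/, and that the robot observation in the coop env lines up with the static env's observation; if the observation layouts differ, the policy input would need to be adapted first.

import ray
from assistive_gym.learn import make_env, load_policy

ray.init(ignore_reinit_error=True)

# Restore the coop-trained PPO agent against the active-human env so that both
# the 'robot' and 'human' policies are reconstructed from the checkpoint.
coop_env = make_env('FeedingJacoHuman-v1', coop=True)
# Argument order (env, algo, env_name, policy_path, coop, seed, extra_configs) as in learn.py
agent, _ = load_policy(coop_env, 'ppo', 'FeedingJacoHuman-v1', './trained_models/', True, 0, {})

# Roll out in the static-human env (self.human.controllable is False there),
# querying only the 'robot' policy from the multi-agent checkpoint.
env = make_env('FeedingJaco-v1', coop=False)
env.render()   # open the GUI before the first reset, as in the README example
obs = env.reset()
done = False
while not done:
    action = agent.compute_action(obs, policy_id='robot')
    obs, reward, done, info = env.step(action)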

gabriansa commented 2 years ago

Thanks for the help. I am trying to follow the steps you suggested. In order to pull out and use only the policy for the robot, I did the following:

In learn.py:

def setup_config(env, algo, coop=False, seed=0, extra_configs={}):
    num_processes = multiprocessing.cpu_count()
    if algo == 'ppo':
        config = ppo.DEFAULT_CONFIG.copy()
        config['train_batch_size'] = 19200
        config['num_sgd_iter'] = 50
        config['sgd_minibatch_size'] = 128
        config['lambda'] = 0.95
        config['model']['fcnet_hiddens'] = [100, 100]
    elif algo == 'sac':
        # NOTE: pip3 install tensorflow_probability
        config = sac.DEFAULT_CONFIG.copy()
        config['timesteps_per_iteration'] = 400
        config['learning_starts'] = 1000
        config['Q_model']['fcnet_hiddens'] = [100, 100]
        config['policy_model']['fcnet_hiddens'] = [100, 100]
        # config['normalize_actions'] = False
    config['num_workers'] = num_processes
    config['num_cpus_per_worker'] = 0
    config['seed'] = seed
    config['log_level'] = 'ERROR'
    # if algo == 'sac':
    #     config['num_workers'] = 1

    # CHANGES: configure the trainer with the robot's observation/action spaces only
    obs = env.reset()
    config['observation_space'] = env.observation_space_robot
    config['action_space'] = env.action_space_robot

    # if coop:
    #     obs = env.reset()
    #     policies = {'robot': (None, env.observation_space_robot, env.action_space_robot, {}), 'human': (None, env.observation_space_human, env.action_space_human, {})}
    #     config['multiagent'] = {'policies': policies, 'policy_mapping_fn': lambda a: a}
    #     config['env_config'] = {'num_agents': 2}
    return {**config, **extra_configs}

Is that how you pull out the policy for the robot only?