ValueError when using SAC with co-optimization

Thank you for sharing this wonderful repository. When I try to run experiments with co-optimization, PPO is fine. But when I try SAC there is a strange error.

ValueError: Have multiple policies {'human': <ray.rllib.policy.tf_policy_template.SACTFPolicy object at 0x7f8ec4436470>, 'robot': <ray.rllib.policy.tf_policy_template.SACTFPolicy object at 0x7f8ebc685ef0>}, but the env <NormalizeActionWrapper<FeedingSawyerHumanEnv instance>> is not a subclass of BaseEnv, MultiAgentEnv or ExternalMultiAgentEnv?

This seems to be related to this issue of RLlib.

Healthcare-Robotics / assistive-gym

ValueError when using SAC with co-optimization #17