Replicable-MARL / MARLlib

One repository is all that is necessary for Multi-agent Reinforcement Learning (MARL)
https://marllib.readthedocs.io
MIT License

Help with questions about custom environments #207

Open huigeopencv opened 11 months ago

huigeopencv commented 11 months ago

Hello, I am very impressed with the work you have accomplished on this project, but I am now running into a problem and would like to ask for your advice. I am doing DRL in transportation, combining SUMO and Ray. I was previously using Ray 0.8.0, which lacks the MAPPO and MADDPG algorithms, so when I found your project I copied the code for those two algorithms into my own folder. When I run it, the following error occurs:

`observation_space` not provided in PolicySpec for default_policy and env does not have an observation space OR no spaces received from other workers env(s) OR no `observation_space` specified in config!

I haven't been able to find the cause, and I hope you can take the time to enlighten me. Below are the main contents of my main script and of my training environment.

"""main function used to train vehicles to improve traffic on a highway."""
from flow.networks import BottleneckNetwork
from marl_bottleneck_env1 import MultiAgentHighwayPOEnv
from bottleneck_env_1 import MergePOEnv
import json
import ray
from ray import train, tune
from ray.rllib.agents.registry import get_agent_class, get_trainer_class
from Algorithm import mappo, maddpg
from Algorithm.mappo import MAPPOTrainer
from Algorithm.maddpg import MADDPGTrainer
from ray.rllib.agents.callbacks import DefaultCallbacks
from ray.tune import run_experiments
from ray.tune.registry import register_env
from flow.utils.registry import make_create_env

# Human-driven vehicles controlled by the IDM car-following model.
vehicles = VehicleParams()
vehicles.add("human",
             acceleration_controller=(IDMController, {}),
             lane_change_controller=(SimLaneChangeController, {}),
             car_following_params=SumoCarFollowingParams(
                 speed_mode="right_of_way", accel=4, decel=7.5, tau=1.5),
             )

# Inflow of 1200 veh/h entering the network on edge "1".
inflows = InFlows()
inflows.add(veh_type="human",
            edge="1",
            vehs_per_hour=1200,
            depart_lane="random",
            depart_speed="random",
            color="white",
            end="300",
            )

env_params = EnvParams(horizon=HORIZON, warmup_steps=70, clip_actions=False,
                       additional_params=ADDITIONAL_ENV_PARAMS)
net_params = NetParams(inflows=inflows, additional_params=ADDITIONAL_NET_PARAMS)
sim_params = SumoParams(sim_step=0.2, render=False, save_render=True,
                        restart_instance=True, emission_path='data')
traffic_lights = TrafficLightParams()

flow_params = dict(
    exp_tag="bottleneck",
    env_name=MultiAgentHighwayPOEnv,
    network=BottleneckNetwork,
    simulator='traci',
    sim=sim_params,
    env=env_params,
    net=net_params,
    veh=vehicles,
    initial=initial_config,
)
def setup_exps():
    """Build the RLlib trainer configuration and register the Flow env."""
    alg_run = "PPO"
    agent_cls, config = get_trainer_class(alg_run, True)
    config["num_workers"] = N_CPUS
    config["train_batch_size"] = HORIZON * N_ROLLOUTS
    config["gamma"] = 0.99  # discount rate
    config["horizon"] = HORIZON
    config["num_gpus"] = 1
    config["framework"] = "torch"
    config["rollout_fragment_length"] = 1500

    # Serialise the Flow parameters into env_config so workers can rebuild
    # the environment.
    flow_json = json.dumps(
        flow_params, cls=FlowParamsEncoder, sort_keys=True, indent=4)
    config['env_config']['flow_params'] = flow_json
    config['env_config']['run'] = alg_run

    create_env, gym_name = make_create_env(params=flow_params, version=0)
    register_env(gym_name, create_env)
    return alg_run, gym_name, config


alg_run, gym_name, config = setup_exps()

ray.init(num_cpus=N_CPUS + 1)
trials = tune.run(MAPPOTrainer,
                  name=alg_run,
                  checkpoint_freq=30,
                  checkpoint_at_end=True,
                  stop={"training_iteration": 1000},
                  config=config,
                  max_failures=999,
                  )
```
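
The error quoted above is RLlib reporting that it cannot discover an observation space for `default_policy`. One way this is commonly resolved in Ray 1.x is to declare the spaces explicitly in the `multiagent` section of the trainer config, so that RLlib never has to query an environment instance. The following is only a minimal sketch under that assumption; `shared_policy` is an illustrative name, and the shapes simply mirror the `MultiAgentHighwayPOEnv` shown further below.

```python
# Sketch only: declare the spaces explicitly so RLlib does not have to
# infer them from the (not yet constructed) environment.
import numpy as np
from gym.spaces import Box
from ray.rllib.policy.policy import PolicySpec

obs_space = Box(-float("inf"), float("inf"), shape=(5,), dtype=np.float32)
act_space = Box(low=-7.5, high=3.0, shape=(1,), dtype=np.float32)

config["multiagent"] = {
    # One policy shared by every RL vehicle; policy_class=None means
    # "use the trainer's default policy class".
    "policies": {
        "shared_policy": PolicySpec(
            policy_class=None,
            observation_space=obs_space,
            action_space=act_space,
            config={},
        ),
    },
    "policy_mapping_fn": lambda agent_id, *args, **kwargs: "shared_policy",
}
```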
"""Environment used to train vehicles to improve traffic on a highway."""
import numpy as np
from gym.spaces.box import Box
from gym.spaces.discrete import Discrete
from flow.core.rewards import desired_velocity
from flow.envs.multiagent.base import MultiEnv
from flow.core import rewards
ADDITIONAL_ENV_PARAMS = {
    # maximum acceleration of autonomous vehicles
    'max_accel': 3,
    # maximum deceleration of autonomous vehicles
    'max_decel': 7.5,
    # desired velocity for all vehicles in the network, in m/s
    "target_velocity": 30
}

class MultiAgentHighwayPOEnv(MultiEnv):

    def __init__(self, env_params, sim_params, network, simulator='traci'):
        for p in ADDITIONAL_ENV_PARAMS.keys():
            if p not in env_params.additional_params:
                raise KeyError(
                    'Environment parameter "{}" not supplied'.format(p))

        super().__init__(env_params, sim_params, network, simulator)

    @property
    def observation_space(self):
        return Box(-float('inf'), float('inf'), shape=(5,), dtype=np.float32)

    @property
    def action_space(self):
        return Box(
            low=-np.abs(self.env_params.additional_params['max_decel']),
            high=self.env_params.additional_params['max_accel'],
            shape=(1,),  # (4,),
            dtype=np.float32)

    def _apply_rl_actions(self, rl_actions):
        if rl_actions:
            for rl_id, actions in rl_actions.items():
                accel = actions[0]
                self.k.vehicle.apply_acceleration(rl_id, accel)

    def get_state(self):
        obs = {}
        max_speed = self.k.network.max_speed()
        max_length = self.k.network.length()

        for rl_id in self.k.vehicle.get_rl_ids():
            this_speed = self.k.vehicle.get_speed(rl_id)
            lead_id = self.k.vehicle.get_leader(rl_id)
            follower = self.k.vehicle.get_follower(rl_id)

            if lead_id in ["", None]:
                lead_speed = max_speed
                lead_head = max_length
            else:
                lead_speed = self.k.vehicle.get_speed(lead_id)
                lead_head = self.k.vehicle.get_headway(lead_id)

            if follower in ["", None]:
                follow_speed = 0
                follow_head = max_length
            else:
                follow_speed = self.k.vehicle.get_speed(follower)
                follow_head = self.k.vehicle.get_headway(follower)
            observation = np.array([
                this_speed / max_speed,
                (lead_speed - this_speed) / max_speed,
                lead_head / max_length,
                (this_speed - follow_speed) / max_speed,
                follow_head / max_length
            ])
            obs.update({rl_id: observation})

        return obs

    def compute_reward(self, rl_actions, **kwargs):
        if rl_actions is None:
            return {}
        rewards = {}
        for rl_id in self.k.vehicle.get_rl_ids():
            if kwargs['fail']:
                # reward is 0 if a collision occurred
                reward = 0
            else:
                max_speed = self.k.network.max_speed()
                eta1, eta2, eta3 = 1, 1, 1
                cost1, cost2, cost3 = 1, 1, 1
                reward = eta1 * cost1 + eta2 * cost2 + eta3 * cost3
            rewards[rl_id] = reward
        return rewards
```

Theohhhu commented 10 months ago

It appears that the issue is related to differences between Ray/RLlib versions, which are neither backward nor forward compatible. I recommend upgrading to at least Ray 1.8, the version MARLlib is built on. Also, be aware that transferring an algorithm from our framework into yours can be a challenging task.

huigeopencv commented 10 months ago

The version I am using is Ray 1.8.0. I suspect the problem is with my training environment, which must be incompatible somehow, but I really don't have a clue where to start debugging.

Aequatio-Space commented 10 months ago

Since you want to use an algorithm from MARLlib, I guess you may need to override the abstract class MultiAgentEnv provided by Ray, or write a wrapper for the algorithm to work. Also, try to follow the implementation example in add_new_env.py (see the sketch after the link below), which will ensure the interfaces are matched correctly.

https://github.com/Replicable-MARL/MARLlib/blob/368c6173577d0f9c0ad70fb5b4b6afa12c864c15/examples/add_new_env.py#L66-L83
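
For illustration only, the general shape of such a wrapper might look like the sketch below. `FlowSumoWrapper` and the `env_config["create_env"]` hook are placeholder names, and the exact attributes and methods MARLlib expects should be taken from the linked `add_new_env.py` rather than from this sketch.

```python
# Sketch of a wrapper exposing explicit spaces to RLlib/MARLlib.
# All names here are illustrative; follow add_new_env.py for the real
# interface MARLlib expects (e.g. its env-info/registration hooks).
import numpy as np
from gym.spaces import Box
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class FlowSumoWrapper(MultiAgentEnv):
    """Adapts a Flow multi-agent env so its spaces can be queried directly."""

    def __init__(self, env_config):
        # env_config["create_env"] is assumed to hold the factory returned by
        # flow.utils.registry.make_create_env(flow_params).
        self.env = env_config["create_env"]()
        self.observation_space = Box(-np.inf, np.inf, shape=(5,),
                                     dtype=np.float32)
        self.action_space = Box(low=-7.5, high=3.0, shape=(1,),
                                dtype=np.float32)

    def reset(self):
        # Returns {agent_id: observation}.
        return self.env.reset()

    def step(self, action_dict):
        # Returns the usual RLlib multi-agent tuple of dicts, with
        # done["__all__"] marking the end of the episode.
        obs, rew, done, info = self.env.step(action_dict)
        return obs, rew, done, info
```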