Closed by seominseok00 9 months ago
The reason is the following: in general, the policy frequency is not the same as the simulation frequency. Typically, the agent will take high-level decisions at a low frequency (e.g. changing the velocity setpoint, the target lane being followed, etc.), and then the vehicle will take low-level actions at a higher (simulation) frequency to execute the high-level actions.
At a low policy frequency (e.g. 1 Hz), the agent gets to call `vehicle.act(action)` with some `action` input.
Then, during every simulation step (e.g. at 10 Hz), we call for each vehicle `vehicle.act()` followed by `vehicle.step()`, where `vehicle.act()` adjusts the acceleration and steering commands, and `vehicle.step()` integrates the dynamics.
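Concretely, this two-frequency pattern can be sketched as follows (`DummyVehicle` and the frequencies below are illustrative, not the actual highway-env simulation loop):

```python
# Sketch of the two-frequency control pattern: a high-level action is cached
# and replayed at every simulation step until the next policy step.
class DummyVehicle:
    def __init__(self):
        self.last_action = "IDLE"   # cached high-level action
        self.trace = []             # record of the action applied at each step

    def act(self, action=None):
        if action is not None:      # policy-frequency call: new decision
            self.last_action = action
        # simulation-frequency call: derive low-level commands from the
        # cached high-level action
        self.trace.append(self.last_action)

    def step(self, dt):
        pass                        # integrate the vehicle dynamics here

vehicle = DummyVehicle()
policy_frequency, simulation_frequency = 1, 10   # Hz
for action in ["FASTER", "LANE_LEFT"]:
    vehicle.act(action)                          # 1 Hz: high-level decision
    for _ in range(simulation_frequency // policy_frequency):
        vehicle.act()                            # 10 Hz: reuse cached action
        vehicle.step(1 / simulation_frequency)
```

Each high-level action is applied once when it is received, then reused for the following simulation frames until the next policy step replaces it.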
So all you have to do is ensure that your IDMMDPVehicle handles `vehicle.act()` calls (which are the same as `vehicle.act(action=None)`) by storing the previous high-level action internally, so that it is reused at subsequent simulation steps. Something like:
```python
def act(self, action: Union[dict, str] = None) -> None:
    """
    Perform a high-level action to change the desired lane or speed.

    - If a high-level action is provided, update the target speed and lane;
    - then, perform longitudinal and lateral control.

    :param action: a high-level action
    """
    if action:
        self.last_action = action
    # Lateral: MOBIL
    self.follow_road()
    # Longitudinal: IDM
    front_vehicle, rear_vehicle = self.road.neighbour_vehicles(self, self.lane_index)
    acceleration = self.acceleration(ego_vehicle=self,
                                     front_vehicle=front_vehicle,
                                     rear_vehicle=rear_vehicle,
                                     COMFORT_ACC_MAX=self.last_action["COMFORT_ACC_MAX"],
                                     target_velocity=self.last_action["target_velocity"])
    action = {}
    # action['acceleration'] = self.recover_from_stop(action['acceleration'])
    action['acceleration'] = acceleration
    action['acceleration'] = np.clip(action['acceleration'], -self.ACC_MAX, self.ACC_MAX)
    action['steering'] = self.steering_control(self.target_lane_index)
    action['steering'] = np.clip(action['steering'], -self.MAX_STEERING_ANGLE, self.MAX_STEERING_ANGLE)
    Vehicle.act(self, action)  # Skip ControlledVehicle.act(), or the command will be overridden.
```
I just checked now. It seems to be working well.
Thank you.
http://highway-env.farama.org/dynamics/vehicle/behavior/#longitudinal-behavior
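For reference, the linked page describes the IDM longitudinal model, whose acceleration law in the standard notation is:

$$
a = a_{\max}\left[1 - \left(\frac{v}{v_0}\right)^{\delta} - \left(\frac{d^*}{d}\right)^{2}\right],
\qquad
d^* = d_0 + vT + \frac{v\,\Delta v}{2\sqrt{a_{\max}\, b}},
$$

where $v$ is the ego speed, $v_0$ the desired speed, $d$ the gap to the front vehicle, $\Delta v$ the approach rate, $d_0$ the jam distance, $T$ the desired time gap, and $b$ the comfortable deceleration.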
I created a custom action space to optimize the parameters $v_0$ (desired velocity) and $a$ (maximum acceleration) of an IDM model using reinforcement learning, but it keeps returning None for actions. Can you check if there's anything I've done wrong?
highway_env/envs/common/action.py
I referenced ContinuousAction and defined it as above, and also registered it in the action_factory.
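A common cause of the vehicle receiving None is a custom action type that decodes the action but never forwards it to the controlled vehicle. The following self-contained sketch illustrates the forwarding step (it does not import highway-env; `IDMParameterAction`, `acc_range`, `speed_range`, and `DummyVehicle` are illustrative names, not highway-env API):

```python
# Sketch of a custom action type that maps a normalized RL action in
# [-1, 1]^2 to IDM parameters and passes them on to the vehicle.
class IDMParameterAction:
    def __init__(self, vehicle, acc_range=(0.5, 6.0), speed_range=(10.0, 30.0)):
        self.vehicle = vehicle
        self.acc_range = acc_range        # bounds for COMFORT_ACC_MAX [m/s^2]
        self.speed_range = speed_range    # bounds for target_velocity [m/s]

    @staticmethod
    def lmap(x, in_range, out_range):
        """Linearly map x from in_range to out_range."""
        (a, b), (c, d) = in_range, out_range
        return c + (x - a) * (d - c) / (b - a)

    def act(self, action):
        # The crucial part: forward the decoded parameters to the vehicle.
        # If act() never calls vehicle.act(...), the vehicle only ever sees
        # its default act() calls with action=None.
        self.vehicle.act({
            "COMFORT_ACC_MAX": self.lmap(action[0], (-1.0, 1.0), self.acc_range),
            "target_velocity": self.lmap(action[1], (-1.0, 1.0), self.speed_range),
        })

class DummyVehicle:
    def __init__(self):
        self.last_action = None
    def act(self, action=None):
        if action is not None:
            self.last_action = action

vehicle = DummyVehicle()
IDMParameterAction(vehicle).act([0.0, 1.0])
```

When the environment steps, it should call this `act()` with the agent's action, which in turn hands the decoded parameter dictionary to the vehicle's own `act()`.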
highway_env/vehicle/controller.py
I reused the methods from IDMVehicle as mentioned above and modified them to receive COMFORT_ACC_MAX and target_velocity determined through the action.
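The modification described amounts to computing the IDM acceleration from action-supplied parameters rather than class constants. A hedged, standalone function version of that computation (parameter names and default values are illustrative, not the actual IDMVehicle code):

```python
import numpy as np

# IDM acceleration where COMFORT_ACC_MAX (maximum acceleration) and
# target_velocity (desired speed) come from the high-level action.
def idm_acceleration(ego_speed, gap, front_speed,
                     COMFORT_ACC_MAX, target_velocity,
                     comfort_acc_min=-5.0, distance_wanted=5.0,
                     time_wanted=1.5, delta=4.0):
    # Desired gap: d* = d0 + v*T + v*dv / (2*sqrt(a*b))
    d_star = (distance_wanted + ego_speed * time_wanted
              + ego_speed * (ego_speed - front_speed)
              / (2 * np.sqrt(COMFORT_ACC_MAX * -comfort_acc_min)))
    # IDM law: a = a_max * [1 - (v/v0)^delta - (d*/d)^2]
    return COMFORT_ACC_MAX * (1 - (ego_speed / target_velocity) ** delta
                              - (d_star / gap) ** 2)

# Free road from standstill: accelerate (close to COMFORT_ACC_MAX).
a_free = idm_acceleration(0.0, 1e6, 0.0,
                          COMFORT_ACC_MAX=3.0, target_velocity=25.0)
# At desired speed with a vehicle 100 m ahead: mild braking.
a_follow = idm_acceleration(25.0, 100.0, 25.0,
                            COMFORT_ACC_MAX=3.0, target_velocity=25.0)
```

The two sample calls show the expected qualitative behavior: near-maximum acceleration on an empty road at low speed, and a negative command once the desired speed is reached with a leader ahead.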
For training and testing, I used the PPO algorithm from Stable-Baselines3. When I run it, I encounter the following error:
What could be the problem? Thanks in advance for the response.