Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

problems about action range #604

Open brodermind opened 1 week ago

brodermind commented 1 week ago

Hi, everyone! I set the env config as follows and trained a model:

env = gym.make("highway-v0", render_mode='rgb_array')
config = {
    ...
    "action": {
        "type": "ContinuousAction",
        "acceleration_range": [-10.0, 8.0],
        "steering_range": [-np.pi / 4, np.pi / 4],
    },
    "controlled_vehicles": 1,
    "duration": 150,
    "vehicles_count": 50,
    "vehicles_density": 1,
    "absolute": False,
    "order": "sorted",
    "simulation_frequency": 15,  # [Hz]
    "policy_frequency": 15, 
    "normalize": False,
    "normalize_reward": False,
    "clip": False,
    "offroad_terminal": True
}
env.configure(config)
env.reset()

However, when I reload the model and print the acceleration and steering values, none of them are in the configured ranges, and I am confused. I added print("acc: {}, steering: {}".format(action[0], action[1])) in def _reward() in highway_env.py, and added prints to the following function:

def act(self, action: np.ndarray) -> None:
        ...
        if self.longitudinal and self.lateral:
            print("original acc: {}, steering: {}".format(action[0], action[1]))
            self.controlled_vehicle.act({
                "acceleration": utils.lmap(action[0], [-1, 1], self.acceleration_range),
                "steering": utils.lmap(action[1], [-1, 1], self.steering_range),
            })
            # action[0] = utils.lmap(action[0], [-1, 1], self.acceleration_range)
            # action[1] = utils.lmap(action[1], [-1, 1], self.steering_range)
        elif self.longitudinal:
            self.controlled_vehicle.act({
                "acceleration": utils.lmap(action[0], [-1, 1], self.acceleration_range),
                "steering": 0,
            })
        elif self.lateral:
            self.controlled_vehicle.act({
                "acceleration": 0,
                "steering": utils.lmap(action[0], [-1, 1], self.steering_range)
            })
        print("actual acc: {}, steering: {}".format(action[0], action[1]))
        self.last_action = action

I get the following result:

original acc: 0.1795186996459961, steering: -0.02994769811630249
actual acc: 0.1795186996459961, steering: -0.02994769811630249
acc: 0.1795186996459961, steering: -0.02994769811630249
original acc: 0.20840740203857422, steering: -0.02228790521621704
actual acc: 0.20840740203857422, steering: -0.02228790521621704
acc: 0.20840740203857422, steering: -0.02228790521621704
original acc: 0.2234337329864502, steering: -0.014497756958007812
actual acc: 0.2234337329864502, steering: -0.014497756958007812
acc: 0.2234337329864502, steering: -0.014497756958007812
original acc: 0.2373422384262085, steering: -0.007743716239929199
actual acc: 0.2373422384262085, steering: -0.007743716239929199
acc: 0.2373422384262085, steering: -0.007743716239929199
original acc: 0.25095176696777344, steering: -0.0019524693489074707
actual acc: 0.25095176696777344, steering: -0.0019524693489074707
acc: 0.25095176696777344, steering: -0.0019524693489074707
original acc: 0.2658735513687134, steering: 0.0023392438888549805
actual acc: 0.2658735513687134, steering: 0.0023392438888549805
acc: 0.2658735513687134, steering: 0.0023392438888549805
original acc: 0.2798728942871094, steering: 0.004693746566772461
actual acc: 0.2798728942871094, steering: 0.004693746566772461
acc: 0.2798728942871094, steering: 0.004693746566772461
original acc: 0.290974497795105, steering: 0.005937099456787109
actual acc: 0.290974497795105, steering: 0.005937099456787109
acc: 0.290974497795105, steering: 0.005937099456787109
original acc: 0.3005625009536743, steering: 0.0067664384841918945
actual acc: 0.3005625009536743, steering: 0.0067664384841918945
acc: 0.3005625009536743, steering: 0.0067664384841918945

I AM SO CONFUSED .......

brodermind commented 1 week ago

@eleurent

eleurent commented 1 week ago

You are printing the same input action twice: the unscaled one, in [-1, 1]. The scaled action is passed directly to the vehicle and is never written back into `action`. After it has been executed, you can access it with print(self.controlled_vehicle.action)
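For illustration, the rescaling from [-1, 1] to the configured ranges can be sketched as below. The local `lmap` function is a stand-in written for this example, assuming `utils.lmap` is a plain linear map between two intervals (its implementation is not shown in this thread); the ranges are the ones from the config above.

```python
import math

def lmap(v, x, y):
    """Linearly map v from interval x = [x0, x1] to interval y = [y0, y1].

    Assumed stand-in for utils.lmap, written for this example only.
    """
    return y[0] + (v - x[0]) * (y[1] - y[0]) / (x[1] - x[0])

# Ranges taken from the config in the first post.
ACCELERATION_RANGE = [-10.0, 8.0]
STEERING_RANGE = [-math.pi / 4, math.pi / 4]

# First unscaled action printed in the log above.
raw_acc, raw_steering = 0.1795186996459961, -0.02994769811630249

scaled_acc = lmap(raw_acc, [-1, 1], ACCELERATION_RANGE)
scaled_steering = lmap(raw_steering, [-1, 1], STEERING_RANGE)

print("raw acc:", raw_acc, "-> scaled acc:", scaled_acc)
print("raw steering:", raw_steering, "-> scaled steering:", scaled_steering)
```

Because the acceleration range [-10, 8] is not symmetric, a raw value of 0 maps to -1.0 rather than 0, so the raw and scaled values can differ in sign as well as magnitude.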

brodermind commented 1 week ago

Is the action passed to def _reward(self, action: Action) -> float: also unscaled? I want to compute rewards from the actual (scaled) action. I added print(self.controlled_vehicle.action) in def _reward(self, action: Action) -> float:, but got AttributeError: 'HighwayEnv' object has no attribute 'controlled_vehicle'. How can I get the actual action inside def _reward(self, action: Action) -> float:?

brodermind commented 5 days ago

Besides, I added print("ego acc: {}, speed: {}, acc: {}".format(self.vehicle.action, self.vehicle.speed, action[0])) in def _reward(self, action: Action) -> float: and got:

ego acc: {'acceleration': -9.143099665641785, 'steering': -0.0170410996816317}, speed: 24.390460022290547, acc: -0.9047888517379761
ego acc: {'acceleration': -8.94046038389206, 'steering': -0.006646797551511874}, speed: 23.794429330031075, acc: -0.8822733759880066
ego acc: {'acceleration': -8.713669419288635, 'steering': 0.00408259474217243}, speed: 23.213518035411834, acc: -0.8570743799209595
ego acc: {'acceleration': -8.462130784988403, 'steering': 0.011589494497536434}, speed: 22.649375983079274, acc: -0.8291256427764893
ego acc: {'acceleration': -8.193395435810089, 'steering': 0.01602412584630364}, speed: 22.103149620691934, acc: -0.7992661595344543
ego acc: {'acceleration': -7.89635956287384, 'steering': 0.018041595207714756}, speed: 21.576725649833676, acc: -0.7662621736526489
ego acc: {'acceleration': -7.466260373592377, 'steering': 0.01743891977243528}, speed: 21.07897495826085, acc: -0.7184733748435974
ego acc: {'acceleration': -7.3168264627456665, 'steering': 0.017414108681810925}, speed: 20.59118652741114, acc: -0.7018696069717407
ego acc: {'acceleration': -7.18576192855835, 'steering': 0.020717354298106838}, speed: 20.112135732173915, acc: -0.6873068809509277

So 'action' is the unscaled action and 'self.vehicle.action' is the scaled one. However, self.vehicle.speed does not seem to use the scaled value: it looks like it is computed as self.vehicle.speed + action[0], not self.vehicle.speed + self.vehicle.action['acceleration']. Is that right? @eleurent
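One way to check this against the logged numbers is sketched below. It assumes, without confirmation from this thread, that the simulator updates speed with an explicit Euler step of dt = 1 / simulation_frequency (15 Hz in the config above), and that each log line shows the speed after that step's acceleration was applied. The variable names are made up for this example.

```python
# First four log lines from the output above.
speeds = [24.390460022290547, 23.794429330031075,
          23.213518035411834, 22.649375983079274]
scaled_acc = [-9.143099665641785, -8.94046038389206,
              -8.713669419288635, -8.462130784988403]
raw_acc = [-0.9047888517379761, -0.8822733759880066,
           -0.8570743799209595, -0.8291256427764893]

SIMULATION_FREQUENCY = 15           # [Hz], from the config in the first post
dt = 1.0 / SIMULATION_FREQUENCY     # assumed integration step

# Observed change in speed between consecutive log lines.
deltas = [b - a for a, b in zip(speeds, speeds[1:])]

# Hypothesis A: speed += scaled acceleration * dt (assumed Euler step).
predicted_from_scaled = [a * dt for a in scaled_acc[1:]]
# Hypothesis B: speed += raw action[0], as suggested in the question.
predicted_from_raw = raw_acc[1:]

for d, ps, pr in zip(deltas, predicted_from_scaled, predicted_from_raw):
    print("observed: {:.6f}  scaled*dt: {:.6f}  raw: {:.6f}".format(d, ps, pr))
```

On these values the observed speed change agrees with the scaled acceleration multiplied by dt and not with the raw action[0]; the two only look similar in magnitude because of the 1/15 factor.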