Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving
https://highway-env.farama.org/
MIT License

Dictionary action type #562

Open redfish9 opened 5 months ago

redfish9 commented 5 months ago

Hi @eleurent :-) I'm currently working with your awesome framework, and I'm wondering whether there exists a feasible approach to define a discrete action dictionary like the one below?

ACTIONS_ALL = {
    0: 'LANE_LEFT',
    1: 'IDLE',
    2: 'LANE_RIGHT',
    3: 'FASTER',
    4: 'SLOWER',
    5: {
        "id1": 'action1',
        "id2": 'action2',
    }
}

If I define an action class similar to DiscreteMetaAction but with the aforementioned action dictionary, I'm worried that the whole processing logic for actions, the related vehicle, and even the model structure would need to be revised.

Is there a better way to do this? If not, could you offer some advice, or warn me about potential risks in revising the framework?

Any suggestions are much appreciated! Thanks a lot!

redfish9 commented 5 months ago

I'm expecting such a data type mainly because I want the RL agents not only to perform real actions like lane changes, but also to "say" something like their guesses about neighbouring vehicles' actions.

This capability requires recording the guessed actions, as well as their indices. Only then can we know whether the trained vehicle successfully learns to guess the next actions of its neighbouring vehicles.

Thanks again for your valuable support!

eleurent commented 5 months ago

This looks like a hierarchical action space, where there is a high-level action (0-5) followed by a lower-level action (id1/id2 when action=5). I don't think this is natively supported by gymnasium -- the code can probably still be adapted to make it work, but it would indeed involve "revising the whole processing logic for actions, the related vehicle, and even the model structure", as you said.
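One workaround (a rough sketch, not something HighwayEnv or gymnasium supports out of the box) is to flatten the hierarchy into a Tuple space and simply ignore the low-level component whenever it does not apply:

    from gymnasium import spaces

    # Flattened version of the hierarchical space: the second component is
    # only meaningful when the first one selects the composite action (5).
    action_space = spaces.Tuple((
        spaces.Discrete(6),  # 0-4: meta-actions, 5: the composite action
        spaces.Discrete(2),  # id1/id2, ignored unless the first entry is 5
    ))

    high, low = action_space.sample()
    if high == 5:
        pass  # dispatch to the sub-action (id1/id2)

The drawback is that the policy then outputs the low-level component on every step, but the environment interface stays flat.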

There's no way around it: you need to tell the environment how any action will affect the state transitions, the rewards, etc.

I'm expecting such a data type mainly because I want the RL agents not only to perform real actions like lane changes, but also to "say" something like their guesses about neighbouring vehicles' actions.

Then maybe this does not need to be modeled as an action, since it's not really a long-term decision problem but rather a short-term prediction one. You could just collect the observed data (the empirical behaviour of neighbouring vehicles), train a separate model to predict them (regardless of the agent's actions/reward optimisation), and measure the prediction accuracy/error.
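Concretely, something along these lines, where the dataset files and the feature set are placeholders for whatever you log from the simulation:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical logged dataset: per-timestep features describing a
    # neighbouring vehicle (relative position, speed, lane, ...) and the
    # meta-action it actually took next, labelled offline from trajectories.
    X = np.load("neighbour_features.npy")  # shape (N, d) -- placeholder file
    y = np.load("neighbour_actions.npy")   # shape (N,), values in {0..4}

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("prediction accuracy:", clf.score(X_test, y_test))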

redfish9 commented 5 months ago

Thanks for your prompt response!

I designed a "saying" action that appends a fixed string to a list whenever it is performed. The agents are then trained with a reward function that compares the output list against the ground truth. I don't know yet whether this idea works, as my training with the rl_agents repository is running into sequence-processing problems. I'll update once this problem is fixed.

Then maybe this does not need to be modeled as an action, since it's not really a long-term decision problem but rather a short-term prediction one. You could just collect the observed data (the empirical behaviour of neighbouring vehicles), train a separate model to predict them (regardless of the agent's actions/reward optimisation), and measure the prediction accuracy/error.

However, the "short-term prediction" you mentioned confuses me, and I would be really grateful if you could offer more insight into "train a separate model to predict them".

Many thanks again.

eleurent commented 4 months ago

What I meant is that I don't think these "sayings" should be considered actions, as they have no effect on the environment state/trajectory. You can do that, of course, but I don't think it is the best modelling decision: it makes the environment unnecessarily complex, and these predictions will be optimised with a policy-gradient loss rather than a supervised-learning loss, which will be much less efficient (you improve by trial and error instead of being given a direct feedback/improvement direction).

So I would personally not include this in the environment and the RL objective; instead, I would add an independent prediction head to the network and train it jointly, with an additional supervised learning loss (e.g. L2) added to the usual RL loss.

So this would involve changing the training pipeline rather than the environment, but I think it would be more natural and more efficient.
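A minimal sketch of what I mean (PyTorch; all names here are made up, and I use cross-entropy rather than L2 since the predicted meta-actions are discrete):

    import torch
    import torch.nn as nn

    # Shared encoder with two heads: the usual RL head, plus an auxiliary
    # head predicting a neighbour's next meta-action.
    class PolicyWithPredictionHead(nn.Module):
        def __init__(self, obs_dim: int, n_actions: int, n_neighbour_actions: int):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
            self.q_head = nn.Linear(128, n_actions)               # RL head (e.g. DQN values)
            self.pred_head = nn.Linear(128, n_neighbour_actions)  # auxiliary prediction head

        def forward(self, obs: torch.Tensor):
            z = self.encoder(obs)
            return self.q_head(z), self.pred_head(z)

    # In the training loop, with `rl_loss` computed as usual and
    # `neighbour_labels` the observed next meta-actions of a neighbour:
    #   q_values, pred_logits = model(obs_batch)
    #   aux_loss = nn.functional.cross_entropy(pred_logits, neighbour_labels)
    #   (rl_loss + aux_weight * aux_loss).backward()  # aux_weight: a tuning knob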

redfish9 commented 4 months ago

Many thanks for your valuable advice. I will definitely try it out, as I find it hard to get my aforementioned approach to converge. I've trained for around 5,000 iterations, but the agents still collide frequently. My guesses for the frequent collisions are:

1) As you mentioned, the multi-dimensional action space is too complex and the agents cannot learn efficiently.

2) I'm using the BicycleVehicle type and IDMVehicle classes as background vehicles. During training, I found that the RL agents tend to collide with BicycleVehicle often, and this type of vehicle often drives in reverse.

So I'm going to try your advice first and remove the BicycleVehicle class from my environment.

Thanks again for your valuable support!!! Wishing you a nice day :-)

redfish9 commented 4 months ago

There's one more thing I would like to add: the reason I designed such a complex data type is that I want to train the agents to "recognise" the interactions among their neighbours. To train this kind of behaviour, I also implemented a reward item that rewards each agent after a successful prediction of its neighbours' actions (roughly as sketched after the dictionary below). But after 5,000 epochs of training, the agents still cannot learn to brake or change lanes to avoid collisions, and I don't know why.

My action dictionary is designed as follows:

    ACTIONS_ALL = {
        'basic': {
            0: 'LANE_LEFT',
            1: 'IDLE',
            2: 'LANE_RIGHT',
            3: 'FASTER',
            4: 'SLOWER',
        },
        'intention': {
            0: 'PRED_LANE_LEFT',
            1: 'PRED_IDLE',
            2: 'PRED_LANE_RIGHT'
        }
    }
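In simplified form, the reward item works like this:

    # Simplified sketch of the prediction reward: +1 when the agent's
    # 'intention' action matches the meta-action the targeted neighbour
    # actually takes next (indices follow ACTIONS_ALL['intention'] above).
    def prediction_reward(intention: int, neighbour_next_action: int) -> float:
        return 1.0 if intention == neighbour_next_action else 0.0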

Is there any possibility that the aforementioned recognition behaviour can be trained using such actions?

I'll try your suggested approach and let you know the minute I get results. Thanks a looooooot!!!!!