eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

some implementation issues #92

Open 6Lackiu opened 1 year ago

6Lackiu commented 1 year ago

Hi! First of all, thank you for sharing such a great project! It gave me a lot of inspiration, and I really appreciate it! I have some questions I would like to ask you.

  1. I read your paper "Social Attention for Autonomous Decision-Making in Dense Traffic", where you mainly propose an "attention-based neural network architecture". But what is the purpose of implementing so many agents (MCTS, DQN, ...) in this repo? Are they different ways to implement this 'attention architecture'?

  2. Where is the scripts/analyze.py file? Has it been superseded?

  3. As an RL rookie, I would like to ask whether the 'attention architecture' you proposed can be used in other RL algorithms. As an example, suppose I have trained an RL algorithm called 'ABC' to control all the autonomous vehicles in the scene. Now I want to add your proposed 'attention architecture' to it, so that each vehicle knows which surrounding vehicles it should pay the most attention to. Finally, the 'ABC' algorithm is used to train the whole model. I want to know: is this possible? How should I integrate the 'attention architecture' into 'ABC'?

Looking forward to your reply! Thanks!

eleurent commented 1 year ago

Hi, thanks for the feedback!

  1. You're right, maybe it's a bit confusing. My intent was to implement a lightweight RL library with many (unrelated) agents, and to add my own contributions to it. But in retrospect that was probably not a great idea, as the code for each paper is not isolated... For this paper, only the DQN agent is relevant: it was the baseline I used, with only the network architecture changed.
  2. Yes, I used that initially to generate reward plots, but you should now use TensorBoard, which is generally superior and more widely used.
  3. Yes, the attention-based architecture can be used with any algorithm that trains a neural network to make decisions (DQN, PPO, A3C, MuZero, etc.). How you should integrate it into your 'ABC' algorithm depends on the library/implementation you are using, but generally there will be a file where the network/model is defined, and that is what you need to replace (see the sketch below).
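
For intuition only (this is a generic sketch, not this repository's DQNAgent, and the unknown 'ABC' algorithm may be organised differently): in a typical DQN-style implementation the update step never looks inside the network, which is why swapping the model architecture leaves the rest of the algorithm untouched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dqn_update(q_net: nn.Module, target_net: nn.Module, optimizer, batch, gamma: float = 0.99):
    """One generic DQN update step (sketch only).

    Nothing here depends on the architecture of q_net, so an MLP can be replaced
    by an attention-based model without touching the training loop itself.
    """
    # batch: tuple of tensors (obs, actions, rewards, next_obs, dones), dones as 0/1 floats
    obs, actions, rewards, next_obs, dones = batch
    q_values = q_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values                # max_a' Q_target(s', a')
        targets = rewards + gamma * (1.0 - dones) * next_q
    loss = F.smooth_l1_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```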
6Lackiu commented 1 year ago

Hello! @eleurent Thank you so much for your reply, but I'm still a little confused.

  1. What are the input and output of EgoAttentionNetwork? Is the input the vehicle's observation information? What conversion is needed? I can't seem to find where this is implemented in the code.

  2. Maybe my statement of question 3 was not clear last time; please allow me to explain again. One step in the 'ABC' algorithm is to sum the rewards of all neighboring vehicles, but I want to introduce the attention mechanism into it, i.e. to only count the rewards of nearby vehicles that NEED attention. That is to say, I need to extract the ego-attention part of your project, but the modules seem to be tightly coupled. How should I extract it? Is it rl_agents/agents/common/models.py?

  3. After extracting the attention mechanism part, where should I put it in 'ABC'? According to your last answer, if it is placed in the network/model part, will it not break the original model/training process of 'ABC'?

  4. I am also curious, if the above is possible, what the whole 'ABC' training process would look like. Is the original decision-making algorithm of 'ABC' trained at the same time as the attention mechanism? Can I still train the 'ABC' algorithm the same way I did before, or do I need some changes?

I'm sorry if my statement is not clear; I haven't done this kind of work before and my thinking is a little confused... so I have come here again to ask for your advice.

Really looking forward to your reply, this is very helpful for me, thank you!

eleurent commented 1 year ago
  1. The input is an array containing a set of per-vehicle features. For instance, I used position x, y (absolute or relative to the ego-vehicle's position), velocity vx, vy (absolute or relative), cos/sin of the heading angle, etc. The code only assumes that the observation provided to the agent is an array of shape (n_vehicles, n_features). The output is an embedding, obtained from the attention between the ego-vehicle's token and all other vehicles' tokens (a simplified sketch is given after this list).

  2. I don't know the details of the ABC algorithm, and I don't understand why you would want to use attention to aggregate rewards. In my understanding, the reward is the optimisation objective, so it has to be defined independently of the agent. Anyway, yes: if you want to reuse my code, you can just copy the EgoAttention class from rl_agents/agents/common/models.py.

  3. I don't know the specifics of your ABC algorithm. You can probably put it wherever you would have a network anyway.

  4. Again, I have not heard of this ABC algorithm you are referring to. Generally, you can change the architecture of a network trained by any RL algorithm without affecting how the RL algorithm itself works.
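
To make the input/output contract in point 1 concrete, here is a deliberately simplified single-head sketch. It is not the EgoAttention/EgoAttentionNetwork code from rl_agents/agents/common/models.py (which differs in heads, layer sizes and details); the feature count of 7 below is only an illustrative assumption.

```python
import torch
import torch.nn as nn

class EgoAttentionSketch(nn.Module):
    """Minimal single-head ego-attention: the ego vehicle (row 0) attends to all vehicles."""

    def __init__(self, n_features: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)  # per-vehicle token encoder
        self.q = nn.Linear(d_model, d_model)         # query, computed from the ego token only
        self.k = nn.Linear(d_model, d_model)         # keys, one per vehicle
        self.v = nn.Linear(d_model, d_model)         # values, one per vehicle
        self.d_model = d_model

    def forward(self, obs: torch.Tensor):
        # obs: (batch, n_vehicles, n_features), with the ego vehicle in row 0
        tokens = self.embed(obs)                                   # (batch, n_vehicles, d_model)
        ego = tokens[:, 0:1]                                       # (batch, 1, d_model)
        scores = self.q(ego) @ self.k(tokens).transpose(1, 2) / self.d_model ** 0.5
        attention = torch.softmax(scores, dim=-1)                  # (batch, 1, n_vehicles)
        embedding = (attention @ self.v(tokens)).squeeze(1) + ego.squeeze(1)  # residual connection
        return embedding, attention.squeeze(1)

# Example: 15 observed vehicles, 7 features each (e.g. presence, x, y, vx, vy, cos_h, sin_h)
obs = torch.rand(1, 15, 7)
embedding, attention = EgoAttentionSketch(n_features=7)(obs)
print(embedding.shape, attention.shape)  # torch.Size([1, 64]) torch.Size([1, 15])
```

The embedding then feeds the usual Q-value (or policy) head, so the rest of the agent does not need to know that attention is used inside.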

6Lackiu commented 1 year ago

@eleurent Sorry for the confusion caused by my unclear wording! But your answers have already helped me a lot! Thank you so much! Wish you all the best!

eleurent commented 1 year ago

No worries at all, glad you found this helpful!

6Lackiu commented 1 year ago

Hello @eleurent ! I would like to ask some questions, so I reopened this issue. I want to inspect the variable dimensions in the function compute_vehicles_attention in "rl_agents/agents/deep_q_network/graphics.py". So I modified the "main" function in "scripts/experiments.py" and added a breakpoint. However, when I run the program using PyCharm, it does not stop at the breakpoint. Do you have any good solutions for this? Thanks!

eleurent commented 1 year ago

Hi, no, I have no idea why PyCharm would not stop at the breakpoint... I tried it and it worked fine.

Are you running the program in debug mode (Shift+F9), and not run mode (Shift+F10)?

(also, I would advise putting the breakpoint directly in graphics.py if you want to analyse the variables there)

6Lackiu commented 1 year ago

It works now. I must have messed something up before... Sorry for taking up your time with questions like this! Thanks!!

6Lackiu commented 1 year ago

Hi @eleurent! Sorry I have another question.

  1. Do the columns of the attention matrix not correspond to specific vehicles? Why do we need to assign attention values to vehicles based on their distances?
  2. If we do this, are we not just looking for the nearest few cars? Where does the attention mechanism come into play?

Look forward to your answer! Thank you!!


eleurent commented 1 year ago

> Do the columns of the attention matrix not correspond to specific vehicles? Why do we need to assign attention values to vehicles based on their distances?

They do! It's just that we don't know the mapping between vehicle i in the attention matrix and the vehicles in the scene, since the vehicle ids are not provided in the observation. So in order to draw the attention edges, I'm mapping rows of the observation to vehicles in the scene based on their x, y coordinate features.
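
As a rough illustration of that matching step (hypothetical helper names; the actual display code is compute_vehicles_attention in rl_agents/agents/deep_q_network/graphics.py):

```python
import numpy as np

def match_rows_to_vehicles(obs_xy, scene_xy):
    """For each observation row, return the index of the closest vehicle in the scene.

    Illustrative sketch only; it simply performs nearest-neighbour matching on x, y.
    """
    mapping = []
    for row in obs_xy:                                     # obs_xy: (n_rows, 2) observed x, y
        distances = np.linalg.norm(scene_xy - row, axis=1)
        mapping.append(int(np.argmin(distances)))          # closest scene vehicle
    return mapping

# Example: 3 observed rows matched against 4 vehicles present in the scene
obs_xy = np.array([[0.0, 0.0], [12.0, 3.5], [25.0, 0.0]])
scene_xy = np.array([[0.1, 0.0], [24.8, 0.2], [12.1, 3.4], [40.0, 7.0]])
print(match_rows_to_vehicles(obs_xy, scene_xy))  # [0, 2, 1]
```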

> If we do this, are we not just looking for the nearest few cars? Where does the attention mechanism come into play?

Yes, we typically limit the observation to the nearest few cars, e.g. 15 vehicles. We observe empirically that the attention is useful because it enables the model to focus its computations on the 1-2 most relevant vehicles at any given time, which leads to better decisions. It is also invariant to permutations of the vehicle ordering, unlike other architectures such as MLPs (a toy check of this is sketched below).
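
A tiny, self-contained check of the permutation-invariance claim (toy numbers, not the trained model): shuffling the vehicle rows leaves the attention-weighted aggregation unchanged, because scores and values are permuted together.

```python
import torch

scores = torch.tensor([1.0, 0.2, -0.5, 0.7])                              # one attention score per vehicle
values = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, -1.0]])  # one value vector per vehicle

out = torch.softmax(scores, dim=0) @ values                               # attention-weighted sum

perm = torch.tensor([2, 0, 3, 1])                                         # shuffle the vehicle order
out_shuffled = torch.softmax(scores[perm], dim=0) @ values[perm]

print(torch.allclose(out, out_shuffled))                                  # True: same output either way
```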

6Lackiu commented 1 year ago

Sorry for the late reply! Thank you for clearing up my confusion!

6Lackiu commented 1 year ago

Hi! @eleurent I have a small question. What role does the ego vehicle's attention to itself play throughout the process? Does the vehicle need to pay attention to itself? I don't quite understand. Thanks for your answer!

eleurent commented 1 year ago

I think there can be two roles:

  1. the decision may depend on the state of the ego vehicle, e.g. its current position, speed or heading. So the attention focusing on the ego-vehicle helps propagate these features forward (even though they should still be there through the residual connection in the attention block).
  2. the attention layer typically converges to some filtering function which highlights only "dangerous vehicles" (those with a high risk of short-term collision) and sets the weights of other vehicles to 0 (they are irrelevant to the decision, so their information can be dropped). But in a situation where no vehicle is dangerous, they should all be filtered out and all their weights should be 0. However, since attention weights are normalised (they form a probability distribution), they have to sum up to 1, so the only way of keeping a weight of 0 for all other vehicles is to put all the probability mass on the ego-vehicle (see the toy illustration at the end of this comment).

These are just hypotheses of course; the function that the attention layer ends up implementing emerges through learning. You could very well do the experiment of removing the ego-vehicle from the available tokens and see if this degrades performance (you'd probably need to keep the residual connection though, since we still want the ego-vehicle's features to be available for the final decision).
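
To make the normalisation argument in point 2 concrete, here is a toy illustration (arbitrary numbers, unrelated to any trained model): when every non-ego vehicle gets a very low score, the softmax has nowhere else to put the probability mass.

```python
import torch

scores = torch.tensor([2.0, -8.0, -8.0, -8.0])   # ego (index 0) + three "harmless" vehicles
weights = torch.softmax(scores, dim=0)
print(weights)        # ~[0.9999, 4.5e-5, 4.5e-5, 4.5e-5]: almost all mass on the ego token
print(weights.sum())  # 1.0: the weights always form a probability distribution
```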

6Lackiu commented 1 year ago

Got it! Thank you!!