6Lackiu opened this issue 1 year ago
Hi, thanks for the feedback!
Hello! @eleurent Thank you so much for your reply, but I'm still a little confused.
What are the input and output of EgoAttentionNetwork? Is the input the vehicle observations? What conversion is needed? I can't seem to find where this happens in the code.
My description of question 3 last time may not have been clear, so please allow me to explain again.
One step in the 'ABC' algorithm is to sum the rewards of all neighboring vehicles, but I want to introduce the attention mechanism into it, so that only the rewards of the nearby vehicles that NEED attention are counted. That is to say, I need to extract the ego-attention part of your project, but the modules seem tightly coupled; how should I extract it? Is it in rl_agents/agents/common/models.py?
After extracting the attention mechanism, where should I put it in the 'ABC' part? According to your last answer, if it is placed in the network/model part, won't it break the original model/training process of 'ABC'?
I am also curious: if the above is possible, what does the whole 'ABC' training process look like? Does it train the original 'ABC' decision-making algorithm and the attention mechanism at the same time? Can I still train the 'ABC' algorithm the same way I did before, or do I need to make some changes?
I'm sorry if my questions are not clear; I haven't done this kind of work before, and my thinking is a little muddled, so I have come here again to ask for your advice.
Really looking forward to your reply, this is very helpful for me, thank you!
The input is an array containing a set of per-vehicle features. For instance, I used positions x, y (absolute or relative to the ego vehicle's position), velocities vx, vy (absolute or relative), cos/sin of the heading angle, etc. The code here only assumes that the observation provided to the agent is an array of shape (n_vehicles, n_features). The output is an embedding, obtained from the attention between the ego vehicle's token and all the other vehicles'.
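For illustration, here is a minimal sketch of what such an observation array could look like. The exact feature set and ordering are up to the environment configuration (this particular layout is just an assumption, loosely following highway-env's kinematic observations):

```python
import numpy as np

# Hypothetical observation for 1 ego vehicle + 2 neighbours:
# each row is one vehicle, columns are features such as
# [presence, x, y, vx, vy, cos_h, sin_h].
# By convention the ego vehicle occupies the first row.
obs = np.array([
    [1.0,  0.0, 0.0, 25.0, 0.0, 1.0, 0.0],   # ego vehicle
    [1.0, 12.0, 4.0, 22.0, 0.0, 1.0, 0.0],   # neighbour 1
    [1.0, -8.0, 0.0, 27.0, 0.0, 1.0, 0.0],   # neighbour 2
    [0.0,  0.0, 0.0,  0.0, 0.0, 0.0, 0.0],   # padding (absent vehicle)
])
assert obs.shape == (4, 7)  # (n_vehicles, n_features)
```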
I don't know the details of the ABC algorithm, and I don't understand why you would want to use attention to aggregate the reward. In my understanding, the reward is the optimisation objective, so it has to be defined independently of the agent. Anyway, yes: if you want to reuse my code, you can just copy the EgoAttention class from rl_agents/agents/common/models.py.
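For reference, here is a minimal single-head sketch of the ego-attention idea. This is not the repo's EgoAttention class (which is multi-head and has more machinery), just an illustration of the input/output contract; for the real thing, copy the class from models.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinimalEgoAttention(nn.Module):
    """Single-head sketch: the ego vehicle's token queries all vehicles."""

    def __init__(self, n_features: int, d_model: int = 64):
        super().__init__()
        self.query = nn.Linear(n_features, d_model)   # from the ego row only
        self.key = nn.Linear(n_features, d_model)
        self.value = nn.Linear(n_features, d_model)
        self.d_model = d_model

    def forward(self, obs):
        # obs: (batch, n_vehicles, n_features), row 0 is the ego vehicle
        q = self.query(obs[:, 0:1])                   # (batch, 1, d_model)
        k, v = self.key(obs), self.value(obs)         # (batch, n_vehicles, d_model)
        scores = q @ k.transpose(1, 2) / self.d_model ** 0.5
        weights = F.softmax(scores, dim=-1)           # one weight per vehicle
        embedding = (weights @ v).squeeze(1)          # (batch, d_model)
        return embedding, weights

# The embedding can then feed any downstream head (e.g. Q-values).
obs = torch.randn(1, 15, 7)                           # 15 vehicles, 7 features
embedding, weights = MinimalEgoAttention(n_features=7)(obs)
```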
I don't know the specifics of your ABC algorithm. You can probably put it wherever you would have a network anyway.
Again, I have not heard of this ABC algorithm you are referring to. Generally, you can change the architecture of a network trained by any RL algorithm without affecting how the RL algorithm itself works.
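As a sketch of what I mean (hypothetical names, using PyTorch's built-in attention rather than this repo's classes): the RL algorithm only sees the network's input/output interface, so the two architectures below are interchangeable from its point of view.

```python
import torch
import torch.nn as nn

n_vehicles, n_features, n_actions = 5, 7, 3

# Baseline architecture: flatten the observation and use an MLP.
mlp_qnet = nn.Sequential(
    nn.Flatten(),
    nn.Linear(n_vehicles * n_features, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)

# Drop-in replacement: embed per-vehicle tokens and let the ego token
# (row 0) attend to all vehicles before the Q-value head.
class AttentionQNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(n_features, 64)
        self.attn = nn.MultiheadAttention(embed_dim=64, num_heads=1,
                                          batch_first=True)
        self.head = nn.Linear(64, n_actions)

    def forward(self, obs):
        tokens = self.embed(obs)                 # (batch, n_vehicles, 64)
        ego = tokens[:, 0:1]                     # ego token as the query
        out, _ = self.attn(ego, tokens, tokens)  # (batch, 1, 64)
        return self.head(out.squeeze(1))

# Both networks expose the same interface, so the surrounding RL
# machinery (replay buffer, TD targets, optimiser) is unaffected.
obs = torch.randn(1, n_vehicles, n_features)
for qnet in (mlp_qnet, AttentionQNet()):
    assert qnet(obs).shape == (1, n_actions)
```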
@eleurent Sorry for the confusion caused by my unclear wording! But your answers have already helped me a lot! Thank you so much! Wish you all the best!
No worries at all, glad you found this helpful!
Hello @eleurent ! I would like to ask some questions, so I reopened this issue. I want to see some information about variable dimensions in the function compute_vehicles_attention in "rl_agents/agents/deep_q_network/graphics.py". So I modified the "main" function in "scripts/experiments.py" and added a breakpoint. However, when I run the program using Pycharm, it does not stop at the breakpoint. Do you have any good solutions for this? Thanks!
Hi, no, I have no idea why PyCharm would not stop at the breakpoint... I tried it and it worked fine.
Are you running the program in debug mode (Shift+F9), and not run mode (Shift+F10)?
(also, I would advise to put the breakpoint directly in graphics.py if you want to analyse the variables there)
It works now. I must have messed something up before... Sorry for taking up your time with questions like this! Thanks!!
Hi @eleurent! Sorry I have another question.
Do the columns of the attention matrix not correspond to the corresponding vehicles? Why do we need to assign attention values to vehicles based on their distances? Looking forward to your answer! Thank you!!
Do the columns of the attention matrix not correspond to the corresponding vehicles? Why do we need to assign attention values to vehicles based on their distances?
They do! It's just that we don't know the mapping between vehicle i in the attention matrix and the vehicles in the scene, since the vehicle ids are not provided in the observation. So in order to draw the attention edges, I'm mapping rows of the observation to vehicles in the scene based on their x, y coordinate features.
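A minimal sketch of that matching step (a hypothetical helper, not the actual compute_vehicles_attention code):

```python
import numpy as np

def match_rows_to_vehicles(obs_xy, scene_xy):
    """Map each observation row to the scene vehicle with the closest
    (x, y) position, so that attention weights can be drawn as edges
    to the right vehicles."""
    mapping = {}
    for row, xy in enumerate(obs_xy):
        distances = np.linalg.norm(scene_xy - xy, axis=1)
        mapping[row] = int(np.argmin(distances))
    return mapping

# Example: 3 observed rows vs. 3 vehicles in the scene.
obs_xy = np.array([[0.0, 0.0], [12.0, 4.0], [-8.0, 0.0]])
scene_xy = np.array([[-8.1, 0.2], [0.1, -0.1], [11.9, 4.1]])
print(match_rows_to_vehicles(obs_xy, scene_xy))  # {0: 1, 1: 2, 2: 0}
```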
If we do this, are we not just looking for the nearest few cars? Where does the role of attention mechanism come into play?
Yes, we typically limit the observation to the nearest few cars, e.g. 15 vehicles. We observe empirically that the attention is useful: it enables the model to focus its computations on the 1-2 most relevant vehicles at any given time, which leads to better decisions. It is also invariant to permutations of the vehicle ordering, unlike other architectures such as MLPs.
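The permutation invariance is easy to check numerically; here is a toy sketch (assuming row 0 stays the ego vehicle):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_vehicles, d = 5, 8
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def ego_attention(obs):
    # Row 0 is the ego vehicle and provides the only query.
    q, k, v = obs[0:1] @ Wq, obs @ Wk, obs @ Wv
    weights = F.softmax(q @ k.T / d ** 0.5, dim=-1)
    return weights @ v

obs = torch.randn(n_vehicles, d)
perm = torch.tensor([0, 3, 1, 4, 2])  # shuffle every row except the ego's
assert torch.allclose(ego_attention(obs), ego_attention(obs[perm]), atol=1e-5)
```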
Sorry for the late reply! Thank you for answering my confusion!
Hi! @eleurent I have a small question. What role does the vehicle's attention to itself play throughout the process? Does the vehicle need to pay attention to itself? I don't quite understand. Thanks for your answer!
I think there can be two roles:
- the decision may depend on the state of the ego vehicle, e.g. its current position, speed or heading. So the attention focusing on the ego vehicle helps propagate these features forward (even though they should still be there through the residual connection in the attention block);
- the attention layer typically converges to a filtering function which highlights only "dangerous vehicles" (those with a high risk of short-term collision), and sets the weights of other vehicles to 0 (they are irrelevant to the decision, so their information can be dropped). But in a situation where no vehicle is dangerous, they should all be filtered out and all their weights should be 0. However, since the attention weights are normalised (they form a probability distribution), they have to sum up to 1, and so the only way of keeping a weight of 0 for all other vehicles is to put all the probability mass on the ego vehicle.
These are just hypotheses of course; the function that the attention layer ends up implementing emerges through learning. You could very well do the experiment of removing the ego vehicle from the available tokens, and see whether this degrades performance. (You'd probably need to keep the residual connection though, since we still want the ego vehicle's features to be available for the final decision.)
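The normalisation argument in the second point can be seen with a tiny numerical example:

```python
import torch
import torch.nn.functional as F

# If no other vehicle is "dangerous", the only way to give them all
# near-zero weight under a softmax (which must sum to 1) is to put
# the probability mass on the ego vehicle itself.
scores = torch.tensor([5.0, -3.0, -3.0, -3.0])  # ego score high, others low
weights = F.softmax(scores, dim=0)
print(weights)  # ~[0.999, 0.0003, 0.0003, 0.0003]: mass collapses onto ego
```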
Got it! Thank you!!
Hi! First of all, thank you for sharing such a great project! It gave me a lot of inspiration. I really appreciate it! I have some questions I would like to ask you.
I read your paper "Social Attention for Autonomous Decision-Making in Dense Traffic", in which you mainly propose an "attention-based neural network architecture". But in this repo, what is the purpose of implementing so many agents (MCTS, DQN...)? Are they different ways to implement this ‘attention architecture’?
Where is the scripts/analyze.py file? Has it been superseded?
As an RL rookie, I would like to ask whether the ‘attention architecture’ you proposed can be used in other RL algorithms. As an example, suppose I have trained an RL algorithm called 'ABC' to control all the autonomous vehicles in the scene. Now I want to add your proposed 'attention architecture' to it, so that each vehicle knows which surrounding vehicles it should pay the most attention to. Finally, the 'ABC' algorithm is used to train the whole model. I want to know: is this possible? How should I integrate the 'attention architecture' into 'ABC'?
Looking forward to your reply! Thanks!