eleurent / rl-agents

Implementations of Reinforcement Learning and Planning algorithms
MIT License

Algorithmic issues #113

Open AHPUymhd opened 5 months ago

AHPUymhd commented 5 months ago

```json
{
  "base_config": "configs/HighwayEnv/agents/DQNAgent/ddqn.json",
  "model": {
    "type": "EgoAttentionNetwork",
    "embedding_layer": {
      "type": "MultiLayerPerceptron",
      "layers": [64, 64],
      "reshape": false,
      "in": 7
    },
    "others_embedding_layer": {
      "type": "MultiLayerPerceptron",
      "layers": [64, 64],
      "reshape": false,
      "in": 7
    },
    "self_attention_layer": null,
    "attention_layer": {
      "type": "EgoAttention",
      "feature_size": 64,
      "heads": 2
    },
    "output_layer": {
      "type": "MultiLayerPerceptron",
      "layers": [64, 64],
      "reshape": false
    }
  },
  "gamma": 0.99,
  "batch_size": 64,
  "memory_capacity": 15000,
  "target_update": 512
}
```

Hello, may I ask whether this attention config is trained with an attention-based algorithm? It looks to me like it inherits from DQN, and I only see DQN, DDQN, and Double DQN in the algorithm library. Are PPO and other algorithms available? I am very much looking forward to your reply.

AHPUymhd commented 5 months ago

@eleurent

eleurent commented 5 months ago

Hi, DQN (and its variants Dueling DQN and Double Dueling DQN) is the learning algorithm (just like PPO is), while Attention/Transformer is the network architecture, which is being trained by the RL algorithm. You will find the attention implementation in agents/common/models.py.
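To illustrate the distinction: the `EgoAttention` layer in the config is just an architecture component that any RL algorithm can train. This is not the repository's exact implementation (that lives in agents/common/models.py); it is a minimal single-head NumPy sketch of the idea, with hypothetical names, where only the ego vehicle emits a query over the embeddings of all vehicles:

```python
import numpy as np

def ego_attention(ego_emb, others_emb, Wq, Wk, Wv):
    """Single-head scaled dot-product attention where only the ego queries.

    ego_emb:    (d,)   embedding of the ego vehicle
    others_emb: (n, d) embeddings of the ego and surrounding vehicles
    Wq, Wk, Wv: (d, d) learned projection matrices
    Returns the attended feature vector (d,) and attention weights (n,).
    """
    d = ego_emb.shape[0]
    q = ego_emb @ Wq                    # single query, from the ego vehicle
    K = others_emb @ Wk                 # one key per vehicle
    V = others_emb @ Wv                 # one value per vehicle
    scores = K @ q / np.sqrt(d)         # ego's similarity to each vehicle
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over vehicles
    return weights @ V, weights
```

The output vector is what the `output_layer` MLP would consume; because the softmax runs over vehicles, the layer handles a variable number of surrounding cars, which is the point of using attention here.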

While I wanted to implement PPO, I never found the time.

But I wrote this script which combines StableBaselines3's PPO with my implementation of attention as a CustomPolicy.

AHPUymhd commented 5 months ago

> Hi, DQN (and its variants Dueling DQN and Double Dueling DQN) is the learning algorithm (just like PPO is), while Attention/Transformer is the network architecture, which is being trained by the RL algorithm. You will find the attention implementation in agents/common/models.py.
>
> While I wanted to implement PPO, I never found the time.
>
> But I wrote this script which combines StableBaselines3's PPO with my implementation of attention as a CustomPolicy.

Wow, I really appreciate your help; I'm going to study your code!