Emmanuel-Naive / MATD3

Use the Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) algorithm to find reasonable paths for ships

Some questions about model #3

Open IIIgnac opened 1 year ago

IIIgnac commented 1 year ago

Hello, as a newcomer to reinforcement learning, I have some questions I would like to ask.

  1. Are the hyperparameters in the model suitable for all scenarios?
  2. When I trained the model with your parameters, the final result was very poor, and this happened in almost every scenario. I recorded the reward value in each episode and found that the model does not converge to a high reward; on the contrary, it converges to a negative reward every time. So I have a doubt: are the hyperparameters in the code you provided the final hyperparameters? If I need to train the model myself, which parameters do I need to adjust? Thank you very much!
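Raw per-episode rewards in multi-agent training are very noisy, so judging convergence from the raw curve can be misleading. A minimal sketch of a fairer check (the function name and the synthetic reward log are my own, not from this repository) is to smooth the recorded rewards with a moving average before comparing early and late training:

```python
from collections import deque

def smooth_rewards(episode_rewards, window=100):
    """Return a running average of per-episode rewards.

    The smoothed curve shows the trend; a curve that rises early and
    then plateaus (even at a negative value) can still correspond to a
    usable policy.
    """
    buf = deque(maxlen=window)  # keeps only the last `window` rewards
    smoothed = []
    for r in episode_rewards:
        buf.append(r)
        smoothed.append(sum(buf) / len(buf))
    return smoothed

# Hypothetical reward log: starts very negative, improves slowly.
rewards = [-100 + 0.5 * i for i in range(400)]
avg = smooth_rewards(rewards)
print(avg[0], avg[-1])  # → -100.0 74.75
```

Comparing `avg` over the first and last few hundred episodes is usually a more reliable convergence signal than eyeballing individual episode rewards.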
Emmanuel-Naive commented 1 year ago

Hi, welcome and good luck with learning.

  1. The TD3 algorithm, which improves on DDPG, does require the hyperparameters to be adjusted; they are not suitable for all scenarios.

2.1 Due to some features of neural networks (such as saturation and overfitting), increasing the amount of training does not guarantee that the learning curve (the reward value) will keep rising. Moreover, multi-agent learning introduces further complications (such as stability issues and different learning modes), which normally exacerbate this problem.

2.2 Reinforcement learning algorithms try to find the optimal policy. Here, the aim is to find reasonable paths for ships, so in this case I need the model that achieves the highest score. In other words, a poor-looking reward curve is acceptable for my project, as long as the curve rises at the beginning and the optimal path set is found.

2.3 If you want the best model for each scenario, adjusting the parameters is necessary. If you just want to find reasonable paths, the model with the given parameters is acceptable.
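The point in 2.2, that the model with the highest score matters more than the final reward curve, can be sketched as keeping the best checkpoint seen during training rather than the last one. This is a minimal illustration under my own naming (the class and the score values are hypothetical, not from this repository):

```python
class BestModelKeeper:
    """Remember which episode achieved the highest evaluation score.

    Even if the reward curve degrades later in training, the best
    path-planning policy found so far is not lost.
    """

    def __init__(self):
        self.best_score = float("-inf")
        self.best_episode = None

    def update(self, episode, score):
        """Return True if this episode set a new best score."""
        if score > self.best_score:
            self.best_score = score
            self.best_episode = episode
            return True
        return False

keeper = BestModelKeeper()
# Hypothetical evaluation scores: the curve peaks mid-training.
for ep, score in enumerate([-120.0, -40.0, 15.0, -60.0, 3.0]):
    if keeper.update(ep, score):
        pass  # here one would save the network weights to disk
print(keeper.best_episode, keeper.best_score)  # → 2 15.0
```

With this pattern, a reward curve that rises early and then drifts downward still yields the episode-2 policy, which is exactly the behavior described in 2.2.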