Emmanuel-Naive / MATD3

Use the Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) algorithm to find reasonable paths for ships

Some questions about model #3

Open IIIgnac opened 1 year ago

IIIgnac commented 1 year ago

Hello, as a newcomer to reinforcement learning, I have some questions I would like to ask.

  1. Are the hyperparameters in the model suitable for all scenarios?
  2. When I trained the model with your parameters, the final result was very poor, and this happened in almost every scenario. I recorded the reward value in each episode and found that the model does not converge to a high reward; on the contrary, it converges to a negative reward every time. So I have a doubt: are the hyperparameters in the code you provided the final hyperparameters? If I need to train the model myself, which parameters do I need to adjust? Thank you very much!
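Raw per-episode rewards in multi-agent training are very noisy, so judging convergence from the raw curve can be misleading. A minimal sketch of a fairer check (the function name and the synthetic reward log are my own, not from this repository) is to smooth the recorded rewards with a moving average before comparing early and late training:

```python
from collections import deque

def smooth_rewards(episode_rewards, window=100):
    """Return a running average of per-episode rewards.

    The smoothed curve shows the trend; a curve that rises early and
    then plateaus (even at a negative value) can still correspond to a
    usable policy.
    """
    buf = deque(maxlen=window)  # keeps only the last `window` rewards
    smoothed = []
    for r in episode_rewards:
        buf.append(r)
        smoothed.append(sum(buf) / len(buf))
    return smoothed

# Hypothetical reward log: starts very negative, improves slowly.
rewards = [-100 + 0.5 * i for i in range(400)]
avg = smooth_rewards(rewards)
print(avg[0], avg[-1])  # → -100.0 74.75
```

Comparing `avg` over the first and last few hundred episodes is usually a more reliable convergence signal than eyeballing individual episode rewards.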
Emmanuel-Naive commented 1 year ago

Hi, welcome and good luck with learning.

  1. The TD3 algorithm, which improves on DDPG, does require the hyperparameters to be adjusted; they are not suitable for all scenarios.

2.1 Due to some features of neural networks (such as saturation and overfitting), increasing the amount of training does not guarantee that the learning curve (the reward value) will keep rising. Moreover, multi-agent learning introduces further complications (such as stability issues and different learning modes), which normally exacerbate this problem.

2.2 Reinforcement learning algorithms try to find the optimal policy. Here, the aim is to find reasonable paths for ships, so in this case I need the model that achieves the highest score. In other words, a poor-looking reward curve is acceptable for my project, as long as the curve rises at the beginning and the optimal path set is found.

2.3 If you want the best model for each scenario, adjusting the parameters is necessary. If you just want to find reasonable paths, the model with the given parameters is acceptable.
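The point in 2.2, that the model with the highest score matters more than the final reward curve, can be sketched as keeping the best checkpoint seen during training rather than the last one. This is a minimal illustration under my own naming (the class and the score values are hypothetical, not from this repository):

```python
class BestModelKeeper:
    """Remember which episode achieved the highest evaluation score.

    Even if the reward curve degrades later in training, the best
    path-planning policy found so far is not lost.
    """

    def __init__(self):
        self.best_score = float("-inf")
        self.best_episode = None

    def update(self, episode, score):
        """Return True if this episode set a new best score."""
        if score > self.best_score:
            self.best_score = score
            self.best_episode = episode
            return True
        return False

keeper = BestModelKeeper()
# Hypothetical evaluation scores: the curve peaks mid-training.
for ep, score in enumerate([-120.0, -40.0, 15.0, -60.0, 3.0]):
    if keeper.update(ep, score):
        pass  # here one would save the network weights to disk
print(keeper.best_episode, keeper.best_score)  # → 2 15.0
```

With this pattern, a reward curve that rises early and then drifts downward still yields the episode-2 policy, which is exactly the behavior described in 2.2.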