Open lml519 opened 4 years ago
I believe our DQN follows the paradigm of centralized training and decentralized execution (CTDE). During training, we collect the trajectories of all agents and train a single shared model, so the training is centralized. During inference, each agent feeds its own observation and agent embedding into that model, so the execution is decentralized.
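To make the distinction concrete, here is a minimal sketch of one shared Q-function used by several agents. It is not the repository's actual DQN: the class `SharedQNetwork`, the linear Q-function, and the one-hot agent embedding are all assumptions chosen to keep the example short.

```python
import numpy as np


class SharedQNetwork:
    """Hypothetical linear Q-function shared by all agents (illustration only)."""

    def __init__(self, obs_dim, n_agents, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.n_agents = n_agents
        self.lr = lr
        # One weight matrix for every agent: input is [observation, agent one-hot].
        self.W = rng.normal(scale=0.1, size=(obs_dim + n_agents, n_actions))

    def features(self, obs, agent_id):
        # Assumption: the "agent embedding" is a plain one-hot vector.
        one_hot = np.zeros(self.n_agents)
        one_hot[agent_id] = 1.0
        return np.concatenate([obs, one_hot])

    def q_values(self, obs, agent_id):
        return self.features(obs, agent_id) @ self.W

    def act(self, obs, agent_id):
        # Decentralized execution: only this agent's own observation is needed.
        return int(np.argmax(self.q_values(obs, agent_id)))

    def train_step(self, batch, gamma=0.99):
        # Centralized training: the batch pools transitions from all agents,
        # and every transition updates the same shared weights.
        for agent_id, obs, action, reward, next_obs, done in batch:
            x = self.features(obs, agent_id)
            next_q = 0.0 if done else np.max(self.q_values(next_obs, agent_id))
            td_error = reward + gamma * next_q - x @ self.W[:, action]
            self.W[:, action] += self.lr * td_error * x


net = SharedQNetwork(obs_dim=3, n_agents=2, n_actions=4)
obs_a, obs_b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])

# Transitions from both agents go into one pooled batch.
pooled_batch = [
    (0, obs_a, 1, 1.0, obs_b, False),
    (1, obs_b, 2, 0.0, obs_a, True),
]
net.train_step(pooled_batch)

# At inference time each agent queries the shared model independently.
actions = [net.act(obs_a, agent_id=0), net.act(obs_b, agent_id=1)]
```

In this sketch, a fully decentralized setup (DTDE) would instead give each agent its own `SharedQNetwork` instance and train it only on that agent's transitions.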
Does the DQN fall under the paradigm of decentralized training and decentralized execution (DTDE)? I think it is an algorithm that combines parallel computing with DTDE, but I'm not sure whether my understanding is right.