TesfayZ / CCM_MADRL_MEC

The source code for the paper titled Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing

Inquiry about the per action DQN in the code and Huawei's dataset #3

Open hj5717 opened 4 months ago

hj5717 commented 4 months ago

Hello, I have read your paper and open-source code and feel that I have benefited a lot. As I have just started, there are still some areas I don't fully understand that I would like to ask you about. First, regarding the combinatorial optimization problem mentioned in the paper: it involves deploying agents on the devices through the MADDPG algorithm and having a DQN agent in the edge network. The DQN agent outputs offloading scheduling actions based on the network state and the device actions, but I have not found any DQN-related part in the open-source code. In addition, the article mentions using Huawei's task dataset for validation, and I do not understand how this dataset is applied to the MADRL environment. Thank you for replying amidst your busy schedule. Wishing you smooth research.

TesfayZ commented 4 months ago

Thank you for contacting me and for your interest in our work.

Regarding your questions:

1. DQN: Please look for the master agent (which you are referring to as the DQN) in the training and action-selection (choose_action) functions of the CCM_MADRL.py file. Note how the training function for the master agent differs from that of the critic of classical MADDPG, which you can see in the training function of the benchmark algorithm.

In the action-selection function (choose_action), see how the critic and hybrid actions are selected after the client agents produce their outputs. Note that three types of actions are returned from the function, and see how they are used in the mec_env.py file.
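For orientation, here is a minimal, hypothetical sketch (assuming a PyTorch-style implementation; the actual class names, argument lists, and layer sizes in Model.py and Benchmark_Model.py differ) of the structural difference between a classical MADDPG critic and a master agent that scores one client's action at a time:

```python
import torch
import torch.nn as nn

class ClassicalMADDPGCritic(nn.Module):
    """Classical MADDPG critic: scores the joint action of all agents in one pass."""
    def __init__(self, state_dim, n_agents, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents * action_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # one Q-value for the whole joint action
        )

    def forward(self, state, joint_actions):
        return self.net(torch.cat([state, joint_actions], dim=-1))

class MasterAgentCritic(nn.Module):
    """Master agent ('per-action DQN' style): scores a single client's candidate
    action, so it can be called in a loop during action selection as well as in training."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),  # Q-value for one candidate client action
        )

    def forward(self, state, client_action):
        return self.net(torch.cat([state, client_action], dim=-1))
```

The different set of inputs on the master side is the "number of arguments" difference discussed further in the follow-up comment below.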

2. Dataset: This is online reinforcement learning; there is no offline dataset. The data is generated from a uniform distribution, as seen in the mec_env.py file, but it uses the same experimental settings that other papers have used with reference to Huawei Telecom. When we refer to Huawei Telecom, what we use are the experimental settings taken from the two referenced papers, not an actual offline dataset. Please see how we applied those experimental settings as per the references.
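As an illustration of this online setup, here is a minimal sketch of drawing task parameters from uniform distributions at each step; the field names and ranges are placeholders, not the actual experimental settings taken from the Huawei Telecom references in mec_env.py:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def generate_tasks(n_devices):
    """Draw per-device task parameters from uniform ranges (placeholder values)."""
    return {
        "data_size_bits": rng.uniform(1e5, 1e6, size=n_devices),  # task input size
        "cpu_cycles": rng.uniform(1e8, 1e9, size=n_devices),      # required computation
        "deadline_s": rng.uniform(0.5, 2.0, size=n_devices),      # completion deadline
    }

# Regenerated every episode/step, so no offline dataset is needed.
tasks = generate_tasks(n_devices=10)
```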


TesfayZ commented 4 months ago

Hi, I understand your confusion with the per-action DQN now. First, the master agent (or DQN) is not given a different name: it is still called a critic because it was customized from existing implementations of MADDPG. I will rename it in future pushes.

To see the difference:

  1. Compare the number of arguments of the critic model for CCM_MADDPG in Model.py with the structure of the critic model of classical MADDPG in Benchmark_Model.py.

  2. See the differences between the training function for the critic in CCM_MADDPG.py and that for the classical critic in Benchmark_MADDPG.py.

  3. See how action selection runs in a for loop over every client agent (like per-action DQN) in the choose_action function of CCM_MADDPG.py, lines 426 to 446 (note the number of arguments at line 436 in particular). Also note that the benchmark algorithm does not compute Q-values during action selection; it only computes them in its training function.

So the critic of CCM_MADDPG is used as both the trainer and the action selector; it is the master agent (a minimal sketch of this selection loop is given below).
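To make the per-action-DQN behaviour concrete, here is a minimal, self-contained sketch of such a selection loop; the network, dimensions, and top-k rule are illustrative assumptions, not the exact logic of choose_action in CCM_MADDPG.py (lines 426 to 446):

```python
import torch
import torch.nn as nn

state_dim, action_dim, n_clients = 8, 3, 5  # placeholder sizes

# Stand-in for the master agent's Q-network (named "critic" in the repository).
master_q = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))

state = torch.randn(state_dim)
client_actions = [torch.randn(action_dim) for _ in range(n_clients)]  # outputs of the client agents

# Per-action-DQN style selection: score each client's proposed action with the master
# network, then keep the highest-scoring candidates. The top-k rule here is a placeholder
# for the resource-constrained selection described in the paper.
q_values = torch.stack([master_q(torch.cat([state, a])) for a in client_actions]).squeeze(-1)
k = 3  # e.g. how many tasks the edge server accepts; placeholder
selected_clients = torch.topk(q_values, k).indices.tolist()
print("selected clients:", selected_clients)
```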

I hope this helps.

hj5717 commented 4 months ago


Thank you very much for your patient reply; I feel greatly honored. Yes, after your reply yesterday, I also looked at the differences in the MADDPG networks. Put simply, I now see that the value output by the critic network, which in classical MADDPG is only used for training, also drives an action choice in your code, and that is what I take to be the DQN. I think your work is very interesting, and I will continue to learn from it.