About the reward settings and playing game

FFrankyy / FINDER

FINDER - FInding key players in complex Networks through DEep Reinforcement learning (Nature Machine Intelligence)

MIT License


About the reward settings and playing game #16

Open xwxahu opened 4 years ago

xwxahu commented 4 years ago

Hello! I read the FINDER paper recently, and two questions puzzled me.

1. In the article, you define the reward as the decrease in ANC; however, computing ANC requires a node removal sequence. How should I obtain that sequence? By removing nodes with FINDER, HDA, or some other method?
2. In the supplementary information, Algorithm S3 (FINDER) shows that SGD is performed after each experience is stored. However, the last paragraph of Section II.D.2 (training algorithm) suggests that SGD is performed after each episode. What does an episode mean here: removing a single node, or removing nodes from a graph until a terminal state is reached?
Many thanks for your work! I look forward to your answers.

FFrankyy commented 3 years ago

Yes, the sequence comes from removing nodes one at a time with FINDER or another method.
I define the reward in the SI. An episode means removing nodes until the terminal state is reached.
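Any method (FINDER, HDA, ...) produces a removal sequence, and ANC then scores that sequence. The sketch below is a minimal illustration, not FINDER's code. It assumes an adjacency-dict graph, connectivity measured as the GCC size normalised by N (chosen to match the -|GCC| / (N*N) reward discussed later in this thread), and an episode that simply runs until every node is removed; FINDER's own terminal condition and normalisation may differ. HDA (High Degree Adaptive) stands in for the policy, and `gcc_size`, `hda_sequence` and `anc` are hypothetical helper names.

```python
from collections import deque

def gcc_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)              # treat removed nodes as already visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over one component of the remaining graph.
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def hda_sequence(adj):
    """HDA: repeatedly remove the node with the most remaining neighbours
    (ties go to the first node in dict order). Here one episode removes every node."""
    removed, order = set(), []
    while len(order) < len(adj):
        node = max((u for u in adj if u not in removed),
                   key=lambda u: sum(v not in removed for v in adj[u]))
        removed.add(node)
        order.append(node)
    return order

def anc(adj, order):
    """ANC = (1/N) * sum_k |GCC_k| / N, where |GCC_k| is the GCC size after k removals."""
    n = len(adj)
    removed, total = set(), 0
    for node in order:
        removed.add(node)
        total += gcc_size(adj, removed)
    return total / (n * n)

# Usage: a 5-node path 0-1-2-3-4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
hda = hda_sequence(path)
print(hda, anc(path, hda))           # [1, 3, 0, 2, 4] 0.24
print(anc(path, [0, 1, 2, 3, 4]))    # 0.4
```

The same `anc` function scores any order, so sequences from different methods can be compared directly; a lower value means the network was dismantled faster.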

xwxahu commented 3 years ago

Thank you very much for your reply! But I am still troubled by the reward. In the code for the ND (network dismantling) problem, the reward is defined as -|GCC| / (N*N). What does it mean? For example, take a graph with 10 nodes whose |GCC| is 8. If we remove a critical node A, the |GCC| decreases to 5, so the action of removing A gets reward -0.05. If we then remove a non-critical node B, the |GCC| stays at 5, so the action of removing B also gets reward -0.05. It seems that removing a non-critical node can earn as high a reward as removing a critical one. Could you explain this? Thanks again for answering my earlier questions!
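Writing the per-step reward from the ND code with |GCC_t| the size of the giant connected component after the t-th removal (as in the example above, N = 10), the two steps work out to the same value, because the reward depends only on the GCC size left after the step, not on how much the GCC shrank during it:

```latex
r_t = -\frac{|\mathrm{GCC}_t|}{N^2}, \qquad
r_A = -\frac{5}{10^2} = -0.05, \qquad
r_B = -\frac{5}{10^2} = -0.05
```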

FFrankyy commented 3 years ago

Thank you, that's a very important question! Our goal here is to learn a node removal sequence that minimizes the ANC value. In your case, although A and B differ (critical vs. non-critical) from a local viewpoint, they may contribute the same amount to the ANC value, so they receive the same reward in our setting. Keep in mind that the reward is defined according to your learning objective.
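One way to check this reasoning: with the reward -|GCC| / (N*N), each step's reward is exactly that step's term in the ANC sum, so the return of an episode equals -ANC when connectivity is normalised by N. That normalisation is an assumption made to match the reward in the code; the paper may normalise by the initial connectivity instead. The sketch below builds a hypothetical 10-node graph matching the example (GCC of size 8, a cut vertex A = 0, a leaf B = 7 in a small branch), removes A, then B, then the rest in an arbitrary order. The helper names and the graph are illustrative only.

```python
import math
from collections import deque

def gcc_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                  # BFS over one remaining component
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def step_rewards(adj, order):
    """Per-step reward as described in this thread: r_k = -|GCC_k| / (N*N)."""
    n = len(adj)
    removed, rewards = set(), []
    for node in order:
        removed.add(node)
        rewards.append(-gcc_size(adj, removed) / (n * n))
    return rewards

def anc(adj, order):
    """ANC with connectivity normalised by N: (1/N) * sum_k |GCC_k| / N."""
    n = len(adj)
    removed, total = set(), 0
    for node in order:
        removed.add(node)
        total += gcc_size(adj, removed)
    return total / (n * n)

# Hypothetical graph: GCC = {0..7}; node 0 joins the path 1-2-3-4-5 to the branch 6-7.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7), (8, 9)]
adj = {u: [] for u in range(10)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

order = [0, 7, 3, 1, 2, 4, 5, 6, 8, 9]        # A, then B, then an arbitrary rest
rewards = step_rewards(adj, order)
print(gcc_size(adj, set()), rewards[:2])      # 8 [-0.05, -0.05]
print(math.isclose(sum(rewards), -anc(adj, order)))   # True
```

Because the return equals -ANC under this assumption, maximising return is the same as minimising ANC. The benefit of removing A early does not appear as a larger reward for A itself; it appears as smaller penalties in every later step, since the GCC stays small from then on.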

xwxahu commented 3 years ago

I got it! Thanks very much for your assistance.