d-tiapkin / gflownet-rl

Repository for "Generative Flow Networks as Entropy-Regularized RL" (AISTATS-2024, Oral)
https://arxiv.org/abs/2310.12934
MIT License

Implementation correctness #1

Open MisakiTaro0414 opened 4 months ago

MisakiTaro0414 commented 4 months ago

Dear Authors,

First of all, thank you very much for this repository. I am confused about the correctness of one part of the implementation. In soft_dqn.py, the variable valid_v_target_next is multiplied by policy_sn inside a torch.sum call. According to my derivation, there should be no such multiplication. Could you please point out where this policy_sn comes from? Thanks.

d-tiapkin commented 4 months ago

Thank you for your question! In all our experiments we did not use the option is_double=True, so it was not reported or described in the paper. In other words, only the first branch of the following if clause was used. https://github.com/d-tiapkin/gflownet-rl/blob/434732044ffbadc7d4b585a2e04a1a047297d42c/hypergrid/algorithms/soft_dqn.py#L121-L135
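
For reference, here is a minimal sketch of what that first branch computes, assuming the standard entropy-regularized (soft) Bellman target; the names soft_value_logsumexp, q_target_next, and lam are illustrative, not the repository's exact identifiers:

```python
import torch

def soft_value_logsumexp(q_target_next: torch.Tensor, lam: float) -> torch.Tensor:
    # q_target_next: (batch, num_actions) target-network Q-values at s'
    # Soft value of the next state: V(s') = lam * log sum_a exp(Q_target(s', a) / lam),
    # where lam is the entropy-regularization temperature.
    return lam * torch.logsumexp(q_target_next / lam, dim=-1)
```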

Regarding the option is_double=True: it controls the use of the Double DQN heuristic (see e.g. https://arxiv.org/abs/1509.06461) adapted to the entropy-regularized setting. As mentioned above, we did not use it in our final experiments, so it was not described in the paper. In essence, instead of computing a log-sum-exp, it uses the current policy (this is policy_sn) and the value of this policy evaluated with the target Q-values; the multiplication by policy_sn followed by torch.sum computes that expectation.
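
A minimal sketch of this variant, under the assumption that the soft value of a policy includes its entropy bonus (again, names are illustrative and this is not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def soft_value_double(q_online_next: torch.Tensor,
                      q_target_next: torch.Tensor,
                      lam: float) -> torch.Tensor:
    # Current policy is the softmax of the *online* Q-values (Double-DQN style).
    log_policy_sn = F.log_softmax(q_online_next / lam, dim=-1)  # log pi(a|s')
    policy_sn = log_policy_sn.exp()                             # pi(a|s')
    # Soft value of s' under the current policy, evaluated with *target* Q-values:
    # V(s') = sum_a pi(a|s') * (Q_target(s', a) - lam * log pi(a|s'))
    return torch.sum(policy_sn * (q_target_next - lam * log_policy_sn), dim=-1)
```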

I hope this explanation helps! If you have any other questions, please don't hesitate to ask.