GFNOrg / torchgfn

GFlowNet library
https://torchgfn.readthedocs.io/en/latest/

Uniform backward policy is state independent #180

Closed · tristandeleu closed this 3 months ago

tristandeleu commented 3 months ago

In the Hypergrid example, when $P_{B}$ is uniform, it simply takes a uniform distribution over the $n_{\text{actions}} - 1$ backward actions, regardless of the state:

https://github.com/GFNOrg/torchgfn/blob/4387e5b1b4dc9d308339c840e088e3c507c276e9/tutorials/examples/train_hypergrid.py#L99-L100

However, the number of actions that $P_{B}$ can take depends on the state. For example, if we are at an edge of the grid (e.g., [1, 0]), then there is only a single action that could have brought us there ([0, 0] -> [1, 0]), so this state has only a single parent. Therefore $P_{B}$ should be a uniform distribution over only these valid actions (in that example, only the "going right" action), and not over all actions, as sketched below.
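A minimal sketch (plain Python, not torchgfn code) of the point above, assuming a D-dimensional hypergrid where backward action $i$ undoes an increment of coordinate $i$: the set of valid backward actions, and hence the support of a state-dependent uniform $P_{B}$, is exactly the set of strictly positive coordinates.

```python
def backward_action_mask(state):
    """Boolean mask over backward actions (one per dimension): action i is
    valid only if coordinate i can be decremented, i.e., it is > 0."""
    return [coordinate > 0 for coordinate in state]

# The edge state [1, 0] has a single parent ([0, 0]), reached by undoing "going right":
print(backward_action_mask([1, 0]))  # [True, False] -> only one valid backward action
# An interior state such as [2, 3] has one parent per dimension:
print(backward_action_mask([2, 3]))  # [True, True]
```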

tristandeleu commented 3 months ago

The masking is actually handled in DiscretePolicyEstimator; DiscreteUniform only returns logits (unnormalized log-probabilities, before masking).
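A hedged sketch of that masking mechanism in generic PyTorch (not the exact torchgfn internals): the uniform module emits constant logits for all backward actions, and the estimator sets the logits of invalid actions to $-\infty$ before the softmax, so each state ends up with a uniform distribution over its valid parents only.

```python
import torch

n_backward_actions = 2                    # 2-D hypergrid: one backward action per dimension
logits = torch.zeros(n_backward_actions)  # state-independent "uniform" logits

# Mask for the edge state [1, 0]: only the first backward action is valid.
mask = torch.tensor([True, False])
masked_logits = torch.where(mask, logits, torch.tensor(-float("inf")))

probs = torch.softmax(masked_logits, dim=-1)
print(probs)  # tensor([1., 0.]) -> uniform over the single valid parent
```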