GFNOrg / torchgfn

GFlowNet library
https://torchgfn.readthedocs.io/en/latest/

Uniform backward policy is state independent #180

Closed · tristandeleu closed this 3 months ago

tristandeleu commented 3 months ago

In the Hypergrid example, when $P_{B}$ is uniform, it simply takes a uniform distribution over the $n_{\text{actions}} - 1$ backward actions, regardless of the state:

https://github.com/GFNOrg/torchgfn/blob/4387e5b1b4dc9d308339c840e088e3c507c276e9/tutorials/examples/train_hypergrid.py#L99-L100

However, the number of actions that $P_{B}$ can take depends on the state. For example, if we are at an edge of the grid (e.g., [1, 0]), then there is only a single action that could have brought us there ([0, 0] -> [1, 0]), so this state has only a single parent. Therefore $P_{B}$ should be a uniform distribution over only these valid actions (in that example, only the "going right" action), and not over all actions, as sketched below.
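A minimal sketch (plain Python, not torchgfn code) of the point above, assuming a D-dimensional hypergrid where backward action $i$ undoes an increment of coordinate $i$: the set of valid backward actions, and hence the support of a state-dependent uniform $P_{B}$, is exactly the set of strictly positive coordinates.

```python
def backward_action_mask(state):
    """Boolean mask over backward actions (one per dimension): action i is
    valid only if coordinate i can be decremented, i.e., it is > 0."""
    return [coordinate > 0 for coordinate in state]

# The edge state [1, 0] has a single parent ([0, 0]), reached by undoing "going right":
print(backward_action_mask([1, 0]))  # [True, False] -> only one valid backward action
# An interior state such as [2, 3] has one parent per dimension:
print(backward_action_mask([2, 3]))  # [True, True]
```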

tristandeleu commented 3 months ago

The masking is actually handled in DiscretePolicyEstimator; DiscreteUniform only returns logits (unnormalized log-probabilities, before masking).
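A hedged sketch of that masking mechanism in generic PyTorch (not the exact torchgfn internals): the uniform module emits constant logits for all backward actions, and the estimator sets the logits of invalid actions to $-\infty$ before the softmax, so each state ends up with a uniform distribution over its valid parents only.

```python
import torch

n_backward_actions = 2                    # 2-D hypergrid: one backward action per dimension
logits = torch.zeros(n_backward_actions)  # state-independent "uniform" logits

# Mask for the edge state [1, 0]: only the first backward action is valid.
mask = torch.tensor([True, False])
masked_logits = torch.where(mask, logits, torch.tensor(-float("inf")))

probs = torch.softmax(masked_logits, dim=-1)
print(probs)  # tensor([1., 0.]) -> uniform over the single valid parent
```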