GFNOrg / torchgfn

GFlowNet library
https://torchgfn.readthedocs.io/en/latest/

Fix off policy #174

Closed saleml closed 5 months ago

saleml commented 5 months ago

This fixes https://github.com/GFNOrg/torchgfn/issues/168. The idea is to remove the previous off_policy and sample_off_policy arguments and instead be explicit about what we evaluate and store when sampling. When sampling on-policy, we should store the logprobs; this is the default. When sampling off-policy with a tempered/modified PF, we should store only estimator_outputs. When we use a replay buffer, we don't need to store anything, since the logprobs should be recalculated.
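To make the three cases concrete, here is a rough, self-contained sketch in plain Python. It is not torchgfn code: sample_action, recompute_log_prob, and the save_logprobs / save_estimator_outputs flags are hypothetical names chosen for illustration, and a single list of logits stands in for the estimator output. The assumption behind it is that logprobs recorded while sampling from a tempered PF belong to the tempered distribution, so they cannot be reused as the PF logprobs and must be recomputed from the raw outputs.

```python
import math
import random


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def sample_action(logits, temperature=1.0, save_logprobs=True,
                  save_estimator_outputs=False, rng=random):
    """Hypothetical single sampling step (illustration only, not torchgfn's API)."""
    # The sampling distribution may differ from the policy (tempered logits).
    sampling_logprobs = log_softmax([x / temperature for x in logits])
    weights = [math.exp(lp) for lp in sampling_logprobs]
    action = rng.choices(range(len(logits)), weights=weights)[0]

    record = {"action": action}
    if save_logprobs:
        # Only meaningful on-policy, where sampling distribution == policy.
        record["log_prob"] = sampling_logprobs[action]
    if save_estimator_outputs:
        # Raw (untempered) outputs, enough to recover the policy logprob later.
        record["estimator_outputs"] = list(logits)
    return record


def recompute_log_prob(logits, action):
    """Policy logprob of an action, computed from raw estimator outputs."""
    return log_softmax(logits)[action]


logits = [0.5, -1.0, 2.0]

# 1. On-policy (default): store the logprobs directly.
on_policy = sample_action(logits)

# 2. Off-policy with a tempered PF: store only the estimator outputs,
#    then derive the policy logprob from them when needed.
off_policy = sample_action(logits, temperature=2.0,
                           save_logprobs=False, save_estimator_outputs=True)
off_policy_log_prob = recompute_log_prob(off_policy["estimator_outputs"],
                                         off_policy["action"])

# 3. Replay buffer: store nothing extra; recompute from a fresh forward pass
#    (here the same logits stand in for the current estimator output).
replayed = sample_action(logits, save_logprobs=False)
replayed_log_prob = recompute_log_prob(logits, replayed["action"])
```

In case 2 the stored outputs avoid a second forward pass, whereas in case 3 the estimator may have changed since the trajectory was sampled, which is why nothing is cached there.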

Additionally, this fixes FM + ReplayBuffer, which was broken before because states extension didn't take the _log_probs attribute into account.