GFNOrg / torchgfn

GFlowNet library
https://torchgfn.readthedocs.io/en/latest/

Is it normal that running `train_line.py` renders samples from one mode only? #173

Closed: saleml closed this issue 5 months ago

josephdviviano commented 5 months ago

I likely need to change the default options to enable off-policy exploration.
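
For readers unfamiliar with the term, here is a minimal sketch of one common form of off-policy exploration: sampling actions from a widened version of the policy's Gaussian. It is plain Python, not torchgfn code. `sample_actions` and `explore_scale` are hypothetical names for illustration and are not the options used by `train_line.py`.

```python
import random
import statistics


def sample_actions(mean, std, n, explore_scale=0.0, seed=0):
    """Draw n actions from a Gaussian policy.

    explore_scale (hypothetical knob) is added to the policy's std, so the
    behavior distribution is wider than the learned policy when it is > 0.
    """
    rng = random.Random(seed)
    behavior_std = std + explore_scale
    return [rng.gauss(mean, behavior_std) for _ in range(n)]


# Same learned policy, sampled with and without the exploration widening.
on_policy = sample_actions(mean=2.0, std=0.2, n=2000)
off_policy = sample_actions(mean=2.0, std=0.2, n=2000, explore_scale=1.0)

print("on-policy spread: ", statistics.pstdev(on_policy))
print("off-policy spread:", statistics.pstdev(off_policy))
```

The wider behavior distribution reaches regions the current policy rarely visits, which is what can help a sampler find a second mode. In an off-policy setup the loss would normally still be evaluated with the learned policy's log-probabilities rather than the widened one. That detail is an assumption about the general technique, not a description of the fix discussed in this thread.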

josephdviviano commented 5 months ago

[Image: Screenshot 2024-03-31 at 12 33 33]

Hmm, on my machine the training isn't perfect (the default options undertrain the policy), but I definitely sample from both modes.

Were your experiments run off master, or is this maybe related to the changes we made to off-policy training in the other open PR (https://github.com/GFNOrg/torchgfn/pull/174)?
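
Whether samples come from one mode or both can also be checked numerically rather than by eye. Below is a minimal sketch, assuming 1D samples and known mode centers. The centers `2.0` and `5.0`, the radius, and the synthetic samples are illustrative assumptions, not values read from `train_line.py`, and `mode_coverage` is a hypothetical helper rather than part of torchgfn.

```python
import random


def mode_coverage(samples, mode_centers, radius):
    """Return, per mode, the fraction of samples within `radius` of its center."""
    if not samples:
        raise ValueError("samples must be non-empty")
    counts = [0] * len(mode_centers)
    for x in samples:
        # Assign the sample to its nearest mode, and count it only if it is close enough.
        nearest = min(range(len(mode_centers)), key=lambda i: abs(x - mode_centers[i]))
        if abs(x - mode_centers[nearest]) <= radius:
            counts[nearest] += 1
    return [c / len(samples) for c in counts]


# Synthetic stand-ins for sampler output (assumed, for illustration only).
rng = random.Random(0)
collapsed = [rng.gauss(2.0, 0.5) for _ in range(1000)]  # one mode only
balanced = collapsed[:500] + [rng.gauss(5.0, 0.5) for _ in range(500)]  # both modes

print("collapsed:", mode_coverage(collapsed, [2.0, 5.0], radius=1.0))
print("balanced: ", mode_coverage(balanced, [2.0, 5.0], radius=1.0))
```

A coverage value near zero for one of the modes would indicate the single-mode behavior described in the issue title. Comparing these fractions across branches gives a more objective signal than comparing plots.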

saleml commented 5 months ago

I'm investigating the issue. Indeed, the behavior differs between master and fix_off_policy.

One thing worth noting: with the original number of trajectories (1.28e6), the samples do come from one mode only, but they are more accurate.

[Image]

Edit 1: When running the code for slightly longer on the fix_off_policy branch (3e6 trajectories), I obtain a figure similar to the one obtained with master.

saleml commented 5 months ago

In fix_off_policy, I had forgotten a keyword. The problem is fixed in https://github.com/GFNOrg/torchgfn/pull/174/commits/89c72b5add431fcd8a787323cdc14aee7ee1ffe8