GFNOrg / diffusion-finetuning


[About inverse_diffusion conditional generation] #1

Open junmokane opened 1 month ago

junmokane commented 1 month ago

Hi. Thanks for the great work.

While reading the code, I couldn't find the function that takes a class label and generates the corresponding image with the GFlowNet-finetuned posterior model p(x|c). In the posterior baselines (DPS, LGD-MC), I can see that the class label is taken as a condition and samples are generated by applying classifier guidance (via R(c, x)) to the prior model p(x).

I found that there is something in the Langevin dynamics model that takes finetune_class, but I'm not sure how this works. Could you elaborate on this part? Sorry if I misunderstood anything.

Also, could you explain how the sampling works? It seems to use classifier guidance for conditional sampling, but applying classifier guidance with a GFN-finetuned posterior seems new to me (or is this in fact a new contribution? Just wondering). In the previous works mentioned in the paper (ADM, DPS, LGD-MC), the posterior is approximated without any additional training, as far as I understand. In RTB, the diffusion prior p(x) is additionally fine-tuned with LoRA to approximate the posterior p(x|c) ∝ p(c|x) p(x), yet the model does not take the class label as input when sampling. Could you elaborate on this part? Please correct me if I'm wrong. Thanks.

lucascimeca commented 1 month ago

Hi,

In the published repo (and the corresponding paper), the sampling is "conditional" only in the sense that we condition the posterior on a constraint (given by the reward r(x)), which lets us finetune a posterior over a class or a set of classes. In that regard, the code in the inverse_diffusion directory has no class-dependent conditional logic, and each task requires its own finetuning run: we train a different LoRA adapter approximating p(x|c) for each class c we consider.
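To make that step concrete, here is a rough sketch of the relative trajectory balance (RTB) loss that the finetuning optimizes for each such LoRA adapter. The argument names are illustrative only and do not correspond to the repo's actual variables; treat this as a simplified picture, not the exact implementation.

```python
import torch

def rtb_loss(log_Z, traj_logprob_post, traj_logprob_prior, log_r_x0):
    """Relative trajectory balance (RTB) loss for a batch of denoising trajectories.

    log_Z              -- learned scalar estimate of the log partition function
    traj_logprob_post  -- (batch,) sum_t log p_theta(x_{t-1} | x_t) under the LoRA-finetuned model
    traj_logprob_prior -- (batch,) the same sum under the frozen pretrained prior
    log_r_x0           -- (batch,) log r(x_0), e.g. classifier probability of the target classes

    At the optimum, the finetuned model samples proportionally to r(x) * p_prior(x).
    """
    delta = log_Z + traj_logprob_post - traj_logprob_prior - log_r_x0
    return (delta ** 2).mean()
```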

The "finetune_class" referred to in the code is handled by a list of output logit indices for the classifier of the dataset under scrutiny. The logic in the code will use those indices to extract the classifier output and use it for our reward (r(x)). According to this, the diffusion model can be finetuned with RTB to sample proportionally to our reward, thus prioritizing the classes as specified by the finetune_class.

For the RTB case, once the LoRA weights on top of the pre-trained model are trained, we can sample with DDPM or any compatible sampling strategy. As you noted, in the code the DPS-like methods perform sampling through classifier guidance directly. For RTB, instead, we finetune the model first and then sample from the finetuned model at the end. In the code, sampling also happens during training/finetuning so that we can monitor the posterior statistics; a few hundred iterations are typically sufficient for this. As detailed in the paper, we observe the final samples to be unbiased and closer to the true posterior.
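For completeness, here is a minimal sketch of what sampling from the finetuned model can look like, assuming an ε-prediction network and a beta schedule; the repo's actual sampler, scheduler, and parameterization may differ.

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, betas, shape, device="cuda"):
    """Plain DDPM ancestral sampling from the (LoRA-finetuned) eps-prediction model.

    After RTB finetuning the denoiser already encodes the posterior, so no class
    label or guidance term is passed at sampling time.
    """
    betas = betas.to(device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(x, t_batch)  # predicted noise at step t
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        # add noise at every step except the last
        x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
    return x
```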