black-forest-labs / flux

Official inference repo for FLUX.1 models
Apache License 2.0
15.34k stars 1.1k forks

Question about the meaning of `guidance` in model forward #159

Closed huayecaibcc closed 1 month ago

huayecaibcc commented 1 month ago

I read the FLUX code and found two guidance parameters used during model inference: guidance_vec and true_gs.

def forward(
        self,
        img: Tensor,
        img_ids: Tensor,
        txt: Tensor,
        txt_ids: Tensor,
        timesteps: Tensor,
        y: Tensor,
        guidance: Tensor | None = None,    ## <---- this one
    ) -> Tensor:
        if img.ndim != 3 or txt.ndim != 3:
            raise ValueError("Input img and txt tensors must have 3 dimensions.")

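        # Project the image token sequence to the model's hidden size.
        img = self.img_in(img)
        # Sinusoidal timestep embedding (256-dim), then a learned projection.
        vec = self.time_in(timestep_embedding(timesteps, 256))
        if self.params.guidance_embed:
            if guidance is None:
                raise ValueError("Didn't get guidance strength for guidance distilled model.")
            # The guidance strength is embedded like the timestep and added to vec.
            vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
        # y is an additional conditioning vector, also added to vec.
        vec = vec + self.vector_in(y)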

guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
# i is the step index that is compared against timestep_to_start_cfg below
for i, (t_curr, t_prev) in enumerate(zip(timesteps[:-1], timesteps[1:])):
    t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
    pred = model(
        img=img,
        img_ids=img_ids,
        txt=txt,
        txt_ids=txt_ids,
        y=vec,
        timesteps=t_vec,
        guidance=guidance_vec,
        image_proj=image_proj,
        ip_scale=ip_scale, 
    )
    if i >= timestep_to_start_cfg:
        neg_pred = model(
            img=img,
            img_ids=img_ids,
            txt=neg_txt,
            txt_ids=neg_txt_ids,
            y=neg_vec,
            timesteps=t_vec,
            guidance=guidance_vec, 
            image_proj=neg_image_proj,
            ip_scale=neg_ip_scale, 
        )     
        pred = neg_pred + true_gs * (pred - neg_pred)

true_gs is used in the denoising loop for classifier-free guidance (CFG), which I understand. But guidance_vec, passed as guidance to the model's forward function, is embedded the same way as the timestep and added to the timestep embedding. What is the role of this guidance? I can’t find a clear reference, and it’s hard for me to see how this parameter is used during training. I’d be grateful if anyone could explain!
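For reference, the CFG step from the loop above can be isolated as a small sketch. It uses plain Python lists instead of tensors, and `cfg_combine` is a hypothetical helper name, not a function from the quoted code.

```python
def cfg_combine(pred, neg_pred, true_gs):
    """Classifier-free guidance: start from the negative-prompt prediction
    and move toward the conditional one, scaled by true_gs."""
    return [n + true_gs * (p - n) for p, n in zip(pred, neg_pred)]


pred = [1.0, 2.0]      # toy conditional prediction
neg_pred = [0.5, 1.0]  # toy negative-prompt prediction

same = cfg_combine(pred, neg_pred, 1.0)    # true_gs = 1 returns pred unchanged
pushed = cfg_combine(pred, neg_pred, 4.0)  # true_gs > 1 extrapolates past pred
```

Each CFG step needs two model calls (one for pred, one for neg_pred), whereas guidance_vec is only an extra input to a single call.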

huayecaibcc commented 1 month ago

Sorry, I checked the code again and found that the two guidance parameters come from the x-flux code; the true_gs parameter does not exist in BFL's flux code. I'll leave this question up for people who have the same doubt. In the BFL code, this guidance is effectively the CFG scale: during distillation it is turned into an embedding so that the model learns to reproduce the teacher model's output at a given CFG scale. After distillation, that is, at inference time with flux-dev, there is no need to run the model twice per step (conditional and unconditional) as CFG does, which speeds up inference. The x-flux code also changed true_gs from 4 to 1 in one commit, presumably for this reason.
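The idea of guidance distillation can be illustrated with a toy sketch. This is not BFL's training code, which is not shown in this thread; the scalar "teacher" and "student" and every name below are hypothetical stand-ins. The teacher is called twice and combined with the CFG formula, and the student, which takes the guidance strength `w` as an extra input, is trained to match that combined output in a single call.

```python
import random


def teacher(x, conditional):
    # Toy stand-in for the teacher network: one prediction per call.
    return 2.0 * x if conditional else 1.0 * x


def cfg_target(x, w):
    # Two teacher calls combined with the CFG formula.
    cond, uncond = teacher(x, True), teacher(x, False)
    return uncond + w * (cond - uncond)


def student(x, w, params):
    # Toy student: receives the guidance strength w as an input.
    a, b = params
    return a * x + b * x * w


def distill(steps=20000, lr=0.05, seed=0):
    # Plain SGD on the squared error between student and CFG target.
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        w = rng.uniform(1.0, 5.0)  # sample a guidance strength per example
        err = student(x, w, (a, b)) - cfg_target(x, w)
        a -= lr * err * x
        b -= lr * err * x * w
    return a, b


params = distill()
# After training, one student call approximates two teacher calls plus CFG.
gap = abs(student(0.5, 3.5, params) - cfg_target(0.5, 3.5))
```

In this toy setting the student can represent the target exactly, so `gap` shrinks toward zero; a real model only approximates its teacher.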

huayecaibcc commented 1 month ago

issue closed

maxin-cn commented 4 weeks ago

Hi @huayecaibcc, I am also confused about guidance_vec. Why does adding it achieve the effect of distillation? As far as I can see, it is only added to the timestep embedding.
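One way to look at it: the addition by itself does nothing special. It only makes the guidance strength visible to the network as part of the conditioning vector `vec`, and training is what teaches the network to respond to it. The sketch below shows just the input side in plain Python. `sinusoidal_embedding` is loosely modeled on a standard sinusoidal timestep embedding, and its scaling constants are assumptions. The learned projections (`time_in`, `guidance_in`) are replaced by the identity, and `conditioning_vector` is a hypothetical name.

```python
import math


def sinusoidal_embedding(t, dim, max_period=10000, time_factor=1000.0):
    # Map a scalar to `dim` values: cosines first, then sines, at
    # geometrically spaced frequencies (constants are assumptions).
    t = time_factor * t
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    return [math.cos(a) for a in args] + [math.sin(a) for a in args]


def conditioning_vector(timestep, guidance, dim=8):
    # In the model each embedding goes through a learned projection before
    # the sum; the identity is used here to keep the sketch self-contained.
    t_emb = sinusoidal_embedding(timestep, dim)
    g_emb = sinusoidal_embedding(guidance, dim)
    return [a + b for a, b in zip(t_emb, g_emb)]


vec_low = conditioning_vector(timestep=0.5, guidance=1.0)
vec_high = conditioning_vector(timestep=0.5, guidance=4.0)
# Same timestep, different guidance: the network sees a different vec.
differs = vec_low != vec_high
```

Because `vec` changes with the guidance value, a network trained against targets produced at different CFG scales can learn a different output for each value from a single forward pass.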