Acly / comfyui-inpaint-nodes

Nodes for better inpainting with ComfyUI: Fooocus inpaint model for SDXL, LaMa, MAT, and various other tools for pre-filling inpaint & outpaint areas.
GNU General Public License v3.0
490 stars 35 forks

I have modified this ApplyFooocusInpaint to handle video frames #36

Closed meetedlike closed 2 weeks ago

meetedlike commented 2 months ago

# Imports this snippet relies on. InpaintHead, load_fooocus_patch and
# inject_patched_calculate_weight are assumed to come from the surrounding module.
from typing import Any

import torch
import torch.nn.functional as F
from torch import Tensor

import comfy.lora
from comfy.model_base import BaseModel
from comfy.model_patcher import ModelPatcher


class ApplyFooocusInpaint:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "model": ("MODEL",),
                "patch": ("INPAINT_PATCH",),
                "latent": ("LATENT",),
            }
        }

    RETURN_TYPES = ("MODEL",)
    CATEGORY = "inpaint"
    FUNCTION = "patch"

    def patch(self, model: ModelPatcher, patch: tuple[InpaintHead, dict[str, Tensor]], latent: dict[str, Any]):
        base_model: BaseModel = model.model
        latent_pixels = base_model.process_latent_in(latent["samples"])
        noise_mask = latent["noise_mask"].round()

        # Downsample the pixel-space mask by 8 to the latent resolution
        latent_mask = F.max_pool2d(noise_mask, (8, 8)).round().to(latent_pixels)

        inpaint_head_model, inpaint_lora = patch
        # The mask and the latent must have the same batch size for this cat to work
        feed = torch.cat([latent_mask, latent_pixels], dim=1)
        inpaint_head_model.to(device=feed.device, dtype=feed.dtype)
        inpaint_head_feature = inpaint_head_model(feed)

        def input_block_patch(h, transformer_options, inpaint_head_feature):
            # Ensure the batch sizes match here
            scale_factor = h.size(0) // inpaint_head_feature.size(0)
            if scale_factor != 1:
                inpaint_head_feature = inpaint_head_feature.repeat(scale_factor, 1, 1, 1)
            if transformer_options["block"][1] == 0:
                h = h + inpaint_head_feature.to(h)
            return h

        lora_keys = comfy.lora.model_lora_keys_unet(model.model, {})
        lora_keys.update({x: x for x in base_model.state_dict().keys()})
        loaded_lora = load_fooocus_patch(inpaint_lora, lora_keys)

        models = []
        # Assume model.clone() is a lightweight operation, so it is suitable for batch processing
        for i in range(feed.shape[0]):
            m = model.clone()
            m.set_model_input_block_patch(lambda h, opts: input_block_patch(h, opts, inpaint_head_feature))
            patched = m.add_patches(loaded_lora, 1.0)
            models.append(m)

        not_patched_count = sum(1 for x in loaded_lora if x not in patched)
        if not_patched_count > 0:
            print(f"[ApplyFooocusInpaint] Failed to patch {not_patched_count} keys")

        inject_patched_calculate_weight()
        # Return the list of patched models, for batch processing.
        # Note: RETURN_TYPES declares a single MODEL output.
        return models
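
The batch-matching step inside input_block_patch can be tried in isolation. Below is a minimal sketch that assumes only PyTorch; match_batch is a hypothetical helper name (it is not part of the repository) and the tensor shapes are purely illustrative. Note that Tensor.repeat tiles the whole batch in order (f0..f3, f0..f3) rather than interleaving the frames.

```python
import torch


def match_batch(feature: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Repeat `feature` along dim 0 so that its batch size matches `h`.

    Hypothetical helper mirroring the logic in input_block_patch.
    """
    if h.size(0) % feature.size(0) != 0:
        raise ValueError(
            f"batch {h.size(0)} is not a multiple of feature batch {feature.size(0)}"
        )
    scale_factor = h.size(0) // feature.size(0)
    if scale_factor != 1:
        # Tiles the batch as a whole: [f0, f1, ..., f0, f1, ...]
        feature = feature.repeat(scale_factor, 1, 1, 1)
    # Match dtype and device of the hidden state
    return feature.to(h)


# Illustrative shapes: 4 frames of features, hidden state with batch 8
feature = torch.randn(4, 320, 16, 16)
h = torch.randn(8, 320, 16, 16)
out = match_batch(feature, h)
print(out.shape)
```

The explicit divisibility check is an addition in this sketch; the posted code uses integer division without it, so a mismatched batch would only fail later at the addition.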

I have modified ApplyFooocusInpaint to handle video frames. For it to run, the mask has to be supplied externally as an extracted mask, attached after the canvas, so that the mask tensor is consistent with the image tensor. With that in place the code runs and video frames are processed without errors. The problem is that consistency across frames cannot be guaranteed, so the result is not good. Combining it with AnimateDiff does not give good results either. Could the author, when time allows, look at adjusting the mask handling so that image consistency is maintained when expanding video frames? Thank you.
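
The requirement that the mask tensor be consistent with the image tensor can be checked before the node runs. The following is only a sketch under stated assumptions: PyTorch is available, the mask is at pixel resolution with shape (B, H, W) or (B, 1, H, W), and the latent has shape (B, C, H/8, W/8). align_mask_to_latent is a hypothetical helper, not part of the repository, and it addresses only the shape mismatch, not consistency across frames.

```python
import torch
import torch.nn.functional as F


def align_mask_to_latent(mask: torch.Tensor, latent: torch.Tensor) -> torch.Tensor:
    """Downsample a pixel-space mask by 8 and match the latent's batch size.

    Hypothetical helper; a single mask is shared across all frames.
    """
    if mask.dim() == 3:
        mask = mask.unsqueeze(1)  # (B, H, W) -> (B, 1, H, W)
    # Same downsampling as in the posted node: 8x8 max pooling
    mask = F.max_pool2d(mask.round(), (8, 8)).round()
    if mask.size(0) == 1 and latent.size(0) > 1:
        # Share one mask across every frame in the batch
        mask = mask.expand(latent.size(0), -1, -1, -1)
    if mask.size(0) != latent.size(0) or mask.shape[-2:] != latent.shape[-2:]:
        raise ValueError(
            f"mask {tuple(mask.shape)} does not match latent {tuple(latent.shape)}"
        )
    return mask.to(latent)


# Illustrative data: 16 frames of 512x512 video, one shared mask
latent = torch.zeros(16, 4, 64, 64)
mask = torch.zeros(1, 512, 512)
mask[:, 100:200, 100:200] = 1.0

aligned = align_mask_to_latent(mask, latent)
feed = torch.cat([aligned, latent], dim=1)  # mask channel + latent channels
print(aligned.shape, feed.shape)
```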