I have modified ApplyFooocusInpaint to handle video frames

from typing import Any

import torch
import torch.nn.functional as F
from torch import Tensor

import comfy.lora
from comfy.model_base import BaseModel
from comfy.model_patcher import ModelPatcher

# InpaintHead, load_fooocus_patch and inject_patched_calculate_weight are
# assumed to be defined in the same module as this node class.


class ApplyFooocusInpaint:
    @classmethod
    def INPUT_TYPES(s):
        return {
            "required": {
                "model": ("MODEL",),
                "patch": ("INPAINT_PATCH",),
                "latent": ("LATENT",),
            }
        }

    RETURN_TYPES = ("MODEL",)
    CATEGORY = "inpaint"
    FUNCTION = "patch"

    def patch(self, model: ModelPatcher, patch: tuple[InpaintHead, dict[str, Tensor]], latent: dict[str, Any]):
        base_model: BaseModel = model.model
        latent_pixels = base_model.process_latent_in(latent["samples"])
        noise_mask = latent["noise_mask"].round()

        # Downsample the pixel-space mask by 8 to match the latent resolution.
        latent_mask = F.max_pool2d(noise_mask, (8, 8)).round().to(latent_pixels)

        inpaint_head_model, inpaint_lora = patch
        # The mask and the latent must agree in batch size, height and width here.
        feed = torch.cat([latent_mask, latent_pixels], dim=1)
        inpaint_head_model.to(device=feed.device, dtype=feed.dtype)
        inpaint_head_feature = inpaint_head_model(feed)

        def input_block_patch(h, transformer_options, inpaint_head_feature):
            # Keep the batch sizes consistent here.
            scale_factor = h.size(0) // inpaint_head_feature.size(0)
            if scale_factor != 1:
                inpaint_head_feature = inpaint_head_feature.repeat(scale_factor, 1, 1, 1)
            if transformer_options["block"][1] == 0:
                h = h + inpaint_head_feature.to(h)
            return h

        lora_keys = comfy.lora.model_lora_keys_unet(model.model, {})
        lora_keys.update({x: x for x in base_model.state_dict().keys()})
        loaded_lora = load_fooocus_patch(inpaint_lora, lora_keys)

        models = []
        # Assume model.clone() is a lightweight operation, so it can be used per frame in a batch.
        for i in range(feed.shape[0]):
            m = model.clone()
            m.set_model_input_block_patch(lambda h, opts: input_block_patch(h, opts, inpaint_head_feature))
            patched = m.add_patches(loaded_lora, 1.0)
            models.append(m)

        not_patched_count = sum(1 for x in loaded_lora if x not in patched)
        if not_patched_count > 0:
            print(f"[ApplyFooocusInpaint] Failed to patch {not_patched_count} keys")

        inject_patched_calculate_weight()
        return models  # Return the list of patched models, for batch processing.
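
To make the batch-matching step in input_block_patch easier to follow, here is a dependency-free sketch of what `repeat(scale_factor, 1, 1, 1)` does along the batch dimension. It uses plain Python lists in place of tensors, and `tile_to_batch` is a hypothetical helper name that is not part of the node. Unlike the floor division in the code above, this sketch raises an error when the target batch is not a whole multiple of the feature batch; that is my own assumption about what would be safer.

```python
def tile_to_batch(features, target_batch):
    """Tile a list of per-frame features until it has target_batch entries.

    Mirrors tensor.repeat(n, 1, 1, 1) along dim 0: the whole sequence is
    repeated n times, so [f0, f1] becomes [f0, f1, f0, f1] for n = 2.
    """
    if not features:
        raise ValueError("features must not be empty")
    if target_batch % len(features) != 0:
        # Assumption: fail loudly instead of silently truncating.
        raise ValueError("target batch is not a multiple of the feature batch")
    scale_factor = target_batch // len(features)
    return features * scale_factor


# Two frames, but the sampler evaluates a batch of four inputs at once.
print(tile_to_batch(["f0", "f1"], 4))
```

Note that the tiling repeats the whole frame sequence rather than each frame in place, so the order of the frames inside the larger batch matters for whether each frame gets its own feature.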

I have modified ApplyFooocusInpaint to handle video frames. For it to work, the mask has to be attached after the external drawing board as a separately extracted mask. That way the mask tensor is consistent with the image tensor and the code runs normally. It now runs and processes video frames without errors, but consistency between frames cannot be guaranteed, so the result is not good. Combining it with AnimateDiff does not give a good result either. Could the author, when time permits, look at adjusting the mask handling to see whether image consistency can be maintained when expanding video frames? Thank you.
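
To illustrate what "consistent" means for the mask and image tensors, here is a small shape check in plain Python. It is only a sketch: it assumes the mask is laid out as (batch, 1, height, width) in pixel space and the latent as (batch, channels, height / 8, width / 8), and the helper names `pooled_mask_shape` and `can_concatenate` are hypothetical. The check matches what `torch.cat(..., dim=1)` requires after the 8x8 max pooling: every dimension except the channel dimension has to be equal.

```python
def pooled_mask_shape(mask_shape, pool=8):
    """Shape of a (B, 1, H, W) mask after pool x pool max pooling.

    With stride equal to the kernel size and no padding, each spatial
    dimension is divided by pool, rounding down.
    """
    b, c, h, w = mask_shape
    return (b, c, h // pool, w // pool)


def can_concatenate(latent_shape, mask_shape, pool=8):
    """True if the pooled mask can be concatenated with the latent on dim 1."""
    if len(latent_shape) != 4 or len(mask_shape) != 4:
        return False
    mb, _, mh, mw = pooled_mask_shape(mask_shape, pool)
    lb, _, lh, lw = latent_shape
    return (mb, mh, mw) == (lb, lh, lw)


# 16 frames at 512 x 512 pixels, one mask per frame: shapes line up.
print(can_concatenate((16, 4, 64, 64), (16, 1, 512, 512)))
# A single mask for 16 frames: the batch sizes differ.
print(can_concatenate((16, 4, 64, 64), (1, 1, 512, 512)))
```

In the second case the single mask would first have to be repeated once per frame before it can be concatenated with the latent.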

Acly / comfyui-inpaint-nodes

I have modified this ApplyFocusInpaint to handle video frames #36